dealing with empty field names in query

on 06.02.2009 19:27:02 by John ORourke

Hi mod_perl list,

We're using more and more javascript to do clever things with forms, and
I think we broke the Apache2::Request parser, but wanted to check before
reporting it as a bug. (and tell me if this should go to the apreq list)

With the following request body:

i1=drnk4&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aquantity=1&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aid=de9a792da0f5127d72d7c6a5f6b2d4c5&i2=clth12&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aquantity=1&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aid=7acf9602cd6ab0ee86f77efeaaffefff&i3=&=&=&i4=&=&=&i5=&=&=&i6=&=&=&action=insert&x=46&y=17

When I create a new Apache2::Request object and loop through the
parameters, I get this: (output from Data::Dumper of a hash of the params)

'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:id' => '7acf9602cd6ab0ee86f77efeaaffefff',
'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:quantity' => '1',
'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:id' => 'de9a792da0f5127d72d7c6a5f6b2d4c5',
'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:quantity' => '1',
'i1' => 'drnk4',
'i2' => 'clth12',
'i3' => ''

So it stops parsing when it gets an '=' straight after an ampersand.

I looked up the spec and it doesn't seem to explicitly say, so I don't
think we should just stop parsing.
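
For comparison, here is a minimal sketch of how a tolerant urlencoded parser treats pairs with an empty name. It uses Python's standard `urllib.parse.parse_qsl` purely as a language-neutral illustration; it is not Apache2::Request, and the `body` string is a shortened, hypothetical stand-in with the same shape as the request body above.

```python
from urllib.parse import parse_qsl

# Shortened stand-in for the request body above (hypothetical short names,
# same pattern of "&=&=" pairs with an empty field name).
body = "i1=drnk4&basket%3Aid=abc&i3=&=&=&i4=&action=insert&x=46&y=17"

# keep_blank_values=True keeps pairs whose value is empty, which includes
# the "=" pairs whose name is empty as well.
pairs = parse_qsl(body, keep_blank_values=True)

# Pairs with an empty name are kept in order, not treated as end of input.
empty_named = [pair for pair in pairs if pair[0] == ""]

for name, value in pairs:
    print(repr(name), "=>", repr(value))
```

In this sketch the fields after the empty-named pairs (`i4`, `action`, `x`, `y`) are still returned, which is the behaviour the question is asking for.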

Spec:

http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1


thanks
John
--

Re: dealing with empty field names in query

on 06.02.2009 22:58:30 by Phil Carmody

--- On Fri, 2/6/09, John ORourke wrote:
> We're using more and more javascript to do clever
> things with forms,

Lots of people have said that. Probably a majority were wrong.

> and I think we broke the Apache2::Request
> parser, but wanted to check before reporting it as a bug.
> (and tell me if this should go to the apreq list)
>
> With the following request body:
>
> i1=drnk4&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aquantity=1&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aid=de9a792da0f5127d72d7c6a5f6b2d4c5&i2=clth12&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aquantity=1&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aid=7acf9602cd6ab0ee86f77efeaaffefff&i3=&=&=&i4=&=&=&i5=&=&=&i6=&=&=&action=insert&x=46&y=17
>
> When I create a new Apache2::Request object and loop
> through the parameters, I get this: (output from
> Data::Dumper of a hash of the params)
>
> 'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:id' => '7acf9602cd6ab0ee86f77efeaaffefff',
> 'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:quantity' => '1',
> 'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:id' => 'de9a792da0f5127d72d7c6a5f6b2d4c5',
> 'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:quantity' => '1',
> 'i1' => 'drnk4',
> 'i2' => 'clth12',
> 'i3' => ''
>
> So it stops parsing when it gets an '=' straight
> after an ampersand.
>
> I looked up the spec and it doesn't seem to explicitly
> say, so I don't think we should just stop parsing.
>
> Spec:
>
> http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1

In those name/value pairs, according to HTML 4 at least, the names must begin with a letter [A-Za-z]. The empty string does not do so. Garbage in, garbage out.
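
The rule being cited can be sketched as a simple check. This assumes the HTML 4 ID/NAME token rule (a letter, then letters, digits, hyphens, underscores, colons, or periods) is the one that applies to submitted field names; `NAME_TOKEN` and `is_name_token` are hypothetical names used only for this illustration, written in Python.

```python
import re

# HTML 4 ID/NAME token: a letter, then any number of letters, digits,
# hyphens, underscores, colons, and periods. Applying it to form field
# names is the assumption made in this reply.
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_name_token(name: str) -> bool:
    """Return True if the whole string is a valid ID/NAME token."""
    return NAME_TOKEN.fullmatch(name) is not None


for candidate in ["i3", "basket:_new_de9a792d:quantity", "", "1x"]:
    print(repr(candidate), is_name_token(candidate))
```

Under this rule the decoded `basket:...` names pass, while the empty string fails because it has no leading letter.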

Phil
Invalid.

Re: dealing with empty field names in query

on 07.02.2009 00:00:43 by aw

Phil Carmody wrote:
> --- On Fri, 2/6/09, John ORourke wrote:
>> We're using more and more javascript to do clever
>> things with forms,
>
> Lots of people have said that. Probably a majority were wrong.
>
>> and I think we broke the Apache2::Request
>> parser, but wanted to check before reporting it as a bug.
>> (and tell me if this should go to the apreq list)
>>
>> With the following request body:
>>
>> i1=drnk4&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aquantity=1&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aid=de9a792da0f5127d72d7c6a5f6b2d4c5&i2=clth12&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aquantity=1&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aid=7acf9602cd6ab0ee86f77efeaaffefff&i3=&=&=&i4=&=&=&i5=&=&=&i6=&=&=&action=insert&x=46&y=17
>>
>> When I create a new Apache2::Request object and loop
>> through the parameters, I get this: (output from
>> Data::Dumper of a hash of the params)
>>
>> 'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:id' => '7acf9602cd6ab0ee86f77efeaaffefff',
>> 'basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:quantity' => '1',
>> 'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:id' => 'de9a792da0f5127d72d7c6a5f6b2d4c5',
>> 'basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:quantity' => '1',
>> 'i1' => 'drnk4',
>> 'i2' => 'clth12',
>> 'i3' => ''
>>
>> So it stops parsing when it gets an '=' straight
>> after an ampersand.
>>
>> I looked up the spec and it doesn't seem to explicitly
>> say, so I don't think we should just stop parsing.
>>
>> Spec:
>>
>> http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1
>
> In those name/value pairs, according to HTML 4 at least, the names must begin with a letter [A-Za-z]. The empty string does not do so. Garbage in, garbage out.
>
+1
+ :
Above the OP is talking about a request "body". Are we sure that this
is really a request body, and not a query-string ?
What does the <form> tag really look like ? (enctype)
Just thinking that if this is a query-string, is it not just being cut
off after a certain size ?
It would not be possible to submit this data as multipart/form-data, for
a similar reason to what Phil says.

Re: dealing with empty field names in query

on 07.02.2009 03:18:48 by John ORourke

André Warnier wrote:
>
>> In those name/value pairs, according to HTML 4 at least, the names
>> must begin with a letter [A-Za-z]. The empty string does not do so.
>> Garbage in, garbage out.
>>
> +1
> + :
> Above the OP is talking about a request "body". Are we sure that this
> is really a request body, and not a query-string ?
> What does the <form> tag really look like ? (enctype)
> Just thinking that if this is a query-string, is it not just being cut
> off after a certain size ?
> It would not be possible to submit this data as multipart/form-data,
> for a similar reason to what Phil says.
>

It's regular form data using the post method, so no length issue and
normal URI parameter rules apply. I suspect Phil is correct but I can't
find any mention of this in the HTML 4 spec, which is what prompted my
question. Phil, can you point me to the part of the spec which
specifies that a field name must begin with an ASCII letter?

cheers
John

Re: dealing with empty field names in query

on 07.02.2009 09:14:10 by Phil Carmody

--- On Sat, 2/7/09, John ORourke wrote:
> Phil, can you point me to the part of the spec which
> specifies that a field name must begin with an ASCII letter?

http://www.w3.org/TR/html401/types.html#type-cdata

Phil

Re: dealing with empty field names in query

on 07.02.2009 13:15:47 by Clinton Gormley

> With the following request body:
>
> i1=drnk4&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aquantity=1&basket%3A_new_de9a792da0f5127d72d7c6a5f6b2d4c5%3Aid=de9a792da0f5127d72d7c6a5f6b2d4c5&i2=clth12&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aquantity=1&basket%3A_new_7acf9602cd6ab0ee86f77efeaaffefff%3Aid=7acf9602cd6ab0ee86f77efeaaffefff&i3=&=&=&i4=&=&=&i5=&=&=&i6=&=&=&action=insert&x=46&y=17

When I pass the above as a query string to my site, Apache2::Request
(from libapreq2-2.08) parses it as follows:

----------------------------------------------------------------------
$APR_Request_Param_Table1 = bless( {
  "=" => '',
  "=" => '',
  "=" => '',
  "=" => '',
  "=" => '',
  "=" => '',
  "=" => '',
  "=" => '',
  action => 'insert',
  "basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:id"
    => '7acf9602cd6ab0ee86f77efeaaffefff',
  "basket:_new_7acf9602cd6ab0ee86f77efeaaffefff:quantity"
    => 1,
  "basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:id"
    => 'de9a792da0f5127d72d7c6a5f6b2d4c5',
  "basket:_new_de9a792da0f5127d72d7c6a5f6b2d4c5:quantity"
    => 1,
  i1 => 'drnk4',
  i2 => 'clth12',
  i3 => '',
  i4 => '',
  i5 => '',
  i6 => '',
  x => 46,
  y => 17
}, 'APR::Request::Param::Table' );
----------------------------------------------------------------------


Are you using a different version? Or is it the fact that you're
POSTing it?

Clint

Re: dealing with empty field names in query

on 07.02.2009 14:06:16 by aw

Clinton Gormley wrote:
>
> Are you using a different version? Or is it the fact that you're
> POSTing it?
>
Sorry for the lecture, but I see this so often that it seems it deserves
repeating :

To send the content of a <form> to a webserver, you can use either a
POST or a GET method.
You should use a GET, if the result of sending this to the server, is
not going to modify anything on the server, and if re-sending the same
request several times would always give the same result.
In technical jargon, that is called "idempotent".

You should use POST if it is not the case, in other words if what you
are sending is going to modify something, and multiple identical
requests would be "not idempotent".

Neither of the above says how you are passing the data to the server
however. This is something else entirely.

Separately from the above, and usable with either one, is the question
of how you are passing the data of your request to the server.
This you can also do in two different ways :
- encoded as "application/x-www-form-urlencoded"
- or encoded as "multipart/form-data"

"application/x-www-form-urlencoded" is the default, and it means that
you are passing the form data appended at the end of the URL, preceded
by a "?" sign, as one long string of the form
"name1=value1&name2=value2..." etc..
usually known as "the query string".
That is easy to do, but has the inconvenient that the server does not
really know in which character set these things are. This can play
havoc with internationally-minded applications.
It can also have the result that the request may be truncated after a
certain maximum length, by some intervening actor.
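
One way to see that the method and the encoding are separate choices: the same urlencoded string can travel either in the URL of a GET or in the body of a POST. The sketch below is Python used as a language-neutral illustration; the host `example.invalid`, the path, and the short field names are hypothetical, and nothing is actually sent.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical fields, shaped like the ones in the original post.
fields = [("i1", "drnk4"), ("basket:id", "abc"), ("action", "insert")]

# application/x-www-form-urlencoded serialisation of the fields.
encoded = urlencode(fields)

# GET: the encoded string is appended to the URL after "?".
get_url = "http://example.invalid/cart?" + encoded

# POST: the very same string is carried as the request body, sent with
# Content-Type: application/x-www-form-urlencoded.
post_body = encoded.encode("ascii")

print(urlsplit(get_url).query)
print(post_body)
```

Length limits imposed on URLs by intermediaries concern the first form, which is why it matters which of the two a given form actually uses.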

"multipart/form-data" is more complicated and harder to do, and is
described here :
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4
but it has the advantage that each of the "name=value" pairs can be as
long as you want, and that the type of data and encoding of each is clear.

In neither of the above, though, do the specs allow sending a
"name=value" pair where there is no name. And if there is a name, the
specs do define what is allowed in it, and "" is not among these.

Now which combination of the above some clever javascript function may
decide to use when sending the form content to the server, is another
matter.
But as Phil rightly said, garbage in, garbage out.
Whether the server software can deal or not with some forms of invalid
data, is rather outside of the question. It is certainly not obliged to.

And the request data of which it is originally the question here is
certainly, without a doubt, invalid.

In my opinion thus, the OP should first take whatever measure is
appropriate to ensure that his application sends only valid data, and
then come back if there is still a problem.

Re: dealing with empty field names in query

on 07.02.2009 20:46:42 by John ORourke

André Warnier wrote:
> "application/x-www-form-urlencoded" is the default, and it means that
> you are passing the form data appended at the end of the URL, preceded
> by a "?" sign, as one long string of the form
> "name1=value1&name2=value2..." etc..
> usually known as "the query string".
> That is easy to do, but has the inconvenient that the server does not
> really know in which character set these things are. This can play
> havoc with internationally-minded applications.
> It can also have the result that the request may be truncated after a
> certain maximum length, by some intervening actor.

Sorry, I have to throw a little exception there.... the char set for
URL-encoded get/post data is defined in
http://www.w3.org/TR/html401/interact/forms.html#h-17.3 as the value of
the form element's accept-charset attribute, which defaults to
"UNKNOWN", and "User agents may interpret this value as the character
encoding that was used to transmit the document containing this FORM
element.". We rely on this in a system used on hundreds of sites in
various countries for several years, and we have found no exceptions.

Other than that, I hope our less experienced readers take note of your
excellent advice.
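
Why the agreed character set matters can be sketched as follows: the same text percent-encodes to different bytes depending on the charset, and decoding with the wrong one does not round-trip. This is Python used as a language-neutral illustration; the sample value and variable names are hypothetical.

```python
from urllib.parse import quote, unquote

value = "André"

# The same text percent-encodes differently depending on the charset.
as_latin1 = quote(value, encoding="latin-1")  # one byte for "é"
as_utf8 = quote(value, encoding="utf-8")      # two bytes for "é"

# Decoding with the charset the page was served in round-trips cleanly.
decoded_ok = unquote(as_latin1, encoding="latin-1")

# Decoding Latin-1 bytes as UTF-8 (the default here) cannot recover "é";
# the invalid byte is replaced with U+FFFD.
decoded_wrong = unquote(as_latin1)

print(as_latin1, as_utf8, decoded_ok, repr(decoded_wrong))
```

So the server needs to decode with the same charset the browser used, which is what relying on the document's own encoding achieves when accept-charset is left at its default.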

cheers
John

Re: dealing with empty field names in query

on 13.02.2009 21:30:20 by jonathan vanasco

On Feb 6, 2009, at 4:58 PM, Phil Carmody wrote:

> In those name/value pairs, according to HTML 4 at least, the names
> must begin with a letter [A-Za-z]. The empty string does not do so.
> Garbage in, garbage out.



Part of me agrees with that philosophy.

Another part of me is more practical.

We had to stop using libapreq2 for cookies, because we found out that
wordpress (being a shoddy piece of software) was generating invalid
cookies at times. when apreq encountered it, it segfaulted.

so while the engineering part of me is okay with garbage in / garbage
out, the management side of me says sometimes you have to expect bad
data and try to make the best of it - otherwise you lose customers and
revenue.

Re: dealing with empty field names in query

on 13.02.2009 21:38:24 by aw

Jonathan Vanasco wrote:
>
> On Feb 6, 2009, at 4:58 PM, Phil Carmody wrote:
>
>> In those name/value pairs, according to HTML 4 at least, the names
>> must begin with a letter [A-Za-z]. The empty string does not do so.
>> Garbage in, garbage out.
>
>
>
> Part of me agrees with that philosophy.
>
> Another part of me is more practical.
>
> We had to stop using libapreq2 for cookies, because we found out that
> wordpress (being a shoddy piece of software) was generating invalid
> cookies at times. when apreq encountered it, it segfaulted.
>
> so while the engineering part of me is okay with garbage in / garbage
> out, the management side of me says sometimes you have to expect bad
> data and try to make the best of it - otherwise you lose customers and
> revenue.
>
The management part of me says that if you sell shoddy merchandise to
people, they are going to come back and hit you with it.
Presumably, if you get such kind of posted data from a form, it is
because you sent a shoddy form to the browser, which can submit such
shoddy data. Or because you have some shoddy javascript in the form,
which sends shoddy data to your server.
So we're still at the garbage level, but the other way around : garbage
out, garbage in.
;-)

Re: dealing with empty field names in query

on 13.02.2009 22:03:26 by jonathan vanasco

On Feb 13, 2009, at 3:38 PM, André Warnier wrote:

> The management part of me says that if you sell shoddy merchandise to
> people, they are going to come back and hit you with it.
> Presumably, if you get such kind of posted data from a form, it is
> because you sent a shoddy form to the browser, which can submit such
> shoddy data. Or because you have some shoddy javascript in the form,
> which sends shoddy data to your server.
> So we're still at the garbage level, but the other way around :
> garbage out, garbage in.
> ;-)

That's assuming that you're responsible.

Today many people use misc javascript libraries; and there are js DMZ
servers that serve off cached versions so people don't have to
reload. A simple typo could render your application broken.

Re: dealing with empty field names in query

on 13.02.2009 22:35:08 by David Nicol

On Fri, Feb 13, 2009 at 3:03 PM, Jonathan Vanasco wrote:
> A simple typo could render your application broken.

Or a hostile competitor.

Re: dealing with empty field names in query

on 13.02.2009 23:11:22 by Joe Schaefer

----- Original Message ----

> From: Jonathan Vanasco
> To: modperl
> Sent: Friday, February 13, 2009 3:30:20 PM
> Subject: Re: dealing with empty field names in query
>
>
> On Feb 6, 2009, at 4:58 PM, Phil Carmody wrote:
>
> > In those name/value pairs, according to HTML 4 at least, the names must begin
> > with a letter [A-Za-z]. The empty string does not do so. Garbage in, garbage
> > out.
>
>
>
> Part of me agrees with that philosophy.
>
> Another part of me is more practical.
>
> We had to stop using libapreq2 for cookies, because we found out that wordpress
> (being a shoddy piece of software) was generating invalid cookies at times.
> when apreq encountered it, it segfaulted.

What version of apreq was this? And did you report it to the apreq-dev@ mailing list?

> so while the engineering part of me is okay with garbage in / garbage out, the
> management side of me says sometimes you have to expect bad data and try to make
> the best of it - otherwise you lose customers and revenue.

Re: dealing with empty field names in query

on 24.02.2009 08:47:52 by jonathan vanasco

On Feb 13, 2009, at 5:11 PM, Joe Schaefer wrote:
>> We had to stop using libapreq2 for cookies, because we found out
>> that wordpress (being a shoddy piece of software) was generating
>> invalid cookies at times. when apreq encountered it, it segfaulted.
>
> What version of apreq was this? And did you report it to the
> apreq-dev@ mailing list?

2.07

reported to apreq-dev in 2006 :
http://marc.info/?l=apreq-dev&m=113996436206606&w=2

it's an edge case to cause it -- you have to somehow write a bad
cookie, which most libraries fix for automatically. wordpress did
that often back then though.