Re: Utf-8 and Content-Length header

on 08.06.2006 16:31:08 by pav

Gisle Aas wrote on Thu, 08. 06. 2006 at 07:22 -0700:
> Pav Lucistnik writes:
>
> > A general problem exists in libwww when POSTing data to HTTP servers:
> > Content-Length headers and syswrite() calls use the length() function on
> > the payload to determine its size. This fails on UTF-8 encoded data,
> > because length() returns the count of _characters_, not the count
> > of bytes. This leads to a truncated message.
>
> If the data you put into the request content is UTF-8 encoded data
> then there is no problem. Encoded text is already bytes and that's
> what the content and headers of HTTP::Message objects are supposed to
> be. Use the Encode module to convert Unicode strings that you add to
> the request.

That's what I was doing at first. But I hit some problems, so I tried
my luck with passing utf8 strings into libwww. I tested my patches;
they do not affect operation on normal strings in any way.
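
For reference, the encode-first approach Gisle describes can be sketched
roughly as below. The payload is a made-up example, not the data from my
real request, and only core modules are used so the snippet runs on its own.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical payload: a character string with two non-ASCII characters
# (U+00ED and U+0161).
my $text = "p\x{ed}\x{161}e";

# Convert characters to UTF-8 octets before handing them to the request,
# e.g. via the content() method of an HTTP::Request object.
my $octets = encode('UTF-8', $text);

# length() counts characters on $text but octets on $octets, so a
# Content-Length computed from $octets matches the number of bytes sent.
printf "characters: %d, octets: %d, utf8 flag on octets: %s\n",
    length($text), length($octets), utf8::is_utf8($octets) ? "on" : "off";
# prints "characters: 4, octets: 6, utf8 flag on octets: off"
```

With the content already encoded, length() and the byte count agree, which
is the situation HTTP::Message expects according to the reply above.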


The real problem: for some odd reason, $req_buf is coming in with the utf8
flag on, spoiling the content thanks to this line

my $buf = $req_buf . $$content_ref;

and the whole thing goes down in flames afterwards.

code:
222 warn "WWWBUFFF content_ref ", utf8::is_utf8($$content_ref), " - ", length($$content_ref), "/", bytes::length($$content_ref);
239 warn "WWWBUFFF req_buf ", utf8::is_utf8($req_buf), " - ", length($req_buf), "/", bytes::length($req_buf);
240 my $buf = $req_buf . $$content_ref;
241 warn "WWWBUFFF buf ", utf8::is_utf8($buf), " - ", length($buf), "/", bytes::length($buf);

output:
WWWBUFFF content_ref - 1369/1369 at /usr/local/lib/perl5/site_perl/5.8.8/LWP/Protocol/http.pm line 222.
WWWBUFFF req_buf 1 - 328/328 at /usr/local/lib/perl5/site_perl/5.8.8/LWP/Protocol/http.pm line 239.
WWWBUFFF buf 1 - 1697/1699 at /usr/local/lib/perl5/site_perl/5.8.8/LWP/Protocol/http.pm line 241.
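
The length mismatch itself can be reproduced outside libwww. The sketch
below uses made-up stand-ins for the two buffers (the strings and the
report() helper are hypothetical, not taken from http.pm) and forces the
utf8 flag on the header block with utf8::upgrade(). It only shows the
consequence of the concatenation, not why $req_buf arrives flagged in the
first place. The downgrade at the end is just for comparison and assumes
the header block is ASCII only.

```perl
use strict;
use warnings;
use bytes ();                 # load bytes::length without enabling the pragma
use Encode qw(encode);

# Hypothetical stand-ins for the two buffers; not the real libwww values.
my $req_buf = "POST / HTTP/1.1\r\n\r\n";    # ASCII-only header block, 19 chars
utf8::upgrade($req_buf);                    # force the utf8 flag on
my $content = encode('UTF-8', "\x{e8}t");   # 3 octets, utf8 flag off

# Same layout as the warn() lines: name, flag, characters/bytes.
sub report {
    my ($name, $str) = @_;
    return sprintf "%s %s - %d/%d", $name, (utf8::is_utf8($str) ? 1 : ''),
        length($str), bytes::length($str);
}

# Concatenating a flagged string with unflagged octets upgrades the octets:
# each of the two high bytes becomes a two-byte sequence internally.
my $buf = $req_buf . $content;
print report('req_buf', $req_buf), "\n";    # 19 characters, 19 bytes
print report('content', $content), "\n";    # 3 characters, 3 bytes
print report('buf', $buf), "\n";            # 22 characters, 24 bytes

# For comparison: with the flag removed from the header block first,
# the concatenation stays a plain byte string.
my $plain = $req_buf;
utf8::downgrade($plain);
my $buf2 = $plain . $content;
print report('buf2', $buf2), "\n";          # 22 characters, 22 bytes
```

The two extra bytes in $buf match the 1697/1699 pattern above, which would
fit a payload containing two bytes above 0x7F.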

I'm a bit puzzled as to what's going on here. Any tips? I'm still
looking into this.

-- 
Pav Lucistnik


East or west, ~ is best.