Another utf8 problem

on 10.10.2006 20:13:04 by dean.brockhausen

I have been experiencing problems when I try to PUT binary data
via LWP. The problem only manifests itself when I am making an
HTTPS request. I have hex-encoded the binary data for easy viewing.
The buffer that I pass to LWP does not have the utf8 flag on.


The data that I put is

a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3

The data the server gets is :

c2a4c38bc283c2b4c38e76c281417e1c425d1a30c39026c290c380c3931ec294

The c2..c3..c2......c3 pattern makes me think:

Is this my data, UTF-8 encoded and truncated?

So I take the PUT data and encode it to UTF-8,
e.g. $data = encode("utf8", $data);

c2a4c38bc283c2b4c38e76c281417e1c425d1a30c39026c290c380c3931ec294c290c383c38708c3a229c3aec3a13917c2a3

Sure enough it is.
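
The comparison above can be reproduced without any network access. The following is a minimal sketch, assuming only the core Encode module; the variable names are illustrative. It treats each byte of the PUT data as a character below 256, encodes the result to UTF-8, and checks that what the server received is a prefix of that encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hex strings copied from the message above.
my $sent_hex     = 'a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3';
my $received_hex = 'c2a4c38bc283c2b4c38e76c281417e1c425d1a30c39026c290c380c3931ec294';

my $sent = pack('H*', $sent_hex);

# encode() reads each byte of $sent as a code point below 256 and
# writes its UTF-8 form, so every byte >= 0x80 becomes two bytes.
my $encoded_hex = unpack('H*', encode('UTF-8', $sent));

# True if the server's copy is the start of the UTF-8 encoded data.
my $is_prefix = index($encoded_hex, $received_hex) == 0;

print "encoded:  $encoded_hex\n";
print "received: $received_hex\n";
print "received is a prefix of encoded: ", ($is_prefix ? "yes" : "no"), "\n";
```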

I have followed the data all the way to the syswrite on line 292 of http.pm
(version # $Id: http.pm,v 1.70 2005/12/08 10:28:01 gisle Exp $)
and the data has not been encoded at that point, but the utf8 flag has changed.


At http.pm line 205, $req_buf has the utf8 flag on.
At http.pm line 233, $$content_ref does not have the utf8 flag.
The concatenation at line 234 gives $buf the utf8 flag.
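
The flag propagation described above can be seen in isolation. This is a minimal sketch, assuming a made-up request header in $req_buf (the real one is built inside http.pm) and using only core functions:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the header block that http.pm builds.
my $req_buf = "PUT /test HTTP/1.1\r\nHost: example.com\r\n\r\n";
Encode::_utf8_on($req_buf);               # pure ASCII, so this only sets the flag

my $content = pack('H*', 'a4cb83b4');     # 4 bytes, all >= 0x80, flag off

my $buf = $req_buf . $content;            # Perl upgrades $content to join them

my $flag_before = utf8::is_utf8($content) ? 1 : 0;
my $flag_after  = utf8::is_utf8($buf)     ? 1 : 0;

# The character count is what you would expect, but the internal
# representation now spends two bytes on each of the four high bytes.
my $chars = length($buf);
my $bytes = do { use bytes; length($buf) };

print "content flagged: $flag_before, buf flagged: $flag_after\n";
print "characters: $chars, internal bytes: $bytes\n";
```

Any layer that writes the internal bytes instead of the characters would send the longer form, which would be consistent with what the server received in the https case.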

This behavior is the same for http and https requests.
Finally, Ethereal shows that the http data has not been encoded.
While I cannot look at the wire in the https case, the data at the
other end has been UTF-8 encoded and truncated.
The truncation seems to come from the issue described in this thread:
http://www.mail-archive.com/libwww@perl.org/msg06097.html



If I nix the utf8 flag, the binary data does not get encoded and
everything works,
e.g. (line 233 ...)

if ($req_buf) {
    Encode::_utf8_off($req_buf);  # DMB: clear the utf8 flag before concatenating
    my $buf = $req_buf . $$content_ref;
    $wbuf = \$buf;
}
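
As an alternative to clearing the flag directly, utf8::downgrade converts the string back to its byte form and reports failure if that is not possible. The sketch below is an assumption about how the workaround could be written, not a fix taken from LWP; the helper name bytes_or_die and the header text are hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return a byte-string copy of $str, or die if it
# contains characters above 0xFF that cannot be stored one per byte.
sub bytes_or_die {
    my ($str) = @_;
    utf8::downgrade($str, 1)        # 1 = return false instead of dying
        or die "string contains wide characters\n";
    return $str;
}

my $req_buf = "PUT /test HTTP/1.1\r\n\r\n";   # made-up header
Encode::_utf8_on($req_buf);

my $content = pack('H*', 'a4cb83b4');
my $buf     = bytes_or_die($req_buf) . $content;

my $hex_tail = unpack('H*', substr($buf, -4));
print utf8::is_utf8($buf) ? "flag on\n" : "flag off\n";
print "last four bytes: $hex_tail\n";
```

For a header that is pure ASCII the two approaches give the same bytes. They differ when the string holds characters between 0x80 and 0xFF: _utf8_off leaves the two-byte internal form in place, while downgrade converts each character back to a single byte.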

I may be doing something wrong, but I think there is a bug somewhere
under syswrite: it encodes data that was not utf8 when it came from
the user.
Any help would be appreciated, though for now I will continue to nix
the utf8 flag.

Dean

Re: Another utf8 problem

on 10.10.2006 23:29:27 by mumia.w.18.spam+nospam

You didn't post a small working example of your code and data, so it's
impossible to test this problem. All I can do is offer general advice
such as to uuencode or base64-encode the data before sending (if you
have control over both scripts).

Perhaps you should create a small script (not your entire program) that
demonstrates the problem.
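
The base64 suggestion above could look like the following. This is a sketch only, assuming the receiving side can be changed to decode the payload; it uses the core MIME::Base64 module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my $binary = pack('H*', 'a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3');

# The second argument '' suppresses the line breaks and trailing newline.
my $wire = encode_base64($binary, '');

# The receiver reverses the step to get the original bytes back.
my $round_trip = decode_base64($wire);

print "wire form: $wire\n";
print "round trip ok: ", ($round_trip eq $binary ? "yes" : "no"), "\n";
```

The encoded form contains only ASCII characters, which have the same byte representation whether or not the utf8 flag is set, so an unwanted upgrade would not change what goes over the wire.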

Re: Another utf8 problem

on 11.10.2006 19:36:30 by dean.brockhausen

Thanks for the reply.
Here is the requested script.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use Encode;

my $agent = LWP::UserAgent->new;

my $key      = "httpspublic/test";
my $protocol = 'https';
#my $protocol = 'http';
my $url      = "$protocol://s3.amazonaws.com:443/$key";
my $content  = pack('H*', 'a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3');
print unpack('H*', $content), "\n";     # the bytes being PUT
my @headers = ();

Encode::_utf8_on($url);                 # the URL carries the utf8 flag; the content does not
my $request = HTTP::Request->new('PUT', $url, \@headers);
$request->content($content);
my $response = $agent->request($request);

$request  = HTTP::Request->new('GET', $url, \@headers);
$response = $agent->request($request);

my $content1 = $response->content();
print unpack('H*', $content1), "\n";    # the bytes read back


The problem is that the utf8 flag is on for the URL but not for the data.
When the two are concatenated, the data inherits the flag.
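
One caller-side way to avoid this is to make sure the URL is a plain byte string before it reaches HTTP::Request. The following sketch is an assumption rather than something proposed in this thread; it shows only the string handling, using Encode::encode, and leaves out the LWP calls.

```perl
use strict;
use warnings;
use Encode ();

my $url = "https://s3.amazonaws.com:443/httpspublic/test";
Encode::_utf8_on($url);                       # same flag state as in the script

# encode() returns octets, which do not carry the utf8 flag.
my $url_bytes = Encode::encode('UTF-8', $url);

my $content = pack('H*', 'a4cb83b4');

my $joined_flagged = $url       . $content;   # flagged operand upgrades the result
my $joined_bytes   = $url_bytes . $content;   # both operands are byte strings

printf "flagged URL -> result flagged: %s\n", utf8::is_utf8($joined_flagged) ? "yes" : "no";
printf "byte URL    -> result flagged: %s\n", utf8::is_utf8($joined_bytes)   ? "yes" : "no";
```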




Re: Another utf8 problem

on 11.10.2006 23:16:53 by mumia.w.18.spam+nospam


What is https://s3.amazonaws.com:443/httpspublic/test supposed to do?

When I do a GET request, I get this (hexdump output):

0000000 a4c2 8bc3 83c2 b4c2 8ec3 c276 4181 1c7e
0000010 5d42 301a 90c3 c226 c390 c380 1e93 94c2
0000020
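
The dump above groups bytes into 16-bit words, and on a little-endian machine each word is printed with its two bytes swapped, so a4c2 stands for the bytes c2 a4. Here is a small sketch, assuming that is how the dump was produced, that swaps the words back for comparison with the hex strings earlier in the thread:

```perl
use strict;
use warnings;

# Words copied from the dump, without the offset column.
my @words = qw(
    a4c2 8bc3 83c2 b4c2 8ec3 c276 4181 1c7e
    5d42 301a 90c3 c226 c390 c380 1e93 94c2
);

# Swap the two bytes of each word: "a4c2" becomes "c2a4".
my $byte_hex = join '', map { substr($_, 2, 2) . substr($_, 0, 2) } @words;

print "$byte_hex\n";
```

Read this way, the 32 bytes returned by the GET are the same UTF-8 encoded, truncated data reported in the first message.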

Re: Another utf8 problem

on 12.10.2006 00:03:07 by dean.brockhausen

The two lines should be the same. The first line is the content that was PUT;
the second line is the content that was retrieved. They should match.

If you comment out the Encode::_utf8_on line, they will match.

Dean

Re: Another utf8 problem

on 12.10.2006 00:06:24 by dean.brockhausen

When I run the script, this is what I get:
C:\btb\btb\debug>perl sample.pl
a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3
c2a4c38bc283c2b4c38e76c281417e1c425d1a30c39026c290c380c3931ec294

These lines should match, but the second one is a UTF-8 encoded and
truncated version of the first.
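
Both lines are 32 bytes long, which suggests the body was cut at the original byte count. Below is a sketch of that check, with the assumption (not confirmed in this thread) that the length sent to the server was computed from the unencoded content:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $sent     = pack('H*', 'a4cb83b4ce7681417e1c425d1a30d02690c0d31e9490c3c708e229eee13917a3');
my $received = pack('H*', 'c2a4c38bc283c2b4c38e76c281417e1c425d1a30c39026c290c380c3931ec294');

my $sent_len     = length $sent;
my $received_len = length $received;

# Full UTF-8 encoding of the sent bytes, each read as a code point below 256.
my $encoded     = encode('UTF-8', $sent);
my $encoded_len = length $encoded;

# Cut the encoded form at the original length and compare with the server's copy.
my $cut_matches = substr($encoded, 0, $sent_len) eq $received;

print "sent: $sent_len bytes, received: $received_len bytes, ",
      "fully encoded: $encoded_len bytes\n";
print "encoded data cut at $sent_len bytes matches received: ",
      ($cut_matches ? "yes" : "no"), "\n";
```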

Fetching it from a browser is probably not very useful.

Dean
