LWP: Warning with utf8 data in HTML head section
Posted on 03.08.2006 00:47:56 by libwww

There seems to be a bug in LWP which causes a warning in
HTML::HeadParser on fetched web documents which contain UTF-8 encoded
data in the head section.
Example:
    use strict;
    use LWP;
    use 5.008;
    my $url = 'http://perlmeister.com/test/utf8.html';
    my $ua = LWP::UserAgent->new();
    my $res = $ua->get($url);
This snippet produces the warning

    Parsing of undecoded UTF-8 will give garbage when decoding
    entities at /home/y/lib/perl5/site_perl/5.8/LWP/Protocol.pm line
    114.

with LWP-5.805 and HTML-Parser-3.55.
HTML::HeadParser issues this warning if it finds UTF-8 encoded data
but the string handed in doesn't have Perl's UTF-8 flag set.
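To illustrate what "UTF-8 flag set" means here, the following is a minimal sketch using only core Perl and Encode. The sample string is made up for illustration and is not taken from the test page above; it only shows the difference between raw octets and a decoded character string, not what LWP does internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw octets as they might arrive from a web server:
# "caf" followed by the two-byte UTF-8 encoding of e-acute.
# (Illustrative data, not from the original report.)
my $octets = "caf\xC3\xA9";

# A string built from byte escapes does not carry the UTF-8 flag.
my $flag_before = utf8::is_utf8($octets) ? 1 : 0;

# Decoding turns the octets into characters and sets the flag.
my $chars = decode('UTF-8', $octets);
my $flag_after = utf8::is_utf8($chars) ? 1 : 0;

printf "before: flag=%d length=%d\n", $flag_before, length $octets;
printf "after:  flag=%d length=%d\n", $flag_after,  length $chars;
```

The undecoded string is five bytes long; the decoded one is four characters long.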
Setting the UTF-8 flag on web server responses which indicate
UTF-8 content in a Content-Type header like 'text/html; charset=utf-8'
seems to be one possible solution. But this charset setting might also
occur in the HTML head section, which HTML::HeadParser is supposed
to parse. In that case the warning probably needs to be suppressed until
HTML::HeadParser is done and has verified that there's no such setting
in the HTML head.
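The lookup order described above (HTTP header first, then the HTML head) could be sketched roughly as follows. This is not how LWP or HTML::HeadParser work; the helpers charset_from_content_type, charset_from_head and decode_body are hypothetical names invented for this example, the regexes are a crude stand-in for a real parser, and the sample document is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: extract a charset from a value such as
# 'text/html; charset=utf-8'. Returns undef if none is found.
sub charset_from_content_type {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /charset\s*=\s*["']?([\w.:-]+)/i ? lc $1 : undef;
}

# Hypothetical helper: look for a charset declared in a meta tag
# inside the still undecoded HTML head.
sub charset_from_head {
    my ($octets) = @_;
    my ($head) = $octets =~ m{<head\b[^>]*>(.*?)</head>}is;
    return undef unless defined $head;
    while ($head =~ /<meta\b([^>]*)>/gi) {
        my $cs = charset_from_content_type($1);
        return $cs if defined $cs;
    }
    return undef;
}

# Hypothetical helper: decode the body if a known charset is declared
# in either place; otherwise hand back the octets untouched.
sub decode_body {
    my ($content_type, $octets) = @_;
    my $charset = charset_from_content_type($content_type)
               || charset_from_head($octets);
    return $octets unless defined $charset && Encode::find_encoding($charset);
    return decode($charset, $octets);
}

# Made-up document: no charset in the HTTP header, only in the head.
my $html = qq{<html><head><meta http-equiv="Content-Type" }
         . qq{content="text/html; charset=utf-8">}
         . qq{<title>caf\xC3\xA9</title></head></html>};

my $decoded = decode_body('text/html', $html);
print utf8::is_utf8($decoded) ? "decoded\n" : "left as octets\n";
```

The sketch only covers charset detection and decoding; it does not attempt the warning suppression discussed above.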
-- Mike
Mike Schilli
m@perlmeister.com