LWP ends up with unreadable characters
on 24.04.2006 00:25:54 by tom arnall
i'm trying to use LWP on:
http://education.yahoo.com/reference/dict_en_es/spanish/a_1;_ylt=AoFfUtrOQo3d1vl10ohvPPb2s8sF
when i do :
$ua = LWP::UserAgent->new;
$res1 = $ua->get($url,%header);
my $page = $res1->content;
'$page' ends up with unreadable characters. the code works fine for
most sites. also, if i fetch the page with 'lynx' i get readable stuff,
and a browser's 'view source' function on the page gets a normal
result.
ideas?
tom arnall
north spit, ca
Re: LWP ends up with unreadable characters
on 25.04.2006 16:17:39 by gisle
kloro@cox.net writes:
> i'm trying to use LWP on:
>
> http://education.yahoo.com/reference/dict_en_es/spanish/a_1;_ylt=AoFfUtrOQo3d1vl10ohvPPb2s8sF
>
> when i do :
>
> $ua = LWP::UserAgent->new;
> $res1 = $ua->get($url,%header);
> my $page = $res1->content;
>
> '$page' ends up with unreadable characters. the code works fine for
> most sites. also, if i fetch the page with 'lynx' i get readable stuff,
> and a browser's 'view source' function on the page gets a normal
> result.
>
> ideas?
Try to provide a complete program that we can run to reproduce your
problem. I certainly get text out when I try to access your URL with
this program:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $res = $ua->get('http://education.yahoo.com/reference/dict_en_es/spanish/a_1;_ylt=AoFfUtrOQo3d1vl10ohvPPb2s8sF');
my $page = $res->content;
print $page;
__END__
Perhaps you have something interesting in %header that you don't tell us about?
Regards,
Gisle
Re: LWP ends up with unreadable characters
on 28.04.2006 06:02:49 by kloro
On Tuesday 25 April 2006 07:17 am, you wrote:
> kloro@cox.net writes:
> > i'm trying to use LWP on:
> >
> > http://education.yahoo.com/reference/dict_en_es/spanish/a_1;_ylt=AoFfUtrOQo3d1vl10ohvPPb2s8sF
> >
> > when i do :
> >
> > $ua = LWP::UserAgent->new;
> > $res1 = $ua->get($url,%header);
> > my $page = $res1->content;
> >
> > '$page' ends up with unreadable characters. the code works fine for
> > most sites. also, if i fetch the page with 'lynx' i get readable stuff,
> > and a browser's 'view source' function on the page gets a normal
> > result.
> >
> > ideas?
>
> Try to provide a complete program that we can run to reproduce your
> problem. I certainly get text out when I try to access your URL with
> this program:
>
> #!/usr/bin/perl -w
>
> use strict;
> use LWP::UserAgent;
>
> my $ua = LWP::UserAgent->new;
> my $res = $ua->get('http://education.yahoo.com/reference/dict_en_es/spanish/a_1;_ylt=AoFfUtrOQo3d1vl10ohvPPb2s8sF');
> my $page = $res->content;
>
> print $page;
> __END__
>
> Perhaps you have something interesting in %header that you don't tell us
> about?
>
thanks everyone for your responses. and indeed it has to do with the '%header'
statement, which runs:
my %header = (
    'Keep-Alive' => '300',
    'Connection' => 'keep-alive',
    'User-Agent' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050925 Firefox/1.0.4 (Debian package 1.0.4-2sarge5)',
    'Pragma' => 'no-cache',
    'Cache-control' => 'no-cache',
    'Accept' => 'image/png,*/*;q=0.5',
    'Accept-Encoding' => 'gzip,deflate',
    'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language' => 'en-us,en;q=0.5',
    'Host' => $host,
    );
if " 'Accept-Encoding' => 'gzip,deflate' " is eliminated, the subsequent
fetch on the website is normal ascii.
tom arnall
north spit, ca
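
The effect described in this thread can be reproduced without a network connection. The sketch below is not from the original posters: it builds a gzip-compressed HTTP::Response by hand (the sample HTML string and the variable names are made up for illustration) and contrasts content(), which returns the body bytes as they arrived, with decoded_content(), which undoes the Content-Encoding and the charset. It assumes a libwww-perl recent enough to provide decoded_content() and that IO::Compress::Gzip is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body standing in for what the server would send.
my $html = "<html><body>hola</body></html>";

# Compress it the way a server does when the client sends
# 'Accept-Encoding: gzip,deflate'.
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# Build the response locally instead of fetching it.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=ISO-8859-1',
        'Content-Encoding' => 'gzip',
    ],
    $compressed,
);

my $raw     = $res->content;          # still gzip-compressed bytes
my $decoded = $res->decoded_content;  # decompressed and charset-decoded

printf "raw body starts with gzip magic bytes: %s\n",
    substr($raw, 0, 2) eq "\x1f\x8b" ? 'yes' : 'no';
print "decoded body: $decoded\n";
```

For a real fetch this suggests two options: drop 'Accept-Encoding' from %header, as tom did, or keep it and read $res1->decoded_content instead of $res1->content.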