LWP produces unreadable characters
on 27.04.2006 04:32:28 by kloro
I am trying to use LWP to fetch material from the following site:
http://education.yahoo.com/reference
using, to pick an example:
http://education.yahoo.com/reference/dict_en_es/spanish/curia;_ylt=A9FJq_FmyklEIWYAkAD2s8sF
The result is unreadable characters, though lynx does fine with it (though, of
course, it doesn't return a true copy of the HTML), and the page looks fine
when I use the 'view source' function on the page displayed in my browser
window.
My code:
$host = 'education.yahoo.com';
$base = 'reference/dict_en_es/spanish_index';
$file = 'B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';
%header = (
'Keep-Alive' => '300',
'Connection' => 'keep-alive',
'User-Agent' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050925 Firefox/1.0.4 (Debian package 1.0.4-2sarge5)',
'Pragma' => 'no-cache',
'Cache-control' => 'no-cache',
'Accept' => 'image/png,*/*;q=0.5',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language' => 'en-us,en;q=0.5',
'Host' => $host,
);
$f="http://$host/$base/$file";
$res1 = $ua->get($f,%header);
my $page = $res1->content;
'$page' ends up with unreadable characters.
thanks,
tom arnall
north spit, ca
Re: LWP produces unreadable characters
on 28.04.2006 00:18:04 by tallwine
tom arnall wrote:
> i am trying to use LWP to fetch material from the following site:
>
> http://education.yahoo.com/reference
>
> using, to pick an example:
>
> http://education.yahoo.com/reference/dict_en_es/spanish/curia;_ylt=A9FJq_FmyklEIWYAkAD2s8sF
>
> the result is unreadable characters, tho' lynx does fine with it (tho', of
> course, it doesn't return a true copy of the html), and the stuff looks fine
> when i use the 'view source' function on the page displayed in my browser
> window.
>
> my code:
>
> $host = 'education.yahoo.com';
> $base = 'reference/dict_en_es/spanish_index';
> $file = 'B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';
After a few minor adjustments, it works for me.
use strict;
use LWP;
use URI;
my $host = 'http://education.yahoo.com';
my $path =
'/reference/dict_en_es/spanish_index/B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';
my $uri = URI->new_abs($path,$host);
# I like a URI object, as I often run in a loop and change query strings,
# host, etc.
my %header = (
'Keep-Alive' => '300',
'Connection' => 'keep-alive',
'User-Agent' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050925 Firefox/1.0.4 (Debian package 1.0.4-2sarge5)',
'Pragma' => 'no-cache',
'Cache-control' => 'no-cache',
'Accept' => 'image/png,*/*;q=0.5',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language' => 'en-us,en;q=0.5',
'Host' => $uri->host,   # host name only, without the scheme
);
my $ua = LWP::UserAgent->new(%header);
my $res1 = $ua->get($uri);
my $page = $res1->content;
print $page;
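The URI-object habit mentioned in the comment above could look roughly like the sketch below. It only builds URLs and fetches nothing. The letters other than B and the `page` query parameter are hypothetical, chosen purely for illustration; the session suffix (`;_ylt=...`) is left out.

```perl
use strict;
use warnings;
use URI;

my $host = 'http://education.yahoo.com';
my $uri  = URI->new_abs('/reference/dict_en_es/spanish_index/B', $host);

my @urls;
for my $letter ('A' .. 'C') {        # hypothetical set of index pages
    my $u = $uri->clone;             # leave the template object untouched
    $u->path("/reference/dict_en_es/spanish_index/$letter");
    $u->query_form(page => 1);       # hypothetical query parameter
    push @urls, $u->as_string;
}

print "$_\n" for @urls;
```

Cloning the object in each iteration keeps the base URI unchanged, so the path, query string, or host can be varied without rebuilding the whole string.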
-Tim
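
A likely explanation for the difference, offered as an assumption rather than something confirmed in the thread: the original code sends 'Accept-Encoding' => 'gzip,deflate', so the server may answer with a compressed body, and `$res->content` returns those raw bytes unchanged. In the reply, the headers are passed to `LWP::UserAgent->new`, which, as far as I can tell, treats them as unrecognized constructor options rather than request headers, so no compressed response is requested. The sketch below builds a gzip-encoded response locally (no network access; the HTML string is made up) to show the difference between `content` and `decoded_content`.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Stand-in page body; the real page content is not reproduced here.
my $html = "<html><body>curia</body></html>";

# Compress it the way a server honouring Accept-Encoding: gzip might.
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# Build a response object by hand instead of fetching one.
my $res = HTTP::Response->new(200, 'OK');
$res->header('Content-Type'     => 'text/html; charset=ISO-8859-1');
$res->header('Content-Encoding' => 'gzip');
$res->content($compressed);

my $raw     = $res->content;           # still gzip-compressed bytes
my $decoded = $res->decoded_content;   # Content-Encoding and charset undone

printf "raw body looks like gzip: %s\n",
    (substr($raw, 0, 2) eq "\x1f\x8b" ? 'yes' : 'no');
print "decoded body: $decoded\n";
```

With a live request, the same idea would be to keep the Accept-Encoding header and read `$res1->decoded_content` instead of `$res1->content`, or simply to drop the Accept-Encoding header.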