LWP produces unreadable characters

on 27.04.2006 04:32:28 by kloro

I am trying to use LWP to fetch material from the following site:

http://education.yahoo.com/reference

using, to pick an example:

http://education.yahoo.com/reference/dict_en_es/spanish/curia;_ylt=A9FJq_FmyklEIWYAkAD2s8sF

The result is unreadable characters, though lynx does fine with it (though,
of course, it doesn't return a true copy of the HTML), and the page looks
fine when I use the 'view source' function on the page displayed in my
browser window.

my code:

$host = 'education.yahoo.com';
$base = 'reference/dict_en_es/spanish_index';
$file = 'B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';

%header = (
'Keep-Alive' => '300',
'Connection' => 'keep-alive',
'User-Agent' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050925 Firefox/1.0.4 (Debian package 1.0.4-2sarge5)',
'Pragma' => 'no-cache',
'Cache-control' => 'no-cache',
'Accept' => 'image/png,*/*;q=0.5',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language' => 'en-us,en;q=0.5',
'Host' => $host,
);

$f="http://$host/$base/$file";
$res1 = $ua->get($f,%header);
my $page = $res1->content;

'$page' ends up with unreadable characters.

thanks,

tom arnall
north spit, ca

Re: LWP produces unreadable characters

on 28.04.2006 00:18:04 by tallwine

tom arnall wrote:
> i am trying to use LWP to fetch material from the following site:
>
> http://education.yahoo.com/reference
>
> using, to pick an example:
>
> http://education.yahoo.com/reference/dict_en_es/spanish/curia;_ylt=A9FJq_FmyklEIWYAkAD2s8sF
>
> the result is unreadable characters, tho' lynx does fine with it (tho', of
> course, it doesn't return a true copy of the html), and the stuff looks fine
> when i use the 'view source' function on the page displayed in my browser
> window.
>
> my code:
>
> $host = 'education.yahoo.com';
> $base = 'reference/dict_en_es/spanish_index';
> $file = 'B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';

After a few minor adjustments, it works for me.

use strict;
use LWP;
use URI;

my $host = 'http://education.yahoo.com';
my $path =
'/reference/dict_en_es/spanish_index/B;_ylt=Ag1EmmqDVPhfK7pmll28XUX2s8sF';
my $uri = URI->new_abs($path,$host);

# I like a uri object as I often run in a loop and change query strings
# host etc.

my %header = (
'Keep-Alive' => '300',
'Connection' => 'keep-alive',
'User-Agent' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050925 Firefox/1.0.4 (Debian package 1.0.4-2sarge5)',
'Pragma' => 'no-cache',
'Cache-control' => 'no-cache',
'Accept' => 'image/png,*/*;q=0.5',
'Accept-Encoding' => 'gzip,deflate',
'Accept-Charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language' => 'en-us,en;q=0.5',
'Host' => $host,
);

my $ua = LWP::UserAgent->new(%header);

my $res1 = $ua->get($uri);
my $page = $res1->content;
print $page;
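
A likely explanation for the unreadable characters, offered as an assumption
rather than something confirmed in this thread: the original request sends
'Accept-Encoding: gzip,deflate', so the server may answer with a compressed
body, and $res1->content hands back those bytes exactly as they arrived.
As far as I can tell, LWP::UserAgent->new does not turn unrecognized
constructor options into request headers, so the version above probably
never asks for compression in the first place. The sketch below builds a
gzip-encoded HTTP::Response by hand (the HTML string is made up and no
network access is involved) to show the difference between content and
decoded_content:

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Made-up page body standing in for what the server would send.
my $html = '<html><body>curia</body></html>';

# Compress it the way a server honouring Accept-Encoding: gzip might.
my $compressed;
gzip(\$html => \$compressed) or die "gzip failed: $GzipError";

# Hand-built response carrying the compressed body.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=ISO-8859-1',
        'Content-Encoding' => 'gzip',
    ],
    $compressed,
);

# content() returns the raw (still compressed) bytes.
my $raw = $res->content;
print "raw body is ", length($raw), " bytes of gzip data\n";

# decoded_content() undoes the Content-Encoding and the charset.
my $page = $res->decoded_content;
print "$page\n";
```

In the original script this would mean replacing $res1->content with
$res1->decoded_content, or leaving 'Accept-Encoding' out of %header.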


-Tim