Is LWP the right module to use?
on 19.03.2005 01:41:36 by awang
Greetings!
I'm writing a program which requires me to set some cookies, send some
requests, and then process the query log generated. I wonder if LWP is
the right module to accomplish the first two tasks (i.e. set cookies and
send requests), since I've never used this module before.
I'd appreciate your help!
Anita Wang
Personalization QA
Email: awang@amazon.com
Phone: 206-266-3366
Office: 5202.09
Re: Is LWP the right module to use?
on 19.03.2005 02:53:28 by rho
On Fri, Mar 18, 2005 at 04:41:36PM -0800, Wang, Anita wrote:
> I'm writing a program which requires me to set some cookies, send
> some requests, and then process the query log generated. I wonder if
> LWP is the right module to accomplish the first two tasks (i.e. set
> cookies and send requests), since I've never used this module
> before.
Anita,
Your question is maybe a bit too unspecific. Maybe you want to poke
around in the Cookbook to see for yourself?
man lwpcook
There is a section about cookies...
\rho
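For the two tasks in the question (set cookies, send requests), a minimal sketch with LWP might look like the following. The host name, cookie name and value, and query string are hypothetical placeholders, not taken from the thread. It puts one cookie into an in-memory HTTP::Cookies jar, attaches the jar to an LWP::UserAgent, and previews the Cookie header without touching the network; the actual send is left commented out.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request;

# Hypothetical host, for illustration only.
my $host = 'www.example.com';

# In-memory cookie jar; pass file => "..." to new() to persist it.
my $jar = HTTP::Cookies->new;

# set_cookie($version, $key, $val, $path, $domain, $port,
#            $path_spec, $secure, $maxage, $discard)
$jar->set_cookie(0, 'session-id', 'abc123', '/', $host, undef, 1, 0, 3600, 0);

# Every request made through this agent will carry matching cookies.
my $ua = LWP::UserAgent->new(cookie_jar => $jar);

# Preview the Cookie header that would be sent for this request.
my $req = HTTP::Request->new(GET => "http://$host/search?q=test");
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";

# To actually send the request:
# my $res = $ua->request($req);
# print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
```

With the jar attached to the agent, plain `$ua->get($url)` calls pick up the cookies automatically; the explicit `add_cookie_header` call here is only to make the effect visible offline.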
RE: Is LWP the right module to use?
on 19.03.2005 04:51:16 by bedouglas
anita,
can you be more specific about exactly what you want to do? LWP, along with
other Perl modules, can be set up to deal with cookies and to return information
from a given web page.
-----Original Message-----
From: Robert Barta [mailto:rho@mando]On Behalf Of Robert Barta
Sent: Friday, March 18, 2005 5:53 PM
To: Wang, Anita
Cc: libwww@perl.org
Subject: Re: Is LWP the right module to use?
Re: Is LWP the right module to use?
on 19.03.2005 04:59:22 by Andy
> Your question is maybe a bit too unspecific. Maybe you want to poke
> around in the Cookbook to see for yourself?
>
> man lwpcook
>
> There is a section about cookies...
See also WWW::Mechanize.
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
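As a rough illustration of the WWW::Mechanize pointer, the sketch below follows "Next" links from a starting page and collects the URL of each page visited. The sub name `collect_result_pages` and the `$max_pages` safety cap are hypothetical, and the link text "Next" is an assumption about the target site.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Collect the URLs of successive result pages by following "Next" links.
# $max_pages is a hypothetical safety cap against endless pagination.
sub collect_result_pages {
    my ($start_url, $max_pages) = @_;
    $max_pages ||= 50;

    # autocheck => 1 makes failed requests die; Mechanize keeps an
    # in-memory cookie jar by default, so session cookies are handled.
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($start_url);

    my @pages = ($mech->uri->as_string);
    while (@pages < $max_pages) {
        # find_link returns undef when no link text matches.
        my $next = $mech->find_link(text_regex => qr/Next/) or last;
        $mech->get($next->url_abs);
        push @pages, $mech->uri->as_string;
    }
    return @pages;
}
```

Usage would be along the lines of `my @pages = collect_result_pages('http://www.example.com/search?q=test');`, where the URL is again a placeholder.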
RE: Is LWP the right module to use?
on 19.03.2005 18:12:15 by Andrew.Johnson
Here's some code for you. It doesn't do any form input, and you might
consider making it more friendly to the webserver with some sleep lines,
depending on who you're scraping.
use strict;
#use warnings;
use LWP::UserAgent;
use HTML::TokeParser;
use HTTP::Cookies::Netscape;
use URI;   # needed for URI->new_abs below
############################################
my @keys = ('Bush',
            '"John Kerry"');
# This works if you collect persistent cookies in a Mozilla/Netscape
# Windows browsing session.
my $browser = LWP::UserAgent->new();
my $cookie_jar = HTTP::Cookies::Netscape->new(file => "C:/PATH/cookies.txt");
$browser->cookie_jar($cookie_jar);
###########################################
GetSearchPages();
GetArticleURLs('urls.txt');
GetDeepArticles('articles.txt');
###########################################
sub GetSearchPages
{
    # Write the first search-result URL for each key, plus every "Next"
    # page reachable from it, to urls.txt.
    open (OUTFILE, ">urls.txt") or die "Can't open urls.txt: $!";
    foreach my $key (@keys)
    {
        my $url = "http://search.businessweek.com/Search?searchTerm=$key&skin=BusinessWeek&x=9&y=5";
        print OUTFILE "$url\n";

        while ($url = CheckForNext($url))
        {
            print OUTFILE "$url\n";
        }
    }
    close OUTFILE;
}
sub CheckForNext
{
    # Fetch the page at $_[0] and return the absolute URL of its "Next"
    # link, or "" if there is none.
    my $response = $browser->get("$_[0]");
    my $content = $response->content;
    my $stream = HTML::TokeParser->new(\$content)
        || die "Couldn't parse HTML from $_[0]";
    my $flag = 0;

    while (my $token = $stream->get_token)
    {
        # Text tokens are ['T', $text, ...]; start tags are
        # ['S', $tag, \%attr, \@attrseq, $original_text].
        if ($token->[0] eq 'T')
        {
            if ($token->[1] =~ /page:/)
            {
                $flag = 1;
            }
        }

        if ($flag == 1)
        {
            if ($token->[0] eq 'S')
            {
                my $remember = $token->[4];
                $token = $stream->get_token;
                if ($token->[1] =~ /Next/)
                {
                    (my $crap, my $url, my $crap2) = split(/'/, $remember);
                    $url =~ s/&amp;/&/g;   # undo HTML entity encoding
                    $url = URI->new_abs($url, 'http://search.businessweek.com/')->canonical;
                    return $url;
                }
            }
        }
    }
    return "";
}
sub GetArticleURLs
{
    # Read search-page URLs from the file $_[0] and write the article
    # links found on each page to articles.txt.
    open (URLS, "$_[0]") or die "Can't open $_[0]: $!";
    open (OUTFILE, ">articles.txt") or die "Can't open articles.txt: $!";
    while (<URLS>)
    {
        chomp;
        my $flag = 0;
        my $response = $browser->get("$_");
        my $content = $response->content;
        my $stream = HTML::TokeParser->new(\$content)
            || die "Couldn't parse HTML from $_";
        while (my $token = $stream->get_token)
        {
            # Start collecting links after the results heading...
            if ($token->[0] eq 'T')
            {
                if ($token->[1] =~ /BUSINESSWEEK RESULTS/)
                {
                    $flag = 1;
                }
            }

            # ...and stop at the pagination footer.
            if ($token->[0] eq 'T')
            {
                if ($token->[1] =~ /Result page/)
                {
                    $flag = 0;
                }
            }

            if ($flag == 1)
            {
                if ($token->[0] eq 'S')
                {
                    if ($token->[4] =~ /href/)
                    {
                        (my $crap, my $url, my $crap2) = split(/'/, $token->[4]);
                        if ($url =~ /AdvancedSearch\?searchTerm/)
                        {
                            $url = "";
                        }
                        if ($url)
                        {
                            print OUTFILE "$url\n";
                        }
                    }
                }
            }
        }
    }
    close OUTFILE;
    close URLS;
}
sub GetDeepArticles
{
    # Read article URLs from the file $_[0] and follow each article's
    # "Continued on" links, writing every URL to articles_deep.txt.
    open (ARTICLES, "$_[0]") or die "Can't open $_[0]: $!";
    open (OUTFILE, ">articles_deep.txt") or die "Can't open articles_deep.txt: $!";
    while (<ARTICLES>)
    {
        chomp;
        print OUTFILE "$_\n";
        my $response = $browser->get("$_");
        my $content = $response->content;
        my $stream = HTML::TokeParser->new(\$content)
            || die "Couldn't parse HTML from $_";
        my $flag = 0;

        while (my $token = $stream->get_token)
        {
            if ($token->[0] eq 'T')
            {
                if ($token->[1] =~ /Continued on/)
                {
                    $flag = 1;
                    # Skip ahead to the next tag that carries an href.
                    do
                    {
                        $token = $stream->get_token;
                    } until (!$token || ($token->[4] || '') =~ /href/);
                    last unless $token;   # ran off the end of the page
                }
            }

            if ($flag == 1)
            {
                (my $crap, my $url, my $crap2) = split(/\"/, $token->[4]);
                $url = URI->new_abs($url, 'http://www.businessweek.com/')->canonical;

                if ($url)
                {
                    print OUTFILE "$url\n";
                    # Continue parsing on the continuation page.
                    $response = $browser->get("$url");
                    $content = $response->content;
                    $stream = HTML::TokeParser->new(\$content)
                        || die "Couldn't parse HTML from $url";
                }
                $flag = 0;
            }
        }
    }
    close OUTFILE;
    close ARTICLES;
}
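The "sleep lines" suggestion above could be sketched as a small wrapper that enforces a minimum gap between requests. The sub name `polite_get` and the one-second `$min_delay` are hypothetical choices; `$browser` is assumed to be any object with a `get` method, such as the LWP::UserAgent created in the script.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical minimum gap between requests, in seconds.
my $min_delay = 1.0;
my $last_request = 0;

# Fetch $url through $browser, sleeping first if the previous request
# was made less than $min_delay seconds ago.
sub polite_get {
    my ($browser, $url) = @_;
    my $wait = $min_delay - (time() - $last_request);
    sleep($wait) if $wait > 0;
    $last_request = time();
    return $browser->get($url);
}
```

Each `$browser->get(...)` call in the script would then become `polite_get($browser, ...)`. LWP::RobotUA is another option: it enforces a delay between requests to the same host and honors robots.txt.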
-----Original Message-----
From: Wang, Anita [mailto:awang@amazon.com]
Sent: Fri 3/18/2005 7:41 PM
To: libwww@perl.org
Subject: Is LWP the right module to use?