Authentication problem?

on 12.03.2005 22:18:36 by Andrew.Johnson

I've been wrestling with a script to scrape some information off of
BusinessWeek.com for a while now. But I've run into problems trying to
authenticate my agent to the businessweek server.

My script pulls a list of URLs by running some searches. Some of the URLs
resulting from the searches kick you back to a registration page if you are
not authenticated (I thought via HTTP authentication).

If my program worked (that is, if my agent authenticated itself), the agent
would GET:

http://www.businessweek.com/cgi-bin/register/archiveSearch.cgi?h=03_47/b3859655.htm

and be redirected to:

http://www.businessweek.com/@@KA8WaYYQAhrvjBkA/magazine/content/03_39/b3851617.htm

Here's what I use to try and authenticate. What else could I try? Maybe I
don't have the realm right?

my $response;
my $browser = LWP::UserAgent->new();
$browser->cookie_jar({});
$browser->agent('Mozilla/6.0 [en] (WinXP; U)');
print $browser->credentials('www-secure.businessweek.com:80',
                            'viewing Business Week Online',
                            'user' => 'password'
);

Re: Authentication problem?

on 14.03.2005 10:15:18 by jjl

On Sat, 12 Mar 2005, Andrew Johnson wrote:
> I've been wrestling with a script to scrape some information off of
[...]
> What else could I try?
[...]

Hi Andrew

Read some past messages on this list from me. I think I've made the same
guesses about fifty times now ;-/ and most of the debugging hints are
always taken from the same fairly small set. Feel free to come back if
you've tried those and are still stuck, of course!

IIRC this list is on gmane now, so it should be easy to search.

Does this list have a FAQ, anybody?

My own FAQs (Python, not Perl, but that's not really all *that* relevant):

http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
http://wwwsearch.sourceforge.net/ClientCookie/doc.html#debugging

Other standard responses:

1. use WWW::Mechanize

2. perldoc lwpcook


John

RE: Authentication problem?

on 14.03.2005 21:32:03 by Andrew.Johnson

When I look at the headers I'm getting back from the BusinessWeek server,
they look as follows. Is there a special way I need to deal with CGI
authentication, since HTTP authentication via the LWP credentials method
doesn't work?

Connection: close
Date: Mon, 14 Mar 2005 20:27:59 GMT
Server: Netscape-Enterprise/3.6 SP3
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Mon, 14 Mar 2005 20:27:55 GMT
Client-Peer: 164.109.22.105:80
Client-Response-Num: 1
Link: ; rel="stylesheet"; type="text/css"
Set-Cookie: Transact=Ha786ec07e744c6efd8cdd1df37580a9c:session_id=8f51eaf294c711d98900e4d5b08dcaef&kid=310001.100170&ss=env; Path=/
Title: BusinessWeek Online
XXX-Authenticate: CGIPassword
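
A note on reading these headers: as far as I know, LWP only uses the values
set with credentials() in reply to a 401 response that carries a
WWW-Authenticate challenge for a matching host:port and realm. No such header
appears in the dump above (XXX-Authenticate is a different header), which
suggests the site is not using HTTP authentication for these pages. A minimal
sketch of that check follows; describe_auth is a hypothetical helper name, not
part of LWP, and the status code of the response above is not shown, so the
non-401 case is an assumption.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: report whether a response carries a standard
# HTTP authentication challenge (401 plus a WWW-Authenticate header).
sub describe_auth {
    my ($response) = @_;
    my $challenge = $response->header('WWW-Authenticate');
    if ($response->code == 401 && defined $challenge) {
        return "HTTP authentication challenge: $challenge";
    }
    return "no HTTP authentication challenge (status " . $response->code . ")";
}

# Usage with a live agent (not run here):
# my $response = $browser->get($url);
# print describe_auth($response), "\n";
```

If this reports no challenge, credentials() will never come into play, and the
login state most likely lives in cookies or a form submission instead.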


( Andrew Johnson )
) Marketing Writer (
( Elias/Savion Advertising )
( Phone: 412.642.7700 Fax 412.642.2277 )
) www.elias-savion.com (
( andrew.johnson@elias-savion.com )


RE: Authentication problem?

on 14.03.2005 22:18:09 by Andrew.Johnson

I solved my problem by loading the PROPER cookie file. I was failing before
because I was loading a cookie file from a session in which I had not
checked an auto-login box. By creating a new cookie file from a browsing
session where I did check the box, I am able to access protected documents.
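
For anyone who lands on this thread later, the fix described above can be
sketched roughly as follows. This is an illustration, not the exact code used:
it assumes the browser wrote its cookies in the Netscape cookies.txt format,
that HTTP::Cookies::Netscape is the module doing the loading, and
agent_with_browser_cookies and the 'cookies.txt' path are hypothetical names.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies::Netscape;

# Build an agent whose cookie jar is preloaded from a browser cookie file
# (Netscape cookies.txt format). The file is read when the jar is created;
# autosave is left off, so the browser's file is not written back.
sub agent_with_browser_cookies {
    my ($cookie_file) = @_;
    die "cookie file not found: $cookie_file\n" unless -e $cookie_file;

    my $jar     = HTTP::Cookies::Netscape->new(file => $cookie_file);
    my $browser = LWP::UserAgent->new(cookie_jar => $jar);
    return $browser;
}

# Usage (hypothetical path; not run here):
# my $browser  = agent_with_browser_cookies('cookies.txt');
# my $response = $browser->get($url);
# print $response->status_line, "\n";
```

Note that cookies whose expiry time has already passed are dropped on load, so
the file needs to come from a recent session with the auto-login box checked.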

