How to handle the Redirected using Scrappy Module

on 24.05.2011 15:25:53 by muthukumar swamy

I am trying to crawl a web page that redirects to another page.
I am using the Scrappy module for the crawling process.
I am using version 0.94111370 (updated version).
Can anyone suggest how to handle the redirect?

thank you,
Muthukumaraswamy.C (Ambuli)


--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: How to handle the Redirected using Scrappy Module

on 31.05.2011 11:05:41 by Chris Nehren

On Tue, May 24, 2011 at 06:25:53 -0700, Ambuli wrote:
> I am trying to crawl a web page that redirects to another page.
> I am using the Scrappy module for the crawling process.
> I am using version 0.94111370 (updated version).
> Can anyone suggest how to handle the redirect?

What do you mean by "handle the Redirect"? I'm afraid your question
isn't clear.

--
Chris Nehren | Coder, Sysadmin, Masochist
Shadowcat Systems Ltd. | http://shadowcat.co.uk/

Re: How to handle the Redirected using Scrappy Module

on 31.05.2011 11:14:36 by Chris Nehren

On Tue, May 24, 2011 at 06:25:53 -0700, Ambuli wrote:
> I am trying to crawl a web page that redirects to another page.
> I am using the Scrappy module for the crawling process.
> I am using version 0.94111370 (updated version).
> Can anyone suggest how to handle the redirect?

What do you mean by 'handle the Redirect'? Your message isn't clear.

--
Chris Nehren | Coder, Sysadmin, Masochist
Shadowcat Systems Ltd. | http://shadowcat.co.uk/

Re: How to handle the Redirected using Scrappy Module

on 31.05.2011 16:02:17 by muthukumar swamy

Hi Chris Nehren,
Here is my code, to make clear what I mean.

my $scraper = Scrappy->new;
my $new_url = "Some Url";
$scraper->get($new_url);
if ($scraper->page_status == 302)
{
    # Here I want to get the redirect location
}

Please give me some suggestions.


Re: How to handle the Redirected using Scrappy Module

on 01.06.2011 03:41:09 by John SJ Anderson

On Tue, May 31, 2011 at 05:05, Chris Nehren wrote:
> On Tue, May 24, 2011 at 06:25:53 -0700 , Ambuli wrote:
>> I am trying to crawl a web page that redirects to another page.
>> I am using the Scrappy module for the crawling process.
>> I am using version 0.94111370 (updated version).
>> Can anyone suggest how to handle the redirect?
>
> What do you mean by "handle the Redirect"? I'm afraid your question
> isn't clear.
>

I'm assuming that the OP wants to know whether the web request was
redirected via a 301 or a 302...

It looks like Scrappy handles such redirects transparently, but
provides the 'request_denied' method as a flag that can be checked.
Here's some sample code that uses a page on one of my domains that
gives a 301:


--cut--
#! /opt/perl/bin/perl

use strict;
use warnings;
use 5.010;

use Scrappy;

my $s = Scrappy->new;
$s->get( 'http://genehack.org/about' );

say "Status: ",$s->page_status;
say "Denied: ",$s->request_denied;

my @redirects = $s->response->redirects;
say "Original URL: ", $redirects[0]->request->url;
say "Fetched URL: ",$s->response->request->url;
--cut--

Running this produces:

$ ./try.pl
Status: 200
Denied: 1
Original URL: http://genehack.org/about
Fetched URL: http://genehack.net/about/

As you can see, the status code is reported as 200, even though the
request was redirected along the way.

The 'response' method on the Scrappy object returns an HTTP::Response
object. You should read the documentation for that module to
understand what the last several lines in my script are doing. You'll
need to understand that in order to reliably detect redirects
yourself.
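
To make the HTTP::Response part concrete without depending on Scrappy
or the network, here is a minimal sketch that builds a 301 -> 200 chain
by hand and walks it with the 'redirects' method. The helper name
'redirect_chain' and the example.com / example.net URLs are made up
for illustration; only the HTTP::Request and HTTP::Response calls come
from those modules. With Scrappy you would presumably pass
$s->response to the helper, as in the script above.

```perl
use strict;
use warnings;

use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: return every URL visited on the way to
# $response, oldest first, ending with the URL actually fetched.
sub redirect_chain {
    my ($response) = @_;

    # redirects() follows the ->previous links and returns the
    # intermediate (redirect) responses, oldest first.
    my @urls = map { $_->request->uri->as_string } $response->redirects;
    push @urls, $response->request->uri->as_string;
    return @urls;
}

# Fake the first hop: a 301 carrying a Location header.
my $first = HTTP::Response->new( 301, 'Moved Permanently',
    [ Location => 'http://example.net/about/' ] );
$first->request( HTTP::Request->new( GET => 'http://example.com/about' ) );

# Fake the final response and link it back to the redirect.
my $final = HTTP::Response->new( 200, 'OK' );
$final->request( HTTP::Request->new( GET => 'http://example.net/about/' ) );
$final->previous($first);

my @chain = redirect_chain($final);
print "Redirected: ", ( @chain > 1 ? "yes" : "no" ), "\n";
print "Location header of first hop: ", $first->header('Location'), "\n";
print "Visited: $_\n" for @chain;
```

The Location header is read from the redirect response itself (the
first hop), not from the final 200 response, which is why checking the
status of the final page for 302 never matches when redirects are
followed automatically.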

chrs,
john.
