how check new URL of redirected page

on 05.12.2007 15:39:46 by zawszedamian_p

I read a web page using HTTP::Response, but the page was redirected. How
can I read the new URL?
I tried HTTP::Response->base(), but it returns the original URL.

Thanks

Re: how check new URL of redirected page

on 05.12.2007 16:13:03 by Petr Vileta

zawszedamian_p@gazeta.pl wrote:
> I read a web page using HTTP::Response, but the page was redirected. How
> can I read the new URL?
> I tried HTTP::Response->base(), but it returns the original URL.
>
Which URL are you reading? I can take a look at how this page is redirected and
where to find the new URL.
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)

Please reply to

Re: how check new URL of redirected page

on 05.12.2007 16:26:22 by Ben Morrow

Quoth zawszedamian_p@gazeta.pl:
>
> I read a web page using HTTP::Response, but the page was redirected. How
> can I read the new URL?
> I tried HTTP::Response->base(), but it returns the original URL.

If you mean a proper HTTP redirect rather than an HTML meta-refresh or
something more evil in JavaScript,

$response->header('Location');

However, LWP::UserAgent will follow redirects by default, so unless
you've turned it off this won't help :(. If the page is HTML with a
meta-refresh, you will need to parse it with e.g. HTML::Parser and
extract the <meta> elements, and find the one with the refresh in. If
it's using JS, you're out of luck, unless the pages you are working with
have similar pieces of JS every time and you can see how to extract the
URL.
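
When LWP::UserAgent has already followed the redirect, the final URL can usually be read from the request attached to the last response, and the intermediate hops from `redirects`. The following is a minimal sketch, not the posters' code: the helper names `final_url` and `url_chain` and the example.com URLs are made up for illustration, and the responses are built by hand so the snippet runs without network access. With a real user agent, `$response` would come from something like `$ua->get($url)`.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Final URL: the URI of the request that produced this (last) response.
sub final_url {
    my ($response) = @_;
    return $response->request->uri->as_string;
}

# Every URL visited, in order, ending with the final one.
sub url_chain {
    my ($response) = @_;
    return map { $_->request->uri->as_string } $response->redirects, $response;
}

# Hand-built stand-ins (hypothetical URLs) for a fetch that followed one redirect.
my $hop = HTTP::Response->new(302, 'Found', [ Location => 'http://example.com/new' ]);
$hop->request(HTTP::Request->new(GET => 'http://example.com/old'));

my $response = HTTP::Response->new(200, 'OK');
$response->request(HTTP::Request->new(GET => 'http://example.com/new'));
$response->previous($hop);    # LWP links followed hops this way

print final_url($response), "\n";
print join(' -> ', url_chain($response)), "\n";
```

This only covers HTTP-level redirects; a meta-refresh or JavaScript redirect is not followed by LWP and so does not show up in the chain.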

Ben

Re: how check new URL of redirected page

on 05.12.2007 20:22:36 by Charles DeRykus

On Dec 5, 7:26 am, Ben Morrow wrote:
> Quoth zawszedamia...@gazeta.pl:
>
>
>
> > I read a web page using HTTP::Response, but the page was redirected. How
> > can I read the new URL?
> > I tried HTTP::Response->base(), but it returns the original URL.
>
> If you mean a proper HTTP redirect rather than an HTML meta-refresh or
> something more evil in JavaScript,
>
> $response->header('Location');
>
> However, LWP::UserAgent will follow redirects by default, so unless
> you've turned it off this won't help :(.
> ...

A possibly more convenient alternative to turning off redirects
entirely is LWP's simple_request which won't follow redirects:

my $resp = $ua->simple_request($request);
if ( $resp->code == 302 ) {
    $uri = URI->new($resp->header('Location'));
    ...
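
The snippet above checks only for status 302 and takes the Location header as-is. A slightly more general sketch, under the assumption that any 3xx response with a Location header counts, and that a relative Location should be resolved against the requested URL, might look like this. The helper name `redirect_target` and the example.com URL are hypothetical, and the response is built by hand so the code runs offline; in practice it would come from `$ua->simple_request($request)`.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# Returns the absolute redirect target as a string, or undef if the
# response is not a redirect or carries no Location header.
sub redirect_target {
    my ($resp) = @_;
    return undef unless $resp->is_redirect;    # true for any 3xx status
    my $location = $resp->header('Location');
    return undef unless defined $location;
    # Location may be relative; resolve it against the requested URL.
    return URI->new_abs($location, $resp->request->uri)->as_string;
}

# Hand-built stand-in for a response returned by simple_request.
my $resp = HTTP::Response->new(301, 'Moved Permanently', [ Location => '/new/page' ]);
$resp->request(HTTP::Request->new(GET => 'http://example.com/old'));

print redirect_target($resp), "\n";
```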

--
Charles DeRykus

Re: how check new URL of redirected page

on 06.12.2007 02:53:01 by Petr Vileta

Ben Morrow wrote:
> Quoth zawszedamian_p@gazeta.pl:
>>
>> I read a web page using HTTP::Response, but the page was redirected. How
>> can I read the new URL?
>> I tried HTTP::Response->base(), but it returns the original URL.
>
> If you mean a proper HTTP redirect rather than an HTML meta-refresh or
> something more evil in JavaScript,
>
> $response->header('Location');
>
> However, LWP::UserAgent will follow redirects by default, so unless
> you've turned it off this won't help :(. If the page is HTML with a
> meta-refresh, you will need to parse it with e.g. HTML::Parser and
> extract the <meta> elements, and find the one with the refresh in. If
> it's using JS, you're out of luck, unless the pages you are working
> with have similar pieces of JS every time and you can see how to
> extract the URL.
>
> Ben
Sorry Ben, please do not kill me, but HTML::Parser is "too big a gun for a small
rabbit" :-)
For redirections based on a meta element, something like this works:

# I presume that the HTML page is in the variable $content
$content=~s/^.?().+$/$1/si;
$content=~s/^.+?url=(.+?)[\'\">]

Now $content contains the new URL.

--
Petr Vileta, Czech republic

Re: how check new URL of redirected page

on 06.12.2007 15:33:02 by Ted Zlatanov

On Thu, 6 Dec 2007 02:53:01 +0100 "Petr Vileta" wrote:

PV> HTML::Parser is "too big gun to small rabbit" :-) For meta element
PV> base redirections is successful some like this

PV> # I precede that html page is in variable $content
PV> $content=~s/^.?().+$/$1/si;
PV> $content=~s/^.+?url=(.+?)[\'\">]

PV> Now $content contain new URL.

This is like using garrote wire to catch and strangle the rabbit :)

Ted

Re: how check new URL of redirected page

on 06.12.2007 16:17:17 by Ben Morrow

Quoth "Petr Vileta" :
> Sorry Ben, please do not kill me, but HTML::Parser is "too big gun to small
> rabbit" :-)
> For meta element base redirections is successful some like this
>
> # I precede that html page is in variable $content

my $content = <<HTML;
<html>
<head>
< META content=10;url=foo http-equiv=refresh>
</head>
<body>
Hello world!
</body>
</html>
HTML

> $content=~s/^.?().+$/$1/si;
> $content=~s/^.+?url=(.+?)[\'\">]

This line is not valid Perl.

> Now $content contain new URL.

No, it doesn't.

LWP::UserAgent will parse the <head> section of a text/html document for
you, and return the http-equiv headers in with the real HTTP headers.
For this purpose it uses HTML::HeadParser, which, guess what, is a
subclass of HTML::Parser. This means that a refresh can be detected with

$response->header('refresh');
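
The value of that header still has to be split into the delay and the target URL. A minimal sketch of one way to do that, assuming the usual "seconds;url=target" form with optional quotes around the target (the helper name `parse_refresh` is made up for illustration):

```perl
use strict;
use warnings;

# Split a Refresh value such as "10;url=foo" into (delay, url).
# Returns an empty list if the value does not look like a refresh value;
# url is undef when only a delay is given.
sub parse_refresh {
    my ($value) = @_;
    return unless defined $value;
    if ($value =~ /^\s*(\d+)\s*(?:[;,]\s*url\s*=\s*(["']?)(.*?)\2\s*)?$/i) {
        return ($1, $3);
    }
    return;
}

# With LWP this string would come from $response->header('refresh').
my ($delay, $url) = parse_refresh('10;url=foo');
print "delay=$delay url=$url\n";
```

The extracted URL may be relative, in which case it would still need to be resolved against the page's own URL, for example with URI->new_abs.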

Ben

Re: how check new URL of redirected page

on 07.12.2007 04:38:10 by Petr Vileta

Ted Zlatanov wrote:
> On Thu, 6 Dec 2007 02:53:01 +0100 "Petr Vileta"
> wrote:
>
>> HTML::Parser is "too big gun to small rabbit" :-) For meta element
>> base redirections is successful some like this
>
>> # I precede that html page is in variable $content
>> $content=~s/^.?().+$/$1/si;
>> $content=~s/^.+?url=(.+?)[\'\">]
>
>> Now $content contain new URL.
>
> This is like using garrote wire to catch and strangle the rabbit :)
>
> Ted
But wire is lighter than a gun :-) Why load HTML::Parser when I do not need it
for any other purpose? Say I want to download some files (e.g. ZIP) from the web,
but the list of files is in a text page. Hmm, but this text page is reached through
a random redirect from some HTML homepage. What now? Load the parser? Why? I must
use LWP anyway, but for only one redirection I can use a regexp.
--
Petr Vileta, Czech republic

Re: how check new URL of redirected page

on 07.12.2007 04:41:45 by Petr Vileta

Ben Morrow wrote:
> Quoth "Petr Vileta" :
>> Sorry Ben, please do not kill me, but HTML::Parser is "too big gun
>> to small rabbit" :-)
>> For meta element base redirections is successful some like this
>>
>> # I precede that html page is in variable $content
>
> my $content = <<HTML;
> <html>
> <head>
> < META content=10;url=foo http-equiv=refresh>
> </head>
> <body>
> Hello world!
> </body>
> </html>
> HTML
>
>> $content=~s/^.?().+$/$1/si;
>> $content=~s/^.+?url=(.+?)[\'\">]
>
> This line is not valid Perl.
>
Sorry, I didn't have my glasses on :-) It should be

$content=~s/^.+?url=(.+?)[\'\">].+$/$1/si;
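
To see how this substitution behaves on a page with unquoted attributes, here is a small self-contained check. The sample page is an assumed reconstruction of the test document posted earlier in the thread (the html, head and body tags are a guess), and the second pattern is only an illustrative, still fragile alternative, not a replacement for a real parser.

```perl
use strict;
use warnings;

# Assumed sample: unquoted attributes, with url= before http-equiv.
my $html = <<'HTML';
<html>
<head>
< META content=10;url=foo http-equiv=refresh>
</head>
<body>
Hello world!
</body>
</html>
HTML

# The corrected substitution from the post, applied to a copy.
(my $from_post = $html) =~ s/^.+?url=(.+?)[\'\">].+$/$1/si;

# A somewhat tighter match: stay inside one meta tag and stop at
# whitespace, a quote, or the closing ">".
my ($tighter) = $html =~ /<\s*meta\b[^>]*?\burl\s*=\s*["']?([^\s"'>]+)/i;

print "from post: [$from_post]\n";
print "tighter:   [$tighter]\n";
```

On this sample the substitution leaves "foo http-equiv=refresh" in the variable, because nothing stops the capture at the space, while the tighter pattern captures "foo". Neither checks that the tag really is an http-equiv=refresh element.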

--
Petr Vileta, Czech republic