Waiting for resultpage...

on 09.06.2005 15:52:15 by ChaosMK2

Hello,

I am using WWW::Mechanize to contact a page that processes a request
for me. The final result page takes 2-3 minutes to be calculated on
that server. In the meantime, an unimportant intermediate page is shown,
indicating that the server is working on my request.
My problem is how to skip that intermediate page and get the final
result page.

That's the code:

use WWW::Mechanize;
use File::Spec;

# $url3, $dir and $multizFile are set earlier in the script.
my $mech = WWW::Mechanize->new();
$mech->get($url3);
my $upload = File::Spec->catfile($dir, "Temp");
if($mech->success())
{
    # Fill in and submit the first form on the page.
    $mech->form_number(1);
    $mech->set_fields(email => "", -uploaded_file => $upload,
        -case => "upper", -seqnos => "off", -outorder => "input",);
    $mech->tick(-in => "Mlalign_id_pair");
    $mech->tick(-in => "Mfast_pair");
    $mech->tick(-in => "Mclustalw_aln");
    $mech->tick(-output => "fasta_aln");
    $mech->tick(-output => "score_html");
    $mech->submit();
    if($mech->success())
    {
        # Save whatever page came back from the submit.
        open(my $fh, ">", "TCoffee_$multizFile.html")
            or die "Cannot write TCoffee_$multizFile.html: $!";
        print $fh $mech->content();
        close $fh;
    }
}

$mech->success() returns true, but $mech->content() then prints only the
intermediate page that I want to ignore.

Thank you for your help.

Sebastian

Re: Waiting for resultpage...

on 12.06.2005 04:05:07 by m-s-w-www.evite.com

ChaosMK2 wrote:
> Inbetween another not important page is shown
> that indicates that the server is working on
> my request. My problem is how to ignore that
> intermediate page but get that last important
> resultpage.

You'll need to look at how the intermediate page works. Chances are that
it uses either JavaScript or a meta tag to redirect/refresh every so
often, and once the result is ready that URL gives real results instead
of another intermediate page.

You'll want to parse out that redirect/refresh target (or, assuming the
URL is static, just get it off the mech object) and keep trying that URL
until the content no longer resembles the intermediate page (presumably
with a polite sleep() between requests). Once the content no longer looks
like the intermediate page, it should hopefully be your final results.
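
A minimal sketch of the "parse out the refresh target" step, assuming the
intermediate page uses a meta refresh tag (the thread does not confirm
this). The helper name refresh_target and the sample URL in the usage note
are hypothetical; a JavaScript redirect would need different handling.

```perl
use strict;
use warnings;

# Hypothetical helper: look for <meta http-equiv="refresh" content="N; URL=...">
# and return (delay_in_seconds, url). The url is undef when the tag only
# gives a delay. Returns an empty list when no refresh tag is found.
sub refresh_target {
    my ($html) = @_;
    for my $tag ($html =~ /(<meta\b[^>]*>)/gi) {
        next unless $tag =~ /http-equiv\s*=\s*["']?refresh/i;
        next unless $tag =~ /content\s*=\s*(?:"([^"]*)"|'([^']*)')/i;
        my $content = defined $1 ? $1 : $2;
        my ($delay, $url) =
            $content =~ /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.+?))?\s*$/i;
        next unless defined $delay;
        $url =~ s/^['"]|['"]$//g if defined $url;   # strip stray quotes
        return ($delay, $url);
    }
    return;
}
```

Called as refresh_target($mech->content()), this would return something
like (5, "/some/status/page") for a page that refreshes after five seconds.
A relative URL still has to be resolved against $mech->uri() before it is
fetched.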

-matt

Re: Waiting for resultpage...

on 12.06.2005 14:28:24 by ChaosMK2

matthew wickline wrote:

>ChaosMK2 wrote:
>
>
>>Inbetween another not important page is shown
>>that indicates that the server is working on
>>my request. My problem is how to ignore that
>>intermediate page but get that last important
>>resultpage.
>
>You'll need to look at how the intermediate page works. Chances are that
>it either uses javascript or a meta tag to do the redirect/refresh every
>so often and once the result is ready that URL gives real results instead
>of another intermediate page.
>
>You'll want to parse out that redirect/refresh target (or, assuming the
>URL is static, just get it off the mech object) and keep trying that URL
>until the content no longer resembles the intermediate page (presumably
>with a polite sleep() between requests). Once the content no longer looks
>like the intermediate page, it should hopefully be your final results.
>
>-matt
Thank you very much for your answer, Matt, but the problem persists. Here
are the three intermediate pages:

Page1:

PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
lang="en-US">Tcoffee monitoring
Processing, please wait...

Page2:

PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
lang="en-US">Tcoffee monitoring
Processing, please wait....

src=/Tcoffee/Images/l5.gif>time:13 seconds

Page3:

PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
lang="en-US">Tcoffee monitoring
Your job is finished

The code that I have added following your instructions:

while($mech->success() and $mech->title() eq "Tcoffee monitoring")
{
    print $mech->title(), "\n";
    print $mech->uri(), "\n";
    sleep(10);
}
if($mech->success())
{
    open(FILE, ">", "TCoffee_$multizFile.html");
    print FILE $mech->content();
    close FILE;
}

The problem remains that $mech stores only the first response it
fetched. I don't know how to ignore it... As you can see, there are no
links or forms on the intermediate pages. Nevertheless, thank you very
much for your efforts. Maybe it is designed that way to prevent scripts
from contacting the server... I have also tried a module from CPAN that
implements the same algorithm as the page offers, but that failed too.
Seemingly there are still bugs in that module, and it is only UNIX
compatible...

Sebastian
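
The while loop in the last post only sleeps; it never sends a new request,
so $mech keeps holding the first response. Page3 ("Your job is finished")
also still carries the title "Tcoffee monitoring", so a title comparison
cannot tell it apart from the "Processing, please wait" pages. Below is a
minimal sketch of a polling loop that re-fetches the page and tests the
body text instead. The helper name wait_for_result and its options are
hypothetical, and it assumes that requesting the same URL again returns
the updated status page, which the thread does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper. $mech is a WWW::Mechanize object that already holds
# the intermediate page. Returns 1 once the page no longer looks "busy",
# 0 on an HTTP failure or when max_tries is exhausted.
sub wait_for_result {
    my ($mech, %opt) = @_;
    my $busy     = $opt{busy}      // qr/Processing, please wait/;
    my $interval = $opt{interval}  // 10;          # seconds between requests
    my $max      = $opt{max_tries} // 30;
    my $url      = $opt{url}       // $mech->uri;  # assumed status URL

    for my $try (1 .. $max) {
        return 0 unless $mech->success;            # request failed: give up
        return 1 unless $mech->content =~ $busy;   # no longer the busy page
        sleep $interval;                           # be polite to the server
        $mech->get($url);                          # actually fetch again
    }
    return 0;                                      # still busy after $max tries
}
```

After $mech->submit(), the call would be wait_for_result($mech) followed
by saving $mech->content(). The sketch uses get() rather than reload()
because reload() repeats the current request, which here may be the form
submission itself. If the status page lives at a different address, for
example one taken from a meta refresh tag, pass it in as url => ....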