Help with Mechanize

on 15.01.2007 21:14:43 by bill

Hello,

I could use some help with Mechanize, and Andy Lester recommended that I post
to the libwww mailing list. I am trying to do what should be a simple scrape
of the US Patent and Trademark Office website for the bibliographic info that
they post for all patents. Unfortunately, I keep getting rerouted to a page
that says

"We are unable to display the requested information. Please note that all
requests must be made using this form."

Do you think I am out of luck, or are there some things I can try? The form
that is used to request the patent info does have the following JavaScript
line:

Basically, I am wondering how the website could know that I am using
Mechanize and not Internet Explorer to enter the info into the fields and
click "submit."

Here is my perl code. Thanks.

#!/usr/local/bin/perl -w

use strict;

use WWW::Mechanize;
use Crypt::SSLeay;    # SSL support for https URLs

# CGI header, since the script prints its status to the browser.
print "Content-type: text/html\n\n";

my $url               = "https://ramps.uspto.gov/eram/";
my $maintenancepatent = "5771669";
my $maintenanceapp    = "08672157";
my $outfile           = "out.htm";

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Empty proxy URL: no proxy for https requests.
$mech->proxy( ['https'], '' );

$mech->get($url);
$mech->follow_link( text => "Pay or Look up Patent Maintenance Fees", n => 1 );

# Fill in the lookup form.
$mech->form_name('mfInputForm');
$mech->field( patentNum      => $maintenancepatent );
$mech->field( applicationNum => $maintenanceapp );

$mech->add_header( Referer => $url );
$mech->click_button( number => 2 );

# Save the returned page for inspection.
open( my $out, '>', $outfile ) or die "Cannot open $outfile: $!";
print {$out} $mech->content();
close($out) or die "Cannot close $outfile: $!";

print "done";