Help with Mechanize
on 15.01.2007 21:14:43 by bill
Hello,
I could use some help with WWW::Mechanize, and Andy Lester recommended that I
post to the libwww mailing list. I am trying to do what should be a simple
scrape of the US Patent and Trademark Office website for the bibliographic
info that they post for all patents. Unfortunately, I keep getting redirected
to a page that says
"We are unable to display the requested information. Please note that all
requests must be made using this form."
Do you think I am out of luck, or are there some things I can try? The form
that is used to request the patent info does have the following JavaScript
line:
Basically, I am wondering how the website could know that I am using
Mechanize and not Internet Explorer to enter the info into the fields and
click "submit".
Here is my Perl code. Thanks.
#!/usr/local/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Crypt::SSLeay;    # HTTPS support for LWP

# CGI header, since the script is run through a web server
print "Content-type: text/html\n\n";

my $url               = "https://ramps.uspto.gov/eram/";
my $maintenancepatent = "5771669";
my $maintenanceapp    = "08672157";
my $outfile           = "out.htm";

# autocheck makes Mechanize die on any failed request
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->proxy( ['https'], '' );
$mech->get($url);
$mech->follow_link( text => "Pay or Look up Patent Maintenance Fees", n => 1 );

# Fill in the lookup form and submit it
$mech->form_name('mfInputForm');
$mech->field( patentNum      => $maintenancepatent );
$mech->field( applicationNum => $maintenanceapp );
$mech->add_header( Referer => $url );
$mech->click_button( number => 2 );

# Save the returned page to a file
open( my $out, '>', $outfile ) or die "Cannot open $outfile: $!";
print {$out} $mech->content();
close($out) or die "Cannot close $outfile: $!";
print "done";
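
One difference I can think of between Mechanize and Internet Explorer is the User-Agent header sent with every request. Whether the USPTO site actually checks it is only a guess on my part; the rejection could just as well depend on cookies, hidden form fields or the JavaScript on the form. As a minimal sketch (no network access needed, and the variable names are just for illustration), this shows what Mechanize reports by default and after switching to one of the browser aliases that come with WWW::Mechanize:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Nothing is fetched here: we only inspect the User-Agent string
# that Mechanize would send with each request.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Default identification, of the form "WWW-Mechanize/<version>"
my $default_agent = $mech->agent;
print "default agent: $default_agent\n";

# Switch to a browser-like identification using a built-in alias
$mech->agent_alias('Windows IE 6');
my $browser_agent = $mech->agent;
print "aliased agent: $browser_agent\n";
```

If the header turns out to matter, the agent_alias call would go right after the WWW::Mechanize->new line in my script above, before the first get.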