Debugging Tip for the mech FAQ

on 15.07.2005 00:43:13 by peter.stevens

Hi Andy -

Here is a candidate for the mech FAQ:

I just discovered Tamper, a neat tool for Firefox which helps with
debugging mech problems. I think it will also be useful for people trying
to figure out what a particular piece of JavaScript is doing. (This is
probably old hat for most people out there, but it was news - and
helpful - to me.)

Q: When I request a page, it works with my browser, but when I use mech,
it doesn't. Why not?

A: Everything the web server knows about the client is passed back and
forth using the HTTP protocol. This information includes the URI, GET
and POST parameters, cookies, and other header information. If you can
make mech send exactly the same information, then you should get exactly
the same results. (I say "should" because there may be some timing
issues as well - your mech program will normally run much faster than a
human can click a browser and that may have an impact in some
circumstances).
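
As a rough sketch of "sending the same information" from mech: the
header values below are placeholders rather than values from a real
browser capture, and the URL is only fetched if you pass one on the
command line. The calls used (agent_alias, add_header, response) are
WWW::Mechanize methods; which headers actually matter for a given site
is something you have to find out from the browser's log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: a failed request does not die, so we can inspect it
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Announce ourselves with a browser-like User-Agent string
# ('Windows Mozilla' is one of the aliases shipped with WWW::Mechanize)
$mech->agent_alias('Windows Mozilla');

# Copy the other request headers seen in the browser's log.
# Placeholder values (assumptions), replace them with what you captured.
$mech->add_header( 'Accept'          => 'text/html,*/*;q=0.5' );
$mech->add_header( 'Accept-Language' => 'en-us,en;q=0.5' );

print "User-Agent: ", $mech->agent, "\n";

# Cookies are kept in mech's own cookie jar and sent back automatically.
# Fetch only when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    $mech->get($url);
    print $mech->response->status_line, "\n";
}
```

If timing turns out to matter, a plain sleep between successive
requests is the simplest thing to try.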

How do you find out what the browser sent? Well, you could sniff the
network, but it is much easier to use Firefox. Get the extension "Tamper".
When activated, it keeps a log of all the HTTP requests and responses,
and allows you to examine the headers, cookies, parameters, etc. This is
particularly useful when trying to figure out what some JavaScript does,
because you can clearly see the results of the script in terms of the
parameters transmitted.

Cheers,

Peter

Re: Debugging Tip for the mech FAQ

on 01.08.2005 19:52:30 by philippe.bruhat

On Friday 15 July 2005 at 00:43, Peter Stevens wrote:
>
> Q: When I request a page, it works with my browser, but when I use mech,
> it doesn't. Why not?
>

A more involved setup uses HTTP::Proxy: point both WWW::Mechanize and
your browser at a proxy that logs the interesting bits of the
transaction, then compare the two logs.

An example of such a proxy is included in the HTTP::Proxy distribution
as eg/logger.pl. The version currently on CPAN is outdated, and the
attached one should replace it in the next version (it works with the
version of HTTP::Proxy currently on CPAN).
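
Pointing mech at such a proxy could look like the sketch below. It
assumes the logging proxy is already running on this machine on
localhost:8080, which is HTTP::Proxy's default when no host or port is
given; adjust the URL if you start it differently. The browser has to
be configured to use the same proxy in its own settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: report a failed request instead of dying
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Assumption: the logging proxy listens on localhost:8080
my $proxy_url = 'http://localhost:8080/';

# proxy() is inherited from LWP::UserAgent; only plain http goes through it
$mech->proxy( ['http'], $proxy_url );

print "http requests go through ", $mech->proxy('http'), "\n";

# Fetch only when a URL is given on the command line; the request and
# response headers then show up in the proxy's output
if ( my $url = shift @ARGV ) {
    $mech->get($url);
    print $mech->response->status_line, "\n";
}
```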

Here is an example of what it logs when you point your browser at
www.google.com from an IP address in France:

$ ./logger.pl peek 'google\.\w+$'

GET http://www.google.com/
302 Found
Content-Type: text/html
Set-Cookie: PREF=ID=50559ac18bae0f57:CR=1:TM=1119132978:LM=1119132978:S=Px8CAVLCC5FoR1NK; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Location: http://www.google.fr/cxfer?c=PREF%3D:TM%3D1119132978:S%3Dwpjw70CuTrboKsrd&prev=/

GET http://www.google.fr/cxfer?c=PREF%3D:TM%3D1119132978:S%3Dwpjw70CuTrboKsrd&prev=/
302 Found
Content-Type: text/html
Set-Cookie: PREF=ID=e2b4582bd0c2849e:LD=fr:TM=1119132978:LM=1119132978:S=keTI_KO9ZyhHypD3; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.fr
Location: http://www.google.fr/

GET http://www.google.fr/
Cookie: PREF=ID=e2b4582bd0c2849e:LD=fr:TM=1119132978:LM=1119132978:S=keTI_KO9ZyhHypD3
200 OK
Content-Type: text/html

This shows how Google tries to give you the exact same cookie on all of
its local sites.

You can ask to see more headers than the predefined ones by using the
command-line parameter "header". The proxy only reports requests for
data with a text/* Content-Type.
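
To make the "header" parameter more concrete, here is a small
standalone sketch of how name/value pairs such as "peek <regex>" and
"header <name>" can be separated from the remaining arguments. The
function name split_args and the sample argument list are made up for
illustration, and the loop is a simplified stand-in, not the exact one
used in the attached script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect "peek <regex>" and "header <name>" pairs
# and return them together with whatever arguments are left over.
sub split_args {
    my @argv = @_;
    my %args = ( peek => [], header => [] );
    my @rest;
    while (@argv) {
        my $name = shift @argv;
        if ( exists $args{$name} && @argv ) {
            # a known parameter name: the next argument is its value
            push @{ $args{$name} }, shift @argv;
        }
        else {
            # anything else is kept for HTTP::Proxy->new
            push @rest, $name;
        }
    }
    return ( \%args, \@rest );
}

# Sample command line (an assumption, not taken from the original post)
my ( $args, $rest ) = split_args(
    'peek',   'google\.\w+$',
    'header', 'User-Agent',
    'header', 'Accept-Language',
    'port',   '8080',
);

print "extra headers: @{ $args->{header} }\n";
print "host patterns: @{ $args->{peek} }\n";
print "left over:     @$rest\n";
```

With the attached script, the equivalent invocation would repeat
"header" once per extra header name on the command line.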

-- 
Philippe "BooK" Bruhat

A wish is only as good as the wisher and what he can achieve.
(Moral from Groo The Wanderer #35 (Epic))

Attachment: logger.pl (a logging proxy)

#!/usr/bin/perl -w
use strict;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Proxy::BodyFilter::simple;
use CGI::Util qw( unescape );

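# get the command-line parameters: "peek <host regex>" and "header <name>"
# pairs are collected here, everything else is left for HTTP::Proxy->new
my %args = (
    peek   => [],
    header => [],
);
{
    my $args = '(' . join( '|', keys %args ) . ')';
    my $i = 0;
    while ( $i < @ARGV ) {
        # anchored, so that only the exact parameter names match
        if ( $ARGV[$i] =~ /^$args$/o ) {
            push @{ $args{$1} }, $ARGV[ $i + 1 ];
            splice( @ARGV, $i, 2 );    # next pair is now at the same index
        }
        else {
            $i += 2;
        }
    }
}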

# the headers we want to see
my @srv_hdr = (
    qw( Content-Type Set-Cookie Set-Cookie2 WWW-Authenticate Location ),
    @{ $args{header} }
);
my @clt_hdr =
  ( qw( Cookie Cookie2 Referer Referrer Authorization ), @{ $args{header} } );

# NOTE: Body request filters always receive the request body in one pass
# print the request line, the client headers and the decoded POST parameters
my $post_filter = HTTP::Proxy::BodyFilter::simple->new(
    sub {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        print STDOUT "\n", $message->method, " ", $message->uri, "\n";
        print_headers( $message, @clt_hdr );

        # this is from CGI.pm, method parse_params()
        my (@pairs) = split( /[&;]/, $$dataref );
        for (@pairs) {
            my ( $param, $value ) = split( '=', $_, 2 );
            next unless defined $param;
            $value = '' unless defined $value;    # parameter without a value
            $param = unescape($param);
            $value = unescape($value);
            printf STDOUT " %-20s => %s\n", $param, $value;
        }
    }
);

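# print the status line and server headers of every response; for non-POST
# requests, print the request line and client headers first (POST requests
# were already reported by the body filter above)
my $get_filter = HTTP::Proxy::HeaderFilter::simple->new(
    sub {
        my ( $self, $headers, $message ) = @_;
        my $req = $message->request;
        if ( $req->method ne 'POST' ) {
            print STDOUT "\n", $req->method, " ", $req->uri, "\n";
            print_headers( $req, @clt_hdr );
        }
        print STDOUT $message->status_line, "\n";
        print_headers( $message, @srv_hdr );
    }
);

# print every value of each listed header that is present in the message
sub print_headers {
    my $message = shift;
    for my $h (@_) {
        if ( $message->header($h) ) {
            print STDOUT " $h: $_\n" for ( $message->header($h) );
        }
    }
}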

# create the proxy (remaining command-line arguments go to HTTP::Proxy)
my $proxy = HTTP::Proxy->new(@ARGV);

# if we want to look at SOME sites
if ( @{ $args{peek} } ) {
    for ( @{ $args{peek} } ) {
        $proxy->push_filter(
            host    => $_,
            method  => 'POST',
            request => $post_filter
        );
        $proxy->push_filter( host => $_, response => $get_filter );
    }
}
# otherwise, peek at all sites
else {
    $proxy->push_filter(
        method  => 'POST',
        request => $post_filter
    );
    $proxy->push_filter( response => $get_filter );
}

# start the proxy
$proxy->start;

