Request change in LWP::UserAgent
on 26.04.2006 16:47:13 by j_and_t
Hello,
I have a problem with "$ua->max_size()", which can really choke you in
some cases. When max_size is set, LWP appears to send a Range request.
Some servers are very slow to respond to this type of request and often
return a 206 Partial Content response. Sometimes that response also carries
a "Content-Type: multipart/mixed" header with a boundary="--bla,bla,bla"
parameter. This makes it really difficult to tell what the content actually
is (i.e., text/html, image/gif and so on), so a lot more processing is
needed to work out the content type and whether or not it is acceptable.
For example:
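<--snip-->
use strict;
use warnings;
require LWP::UserAgent;
my $url         = 'http://search.cpan.org/';
my $max_content = 500;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->max_size($max_content);    # ask LWP to stop after $max_content bytes
my $response = $ua->get($url);
print $response->status_line, "\n";    # may be "206 Partial Content"
print $response->content_type, "\n";   # may be multipart/mixed, not text/html
<--snip-->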
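To give an idea of the extra processing a 206 multipart/mixed reply forces
on you, here is a rough sketch that digs the real Content-Type out of each
part. part_content_types() is a hypothetical helper, and the parser is
deliberately simplified (no nested multiparts, no folded headers); in real
code HTTP::Message's parts() method should be able to do the boundary
parsing for you.

```perl
use strict;
use warnings;

# Hypothetical helper: given the outer Content-Type header value and the raw
# multipart body, return the Content-Type of each part (simplified parser).
sub part_content_types {
    my ($outer_type, $body) = @_;
    my ($boundary) = $outer_type =~ /boundary="?([^";]+)"?/i
        or return;                       # not a multipart type we can split
    my @types;
    # Parts are separated by "--" . boundary; the closing delimiter ends in "--".
    for my $part (split /--\Q$boundary\E/, $body) {
        next if $part =~ /^--/;          # closing delimiter (and any epilogue)
        # Header block: after the delimiter's line break, up to the first blank line.
        my ($head) = $part =~ /^\r?\n(.*?)\r?\n\r?\n/s
            or next;                     # preamble or malformed part
        push @types, $1 if $head =~ /^Content-Type:\s*([^;\r\n]+)/mi;
    }
    return @types;
}
```

Only after this step can you decide whether the content is acceptable,
which is exactly the extra work a plain text/html response would avoid.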
I'm sorry I have not included a URL where this trouble occurs; I stopped
using $ua->max_size() some time ago, but now I need it again. The problem
is that some servers take forever to respond to this request and often
cause the problems mentioned above.
My solution was to use a content callback instead:
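<--snip-->
use strict;
use warnings;
require LWP::UserAgent;

my $result      = '';
my $url         = 'http://search.cpan.org/';
my $max_content = 500;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);

# Read the body in chunks of about $max_content+1 bytes via the callback.
my $response = $ua->get($url,
    ':content_cb'     => \&http_callback,
    ':read_size_hint' => $max_content + 1,
);
print "aborted: ", $response->header('X-Died'), "\n"
    if $response->header('X-Died');

sub http_callback {
    my ($data, $response, $protocol) = @_;
    $result .= $data;
    # die() aborts the transfer; LWP puts the message in the X-Died header
    die "max content size exceeded\n" if length($result) > $max_content;
    return;
}
<--snip-->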
While this is not perfect, it did solve all the above issues. Servers
respond super fast and the Content-Type headers are untouched (i.e.,
text/html).
My request to you is to change the way "$ua->max_size($max_content);" works.
It would benefit me, and I'm sure many others, if it worked more like the
callback shown above (just stop the download at x bytes). It would then act
more like a browser does when you click the Stop button: requests would be
fast and the server would reply with all the header information as
expected. It would also let us use "LWP::Parallel::RobotUA", which my
example above does not work with.
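As a rough sketch of the "stop at x bytes" behaviour being requested, the
callback idea can be wrapped in a reusable closure. make_limited_collector()
is a hypothetical name, not part of LWP; the only assumptions are the
':content_cb' calling convention used above and that die() inside the
callback aborts the transfer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): build a ':content_cb' callback that
# keeps at most $max bytes and then die()s so LWP aborts the download.
sub make_limited_collector {
    my ($max) = @_;
    my $content = '';
    my $cb = sub {
        my ($data, $response, $protocol) = @_;
        $content .= $data;
        if (length($content) > $max) {
            $content = substr($content, 0, $max);   # keep exactly $max bytes
            die "max_size $max reached\n";
        }
        return;
    };
    # Second closure gives read access to what has been collected so far.
    return ($cb, sub { $content });
}

# Usage (assumes $ua and $url as in the examples above):
#   my ($cb, $collected) = make_limited_collector(500);
#   my $response = $ua->get($url, ':content_cb' => $cb,
#                                 ':read_size_hint' => 501);
#   my $body = $collected->();   # at most 500 bytes
```

Unlike the original callback, this version trims the overflow so the caller
gets exactly the first x bytes, much like pressing Stop in a browser.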
So why is "$ua->max_size($max_content);" so useful? Well, some people like
to create terabyte-sized files and feed them to the robot just to see if
they can crash the server. Using the current "$ua->max_size($max_content);"
slows everything way down and comes with the extra baggage of a 206
response and a multipart/mixed content type. The callback solves all these
issues, but does not work with "LWP::Parallel".
Thanks for listening,
John