[PATCH] Caching/reusing WWW::RobotRules(::InCore)
on 12.10.2004 08:54:11 by ville.skytta
--=-WBe+CFWa6yDKakuhfgbY
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
The current behaviour of LWP::RobotUA when it is passed an existing
WWW::RobotRules::InCore object is counterintuitive to me.
I am of this opinion because of the documentation of $rules in
LWP::RobotUA->new() and WWW::RobotRules->agent(), as well as the
implementation in WWW::RobotRules::AnyDBM_File.
Currently, W::R::InCore always empties its cache when agent() is called,
regardless of whether the agent name changed. W::R::AnyDBM_File does not
seem to have this problem.
I suggest applying the attached patch to fix this.
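To illustrate the behaviour I am after, here is a minimal sketch. ToyRules
is a hypothetical stand-in for W::R::InCore, not the real class; only its
agent() follows the logic of the attached patch, and the 'loc' entry is
filled in by hand instead of by parsing a real robots.txt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::RobotRules::InCore; only the parts
# relevant to agent() are modelled here.
package ToyRules;

sub new {
    my ($class, $ua) = @_;
    my $self = bless { loc => {} }, $class;
    $self->agent($ua);
    return $self;
}

sub agent {
    my ($self, $name) = @_;
    my $old = $self->{ua};
    if ($name) {
        $name = $1 if $name =~ m/(\S+)/;   # get first word
        $name =~ s!/.*!!;                  # get rid of version
        unless ($old && $old eq $name) {
            delete $self->{loc};           # all old info is now stale
            $self->{ua} = $name;
        }
    }
    return $old;
}

package main;

my $rules = ToyRules->new('FooBot/1.2');
$rules->{loc}{'example.com:80'} = 'cached rules';   # pretend robots.txt was parsed

$rules->agent('FooBot/1.3');    # same short name: cache is kept
my $kept = exists $rules->{loc}{'example.com:80'} ? 1 : 0;

$rules->agent('BarBot/0.1');    # different short name: cache is dropped
my $dropped = exists $rules->{loc} ? 0 : 1;

print "kept=$kept dropped=$dropped\n";
```

With the patched logic this prints "kept=1 dropped=1"; with the unconditional
delete, the first agent() call would already have thrown the cache away.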
Additionally, I see InCore and AnyDBM_File use a different algorithm for
getting the "short" agent name from the full one, with the AnyDBM_File
looking "older". Perhaps add a new method/function for this (e.g.
short_agent()) in WWW::RobotRules that could be used in both InCore and
AnyDBM_File?
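A rough sketch of what such a helper could look like, assuming it simply
reuses the InCore algorithm visible in the patch (first word, then strip the
version). The name short_agent() and its plain-function form are only a
suggestion; nothing like it exists in WWW::RobotRules as of this writing.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a full agent string to its short name,
# using the same two steps as WWW::RobotRules::InCore's agent().
sub short_agent {
    my ($name) = @_;
    return unless defined $name;
    $name = $1 if $name =~ m/(\S+)/;   # get first word
    $name =~ s!/.*!!;                  # get rid of version
    return $name;
}

my @short;
push @short, short_agent($_)
    for ('FooBot/1.2', 'FooBot/1.2 [http://foobot.int; foo@bot.int]');

print join(',', @short), "\n";
```

Both example strings from the comments in RobotRules.pm come out as "FooBot",
so InCore and AnyDBM_File would agree on the short name if they shared this.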
While on the robots subject, applying something like the "warning could
be more helpful" change from
http://www.xray.mpe.mpg.de/mailing-lists/libwww-perl/2004-08/msg00024.html would be most welcome.
--=-WBe+CFWa6yDKakuhfgbY
Content-Disposition: inline; filename=robotrules-agent.patch
Content-Type: text/x-patch; name=robotrules-agent.patch; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
Index: lib/WWW/RobotRules.pm
===================================================================
RCS file: /cvsroot/libwww-perl/lwp5/lib/WWW/RobotRules.pm,v
retrieving revision 1.30
diff -a -u -r1.30 RobotRules.pm
--- lib/WWW/RobotRules.pm 9 Apr 2004 15:09:14 -0000 1.30
+++ lib/WWW/RobotRules.pm 12 Oct 2004 06:39:34 -0000
@@ -185,10 +185,12 @@
# "FooBot/1.2" => "FooBot"
# "FooBot/1.2 [http://foobot.int; foo@bot.int]" => "FooBot"
- delete $self->{'loc'}; # all old info is now stale
$name = $1 if $name =~ m/(\S+)/; # get first word
$name =~ s!/.*!!; # get rid of version
- $self->{'ua'}=$name;
+ unless ($old && $old eq $name) {
+ delete $self->{'loc'}; # all old info is now stale
+ $self->{'ua'} = $name;
+ }
}
$old;
}
--=-WBe+CFWa6yDKakuhfgbY--