RE: RobotRules fails on user-agents with spaces

on 14.10.2005 17:32:35 by Matthew.van.Eerde

Gisle Aas wrote:
> writes:
>
>> The problem... if I include a space in my robot's user agent, it
>> will fail to recognize robots.txt records targeted to my robot.
>
> You are not allowed to have space in the user agent name. See section
> "3.8 Product Tokens" of RFC 2616 [1]. Isn't it an option to just
> rename your spider to something that follows the spec?

Oops! Yes, of course. I will rename my spider accordingly.
Patch proposal withdrawn.
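
For reference, here is a rough sketch of the check the spec implies, based on my reading of the token grammar in RFC 2616 (sections 2.2 and 3.8). It is in Python purely for illustration; `TOKEN_CHARS` and `is_product` are made-up names, not part of RobotRules.pm or any library.

```python
import re

# RFC 2616 sec. 2.2: a token is one or more US-ASCII characters,
# excluding control characters and the separators
# ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
TOKEN_CHARS = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"


def is_product(value: str) -> bool:
    """Return True if value looks like `token ["/" token]` (sec. 3.8)."""
    name, slash, version = value.partition("/")
    if re.fullmatch(TOKEN_CHARS, name) is None:
        return False
    # If a slash is present, the version must itself be a non-empty token.
    if slash and re.fullmatch(TOKEN_CHARS, version) is None:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("MySpider/1.0", "My Spider/1.0", "My-Spider"):
        print(candidate, is_product(candidate))
```

A space is a separator, so "My Spider/1.0" is rejected while "MySpider/1.0" and "My-Spider" pass.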

> I'm not really opposed to this patch if product names with spaces are
> actually in common use. Do you have data to suggest it is?

Well, I do... here are some spiders that hit my site last week that are of
this form:
Syndication Engine/1.1 (http://www.hexlet.com)
Feedster Crawler/1.0; Feedster, Inc.
Jakarta Commons-HttpClient/3.0-rc1
FAST Enterprise Crawler/6.4 (helpdesk at fast.no)
Jakarta HTTP Client/1.0
UPG1 UP/4.0 (compatible; Blazer 1.0)

On the other hand, it's doubtful that any of these use RobotRules.pm, so
these don't imply that a patch is called for.
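
To illustrate why a name with a space can fail to match its own robots.txt record, here is a small sketch. It assumes a matcher that keeps only the first whitespace-delimited word of the User-Agent string and drops any "/version" suffix; that is an assumption made for illustration, not a statement of what RobotRules.pm does internally. `short_name` is a hypothetical helper.

```python
def short_name(user_agent: str) -> str:
    """Reduce a User-Agent string to its first word, minus any /version.

    This mirrors an assumed matching strategy, not a real library's API.
    """
    words = user_agent.split()
    if not words:
        return ""
    # Keep only the part before the first slash of the first word.
    return words[0].split("/", 1)[0]


SAMPLES = [
    "Syndication Engine/1.1 (http://www.hexlet.com)",
    "Feedster Crawler/1.0; Feedster, Inc.",
    "Jakarta Commons-HttpClient/3.0-rc1",
    "FAST Enterprise Crawler/6.4 (helpdesk at fast.no)",
    "Jakarta HTTP Client/1.0",
    "UPG1 UP/4.0 (compatible; Blazer 1.0)",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print(repr(short_name(ua)), "<-", ua)
```

Under that assumption, the first sample reduces to "Syndication", so a record addressed to "Syndication Engine" would never be selected, and the two Jakarta clients become indistinguishable.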

-- 
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer