user agents
on 01.12.2004 10:25:51 by zed.lopez
I just found some behavior that surprised me.
LWP::Simple's perldoc says:
The user agent created by this module will identify itself as
"LWP::Simple/#.##" (where "#.##" is the libwww-perl version number)
and will initialize its proxy defaults from the environment (by
calling $ua->env_proxy).
But it doesn't mention that get ends up not using that user agent:
it goes through _trivial_http_get, which writes its own User-Agent string.
get results in this entry in the web server log:
69.109.167.40 - - [01/Dec/2004:00:41:54 -0800] "GET / HTTP/1.0" 200
44222 "-" "lwp-trivial/1.40"
getprint results in this:
69.109.167.40 - - [01/Dec/2004:00:41:59 -0800] "GET / HTTP/1.1" 200
44222 "-" "LWP::Simple/5.79"
I just went through some hair-pulling debugging because getprint was
working where get was failing, apparently because the site's robots.txt was
allowing one user agent and blocking the other. It's also
striking that they use different HTTP versions (HTTP/1.0 for get,
HTTP/1.1 for getprint, per the log lines above).
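
For anyone who hits the same thing, here is a minimal sketch of one way around it: skip LWP::Simple and use LWP::UserAgent directly, so every request carries the same explicitly chosen agent string. The helper names make_ua and fetch, the agent string 'my-script/0.1' and the example URL are all made up for illustration; this is not what LWP::Simple does internally.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build one user agent with an explicit agent string,
# so every request identifies itself the same way.
sub make_ua {
    my ($agent) = @_;
    my $ua = LWP::UserAgent->new(agent => $agent);
    $ua->env_proxy;    # pick up proxy settings from the environment
    return $ua;
}

# Hypothetical helper: return the response body on success, undef otherwise.
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->content : undef;
}

my $ua = make_ua('my-script/0.1');    # assumed agent string, pick your own

# Uncomment to fetch a page (the URL is a placeholder):
# my $html = fetch($ua, 'http://www.example.com/');
# print $html if defined $html;
```

With this, the server log should show the same User-Agent for every request made through $ua, whatever the agent string is set to.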
I'd like to suggest these differences be documented. Does anyone know
why _trivial_http_get uses its own user agent and HTTP version?
Zed