Proxy questions
on 21.01.2005 14:05:01 by hans
Hello,
One of the users of my Checkbot tool has reported
some trouble with the proxy functionality in LWP. In Checkbot I use the
proxy and noproxy functionality of LWP without changes, so I guess that
these issues should be addressed in LWP. The first issue is a problem
with the noproxy feature in relation to domain-less hostnames. The
report mentions FQDNs, but I think this should be read as canonical
URIs. The second issue is a feature request.
Kind regards,
Hans
- checkbot will ask the proxy if the URL contains a non-FQDN hostname
and --noproxy contains the local domain. E.g. one intranet
server is foo.de.marconicomms.com, and I run
checkbot --proxy bar --noproxy de.marconicomms.com
which unwantedly asks the proxy bar for http://foo/index.html
A direct connection is used for
http://foo.de.marconicomms.com/index.html
as expected.
This is probably because the noproxy args are matched against the
hostname as found in the URL, and not against the FQDN. Thus,
"foo" does not match "de.marconicomms.com" and the proxy is used.
This could be fixed if the matching followed the same mechanism
as the resolver, e.g. by looking at the "search" line in /etc/resolv.conf
for possible domains. Alternatively, a non-FQDN could be canonicalized
by a name service lookup before being matched against the noproxy list.
What do you think?
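The first suggestion above can be sketched as follows. This is only an illustration in Python, not LWP or Checkbot code: the helper names (search_domains_from_resolv_conf, candidate_names, bypass_proxy) are hypothetical, and matching on a whole-label suffix is an assumption of the sketch rather than a description of what LWP currently does.

```python
def search_domains_from_resolv_conf(text):
    """Extract the domains from the "search" line of resolv.conf content."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]  # if repeated, the last "search" line wins
    return domains


def candidate_names(host, search_domains):
    """Return the names to test: the host itself, plus the host expanded
    with each search domain when it contains no dot (assumed non-FQDN)."""
    host = host.rstrip(".").lower()
    if "." in host:
        return [host]
    return [host] + [host + "." + d.strip(".").lower() for d in search_domains]


def bypass_proxy(host, no_proxy, search_domains=()):
    """True if any candidate name equals, or is a subdomain of, a noproxy entry."""
    for name in candidate_names(host, search_domains):
        for domain in no_proxy:
            domain = domain.strip(".").lower()
            if name == domain or name.endswith("." + domain):
                return True
    return False


# Usage with the hostnames from the report
resolv_conf = "nameserver 10.0.0.1\nsearch de.marconicomms.com\n"
search = search_domains_from_resolv_conf(resolv_conf)
print(bypass_proxy("foo", ["de.marconicomms.com"]))          # search list ignored
print(bypass_proxy("foo", ["de.marconicomms.com"], search))  # search list applied
```

With the search list taken into account, http://foo/index.html is treated like its fully qualified form. Note that this purely textual expansion does not check that the expanded name actually resolves; the name service lookup mentioned above would be needed for that.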
- The common web browsers (IE, Mozilla et al.) configure their
proxy/noproxy via a proxy.pac file. This file is normally centrally
maintained. It is referenced in RFC 3040, quote:
6.2 Proxy Auto Configuration (PAC)
Best known reference:
"Navigator Proxy Auto-Config File Format" [12]
Description:
A JavaScript script retrieved from a web server is executed for
each URL accessed to determine the appropriate proxy (if any) to
be used to access the resource. User agents must be configured to
request this script upon startup. There is no bootstrap
mechanism, manual configuration is necessary.
Despite manual configuration, the process of proxy configuration
is simplified by centralizing it within a script at a single
location.
Security:
Common policy per organization possible but still requires initial
manual configuration. PAC is better than "manual proxy
configuration" since PAC administrators may update the proxy
configuration without further user intervention.
Interoperability of PAC files is not high, since different
browsers have slightly different interpretations of the same
script, possibly leading to undesired effects.
Deployment:
Implemented in Netscape Navigator and Microsoft Internet Explorer.
Submitter:
Document editors.
[12] refers to
http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
Now, in an ideal world, checkbot would just use the same mechanism for
proxy configuration that the web browsers use. In fact, our proxy.pac
is almost a hundred lines long, and turning that into --[no]proxy
args is repetitive and error-prone.
I realize that parsing a proxy.pac (= extremely restricted JavaScript)
may not be a one-liner in perl. Anyway, maybe you consider the idea of
using a standardized and flexible [no]proxy determination a Good
Thing(TM).
And if you don't get right to it, adding it to the shadow todo list
is a good idea. :-)
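To give an idea of what a proxy.pac file asks of its interpreter, here is a rough sketch in Python (not Perl, and not an actual PAC parser) of two of the helper functions defined by the Navigator format, isPlainHostName and dnsDomainIs, plus a hand-translated FindProxyForURL. The rules and the proxy port 8080 are made up for illustration; a real implementation would have to evaluate the JavaScript itself.

```python
def is_plain_host_name(host):
    """Counterpart of PAC isPlainHostName: true if the hostname has no dots."""
    return "." not in host


def dns_domain_is(host, domain):
    """Counterpart of PAC dnsDomainIs: true if host ends with the domain.
    Lowercasing both sides is a choice of this sketch."""
    return host.lower().endswith(domain.lower())


def find_proxy_for_url(url, host):
    """Hand-translated stand-in for a PAC FindProxyForURL(url, host).
    Returns a PAC-style result string: "DIRECT" or "PROXY host:port"."""
    # Hypothetical rules: local names and the intranet domain go direct.
    if is_plain_host_name(host) or dns_domain_is(host, ".de.marconicomms.com"):
        return "DIRECT"
    # "bar" is the proxy from the report; the port is an assumption.
    return "PROXY bar:8080"


# Usage
for url, host in [
    ("http://foo/index.html", "foo"),
    ("http://foo.de.marconicomms.com/index.html", "foo.de.marconicomms.com"),
    ("http://www.example.org/", "www.example.org"),
]:
    print(url, "->", find_proxy_for_url(url, host))
```

The result string would still have to be mapped onto the user agent's proxy settings for each request, which is where the integration work lies.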