ANN: WWW::Agent 0.03 has entered CPAN
Posted on 19.03.2005 22:11:11 by rho
Hi all,
I have put WWW::Agent onto CPAN.
http://search.cpan.org/~drrho/WWW-Agent/
We will use it here as a base for the functionality provided by
WWW::Mechanize, WWW::Robot, LWP::RobotUA, and a gazillion
similar packages. From the README:
What is it?
-----------
This suite of packages [ HIGHLY EXPERIMENTAL ] provides the basic
functionality of an 'abstract browser'. The idea is that the
abstract browser is only capable of loading objects (pages) via
HTTP, FTP, ..., but has no other functionality itself.
It is the task of particular plugins to add more specific
functionality, such as 'Link Checking' or 'Spidering' or 'Having
headers like Firefox'.
To make that happen, the abstract browser exposes the phases of a
request to allow plugins (aka modules) to intercept when they feel the
need. [ If you understand Apache's module concept then you immediately
get the idea. ]
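The phase idea can be sketched in a few lines. This is only an illustration in Python, not WWW::Agent's actual API: the phase names, the `AbstractBrowser` class, the `on_<phase>` method convention, and the two plugins are all hypothetical, and the fetch is a stub so that nothing touches the network.

```python
import re

# Hypothetical sketch: phase names and classes are assumptions for
# illustration, not the real WWW::Agent interface.
class AbstractBrowser:
    PHASES = ("init", "request", "response", "done")  # assumed phase names

    def __init__(self, fetch):
        self.fetch = fetch  # callable that loads an object for a URL
        self.hooks = {phase: [] for phase in self.PHASES}

    def register(self, plugin):
        # A plugin intercepts a phase by defining a method named on_<phase>.
        for phase in self.PHASES:
            handler = getattr(plugin, "on_" + phase, None)
            if handler is not None:
                self.hooks[phase].append(handler)

    def _run(self, phase, ctx):
        for handler in self.hooks[phase]:
            handler(ctx)

    def get(self, url):
        # The browser itself only loads the object; plugins do the rest.
        ctx = {"url": url, "headers": {}, "body": None}
        self._run("init", ctx)
        self._run("request", ctx)
        ctx["body"] = self.fetch(ctx["url"], ctx["headers"])
        self._run("response", ctx)
        self._run("done", ctx)
        return ctx


class FirefoxHeaders:
    """Plugin that adjusts outgoing headers in the request phase."""
    def on_request(self, ctx):
        ctx["headers"]["User-Agent"] = "Mozilla/5.0 (illustrative)"


class LinkCollector:
    """Plugin that gathers href targets in the response phase."""
    def __init__(self):
        self.links = []

    def on_response(self, ctx):
        self.links.extend(re.findall(r'href="([^"]+)"', ctx["body"]))


def fake_fetch(url, headers):
    # Stub standing in for an HTTP/FTP load.
    return '<a href="/a">A</a> <a href="/b">B</a>'


browser = AbstractBrowser(fake_fetch)
collector = LinkCollector()
browser.register(FirefoxHeaders())
browser.register(collector)
result = browser.get("http://example.org/")
```

Each plugin sees only the phases it defines a handler for, so link checking, spidering, and header tweaking stay independent of each other and of the loader.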
To make things interesting, and to allow the agent to be run in
reactive environments, it is written based on POE (Perl Object
Environment, or similar). The upside is that your
application is not necessarily blocked when fetching documents off the
network. The downside is that programming is a bit more, well,
interesting.
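To illustrate what "not blocked" means in an event-driven setup, here is a minimal sketch that uses Python's asyncio as a stand-in for POE (it is not POE code, and the `fetch` and `application_tick` names are made up). The fetch is simulated with a short sleep rather than a real network call.

```python
import asyncio

events = []  # records the order in which things happen

async def fetch(url):
    # Stand-in for a network fetch; awaiting the sleep hands control
    # back to the event loop, as a pending socket read would.
    events.append("fetch start " + url)
    await asyncio.sleep(0.05)
    events.append("fetch done " + url)
    return "<html>stub</html>"

async def application_tick():
    # Some unrelated application work.
    events.append("app tick")

async def main():
    task = asyncio.create_task(fetch("http://example.org/"))
    await asyncio.sleep(0)    # yield once so the fetch can start
    await application_tick()  # runs while the fetch is still pending
    return await task

body = asyncio.run(main())
```

The application work is recorded between the start and the end of the fetch, which is the behaviour the paragraph above describes; the price is that control flow is spread across callbacks or coroutines.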
The only interesting plugin at this stage is the 'Director', which
interprets a textual language (called WeeZL) to visit websites via a
script. Everything is still quite crude, but something like the
following might even work: it logs into our teaching portal and
scrapes student results from the HTML.
login: { # block to define how to log in
url m|https?://james.bond.edu.au/.*| or die "there is nothing to log in here"