FAQ 9.8 How do I fetch an HTML file?
am 16.04.2008 03:03:01 von PerlFAQ ServerThis is an excerpt from the latest version perlfaq9.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
------------------------------------------------------------ --------
9.8: How do I fetch an HTML file?
One approach, if you have the lynx text-based HTML browser installed on
your system, is this:
$html_code = `lynx -source $url`;
$text_data = `lynx -dump $url`;
The libwww-perl (LWP) modules from CPAN provide a more powerful way to
do this. They don't require lynx, but like lynx, can still work through
proxies:
# simplest version
use LWP::Simple;
$content = get($URL);
# or print HTML from a URL
use LWP::Simple;
getprint "http://www.linpro.no/lwp/";
# or print ASCII from HTML from a URL
# also need HTML-Tree package from CPAN
use LWP::Simple;
use HTML::Parser;
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.perl.com/");
defined $html
or die "Can't fetch HTML from http://www.perl.com/";
$ascii = HTML::FormatText->new->format(parse_html($html));
print $ascii;
------------------------------------------------------------ --------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.