I want to learn something about HTML parser.

on 08.12.2005 09:12:59 by www.ezlife.com.cn

I am new to Perl. I need to use Perl to parse HTML into XML and to
get a tree-like object structure so that I can scan its nodes one by one.
This is for an "HTML scan" method our team is building.

Please point me to a module, or outline an approach, for building this
object tree.
Thank you!

Lihuaxia

Re: I want to learn something about HTML parser.

on 08.12.2005 14:39:04 by John Bokma

www.ezlife.com.cn@gmail.com wrote:

> I am new to Perl. I need to use Perl to parse HTML into XML and to
> get a tree-like object structure so that I can scan its nodes one by one.
> This is for an "HTML scan" method our team is building.
>
> Please point me to a module, or outline an approach, for building this
> object tree.
> Thank you!

Did you check CPAN?
Did you check the modules that come with your Perl installation?

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
I ploink googlegroups.com :-)

Re: I want to learn something about HTML parser.

on 10.12.2005 00:33:26 by metaperl

The HTML::Element module:
http://search.cpan.org/~petdance/HTML-Tree-3.1901/lib/HTML/Element.pm

in the HTML-Tree distribution (which also provides HTML::TreeBuilder)
will help with this.
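
A minimal sketch of that approach, assuming HTML::TreeBuilder is installed from CPAN. The sample markup and the walk() helper are made up for illustration; the method calls (new_from_content, tag, content_list, as_XML, delete) are from the HTML::TreeBuilder and HTML::Element documentation.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input, invented for this example.
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body><p>Hello <b>world</b></p></body></html>
HTML

# Parse the string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: visit every node depth-first and record one
# indented line per node. Element nodes are objects; text nodes are
# plain strings, so ref() tells them apart.
sub walk {
    my ($node, $depth, $out) = @_;
    if (ref $node) {
        push @$out, ('  ' x $depth) . $node->tag;
        walk($_, $depth + 1, $out) for $node->content_list;
    }
    else {
        push @$out, ('  ' x $depth) . "text: $node";
    }
    return $out;
}

my $lines = walk($tree, 0, []);
print "$_\n" for @$lines;

# Serialize the whole tree as XML.
my $xml = $tree->as_XML;
print $xml;

# Free the tree explicitly; older HTML-Tree releases need this to
# break circular references.
$tree->delete;
```

Each element also has methods such as look_down and attr for searching and reading attributes, which may be more convenient than a hand-written walk when only some nodes matter.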

Re: I want to learn something about HTML parser.

on 12.12.2005 03:08:38 by Eric Bohlman

www.ezlife.com.cn@gmail.com wrote in news:1134029578.974396.83280@g47g2000cwa.googlegroups.com:

> I am new to Perl. I need to use Perl to parse HTML into XML and to
> get a tree-like object structure so that I can scan its nodes one by one.
> This is for an "HTML scan" method our team is building.
>
> Please point me to a module, or outline an approach, for building this
> object tree.

Sounds like you want XML::LibXML. It can parse HTML (as well as arbitrary
XML) into a tree structure that follows the DOM standard, lets you
manipulate the tree, and can write the modified tree out as XML.
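
As a rough illustration of those three steps (parse, manipulate, write out), here is a small sketch assuming XML::LibXML and the underlying libxml2 library are installed. The sample markup, the scan() helper, and the "scanned" attribute are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input, invented for this example.
my $html = '<html><body><p class="intro">Hello <b>world</b></p>'
         . '<p>Bye</p></body></html>';

# Parse HTML into a DOM document. recover(1) asks libxml2 to keep
# going on the sloppy markup that real-world pages often contain.
my $parser = XML::LibXML->new;
$parser->recover(1);
my $doc = $parser->parse_html_string($html);

# Hypothetical helper: depth-first scan that records element names.
sub scan {
    my ($node, $names) = @_;
    push @$names, $node->nodeName if $node->isa('XML::LibXML::Element');
    scan($_, $names) for $node->childNodes;
    return $names;
}

my $names = scan($doc->documentElement, []);
print join(' ', @$names), "\n";

# Manipulate the tree: find every <p> with XPath and tag it.
my @paras = $doc->findnodes('//p');
$_->setAttribute(scanned => 'yes') for @paras;

# Write the modified tree out as XML (toStringHTML would emit HTML).
my $xml = $doc->toString;
print $xml;
```

XPath queries via findnodes are often enough on their own; the recursive scan is only needed when every node has to be visited in order.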