HTML::Parser
on 21.03.2006 01:07:31 by gil.vidals
I have been using HTML::Parser for a few years now and would like to resolve
the issue of processing malformed HTML -- that is, HTML with missing start and end
tags. In particular, I'm running many web pages which are missing the
closing and tags.
Is there an easy way for HTML::Parser to insert implied tags, as
HTML::TreeBuilder and HTML::Element now do?
Gil.Vidals@PositionResearch.com
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com
Re: HTML::Parser
on 21.03.2006 12:28:23 by gisle
"Gil Vidals" writes:
> I have been using HTML::Parser for a few years now and would like to resolve
> the issue of processing malformed HTML -- that is missing start and end
> tags. In particular, I'm running many web pages which are missing the
> closing and tags.
>
> Is there an easy way for HTML::Parser to insert implied tags, as
> HTML::TreeBuilder and HTML::Element now do?
No, but if you can come up with simple rules for when the missing tags
should be inserted then writing a wrapper should be easy enough :)
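
As an illustration only, here is a minimal sketch of what such a wrapper might look like. The rule set is an assumption made for this example, not something HTML::Parser provides: an opening <p> or <li> closes a directly preceding open element of the same type, an end tag closes anything still open inside it, and whatever remains open at end of input is closed. The name close_implied_tags and the %implied_end and %void tables are hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical rule tables for this sketch (not part of HTML::Parser).
my %implied_end = map { $_ => 1 } qw(p li);                 # same-type sibling closes these
my %void        = map { $_ => 1 } qw(br hr img input meta link);  # never have an end tag

sub close_implied_tags {
    my ($html) = @_;
    my @open;          # stack of currently open element names
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            # A new <p>/<li> directly inside an open one of the same type closes it.
            if ($implied_end{$tag} && @open && $open[-1] eq $tag) {
                $out .= '</' . pop(@open) . '>';
            }
            push @open, $tag unless $void{$tag};
            $out .= $text;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            # Close everything still open inside the element being closed.
            if (grep { $_ eq $tag } @open) {
                while (@open) {
                    my $top = pop @open;
                    last if $top eq $tag;
                    $out .= "</$top>";
                }
            }
            $out .= $text;   # stray end tags are passed through unchanged
        }, 'tagname, text' ],
        # Text, comments, declarations etc. are copied through as-is.
        default_h => [ sub { $out .= shift }, 'text' ],
    );

    $p->parse($html);
    $p->eof;

    # Close whatever is still open at end of input, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}

print close_implied_tags('<ul><li>one<li>two</ul><p>first<p>second <b>bold'), "\n";
# prints: <ul><li>one</li><li>two</li></ul><p>first</p><p>second <b>bold</b></p>
```

Real pages need more rules than this (table cells, definition lists, missing start tags), which is the part HTML::TreeBuilder already covers.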
Missing can be a challenge because HTML::Parser will just
report the rest of the document as text. You would have to find a
suitable place to restart parsing within this text and then perhaps
start off a new HTML::Parser instance there.
I would just use HTML::TreeBuilder :)
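
For comparison, a short sketch of the HTML::TreeBuilder route. The helper name repair_html and the sample input are made up for this example; the empty hash passed to as_HTML is there on the assumption that it makes every end tag appear in the output, including ones that are optional in HTML.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse possibly malformed HTML and serialize it
# again with the implied elements and end tags written out.
sub repair_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->implicit_tags(1);     # the default; shown here for clarity
    $tree->parse($html);
    $tree->eof;
    # Third argument {}: treat no end tag as optional, so all are emitted.
    my $out = $tree->as_HTML(undef, undef, {});
    $tree->delete;               # release the tree (needed on older versions)
    return $out;
}

print repair_html('<p>First<p>Second <b>bold'), "\n";
```

The output should contain the html, head and body elements that the input left out, along with closing tags for the p and b elements.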
Regards,
Gisle