Re: Possible bug in HTML::Parser version 3.48

on 08.02.2006 12:14:11 by gisle

Jack Goldstein writes:

> I've installed HTML::Parser on an AIX 5.1 system running perl 5.8.6 along
> with HTML::Tagset and all tests passed except for one relating to POD that
> was skipped. However, one of our developers found that it didn't properly
> parse titles. Here's a sample program that demonstrates the problem. When
> run with the perl 5.8.6 that I installed, the output is
>
> Help Title is
> (blank line)
>
> but when run with a copy of perl5.8.0 that someone else installed, we
> get:
>
> Help Title is Installation Help
>
> which I assume is correct.

Thanks for your bug report. This is indeed a bug. Its cause is that
some events would trigger under certain circumstances even after a
handler has told the parser to stop with $p->eof. I've now fixed this
issue and uploaded HTML-Parser-3.49 to CPAN.

My guess would be that your perl5.8.0 installation has a version of
HTML-Parser that is older than version 3.40, where we made <title>
tags also parse in literal mode. This could explain why this issue
didn't occur with that perl installation.

> use HTML::Parser;
>
> my $title='';
>
> my $p = HTML::Parser->new(api_version => 3,);
> $p->handler(start=> \&title_handler, 'tagname, self');
> $p->parse_file("db2wi.htm");
> print "\nHelp Title is $title\n";
> exit 0;
>
> ########################################
> # Subroutines
> ########################################
> sub title_handler {
> return if shift ne 'title';
> my $self = shift;
> $self->handler(text => sub { $title= shift}, 'dtext');

BTW, HTML-Parser does not guarantee that all text between the
<title>...</title> tags is reported in a single callback, which means
this code should append to $title instead of just assigning to it.
That would make it:

$self->handler(text => sub { $title .= shift}, 'dtext');

Alternatively, set the 'unbroken_text' attribute to a TRUE value.

> $self->handler(end => sub { shift->eof if shift eq 'title' },
>                'tagname, self');
> }
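
To illustrate the appending approach, here is a minimal sketch, not the
original program. It assumes the document is fed to the parser in chunks
with parse() rather than parse_file(). The helper name extract_title and
the $in_title flag are made up for this example and are not part of
HTML::Parser.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text found inside <title>...</title>
# when the document arrives as a list of string chunks.
sub extract_title {
    my @chunks   = @_;
    my $title    = '';
    my $in_title = 0;

    my $p = HTML::Parser->new(api_version => 3);

    # Track whether we are currently inside the <title> element.
    $p->handler(start => sub { $in_title = 1 if $_[0] eq 'title' }, 'tagname');
    $p->handler(end   => sub { $in_title = 0 if $_[0] eq 'title' }, 'tagname');

    # Append instead of assigning: the title text may be delivered
    # in more than one text event.
    $p->handler(text => sub { $title .= $_[0] if $in_title }, 'dtext');

    # Alternative to appending: ask the parser to buffer the text itself.
    # $p->unbroken_text(1);

    $p->parse($_) for @chunks;
    $p->eof;    # signal end of input so any pending text is flushed

    return $title;
}

print "Help Title is ",
      extract_title("<html><head><title>Installation ",
                    "Help</title></head></html>"),
      "\n";
```

Because the text handler appends, the result does not depend on how the
parser happens to split the title text across callbacks.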

Regards,
Gisle