HTML::Parser
on 25.04.2006 20:45:47 by oliver.block
Hello,
I don't know if this is intended behavior of HTML::Parser, or whether it is
already on your to-do list, but HTML::Parser does not call the end_document
handler; in other words, no end_document event occurs unless you call
$p->eof(). It occurs neither after $p->parse($string) nor after
$p->parse_file($filename).
Best Regards,
Oliver Block
Re: HTML::Parser
on 25.04.2006 21:53:10 by Andy
On Apr 25, 2006, at 1:45 PM, Oliver Block wrote:
> I don't know if this is intended behavior of HTML::Parser, or whether it
> is already on your to-do list, but HTML::Parser does not call the
> end_document handler; in other words, no end_document event occurs
> unless you call $p->eof(). It occurs neither after $p->parse($string)
> nor after $p->parse_file($filename).
Yes, that's correct, and it's well noted in the docs.
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Re: HTML::Parser
on 25.04.2006 22:09:22 by gisle
Oliver Block writes:
> I don't know if this is intended behavior of HTML::Parser, or whether it is
> already on your to-do list, but HTML::Parser does not call the end_document
> handler; in other words, no end_document event occurs unless you call
> $p->eof(). It occurs neither after $p->parse($string) nor after
> $p->parse_file($filename).
This is not supposed to happen with parse_file(). It would be helpful if
you could produce a small test program that demonstrates this.
For parse() this is the expected behaviour.
--Gisle
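As a minimal sketch of the parse() behaviour described above (assuming HTML::Parser 3.x with the version 3 API; the HTML chunks and the @events/@before_eof names are made up for illustration), the script below feeds a document to parse() in pieces and records the document-level events. end_document should only show up once eof() has been called.

```perl
use strict;
use warnings;
use HTML::Parser;

# Record the name of every document-level event in the order it fires.
my @events;
my $p = HTML::Parser->new(
    api_version      => 3,
    start_document_h => [ sub { push @events, shift }, 'event' ],
    end_document_h   => [ sub { push @events, shift }, 'event' ],
);

# Feed the document in chunks, as one might from a network read loop.
$p->parse($_) for '<html><body>', '<p>Hello</p>', '</body></html>';

# Snapshot: the parser cannot know the document is finished yet,
# so end_document has not fired at this point.
my @before_eof = @events;
print "after parse(): @before_eof\n";

# eof() tells the parser that no more input follows; only now
# does the end_document event fire.
$p->eof;
print "after eof():   @events\n";
```

The design follows from parse() being incremental: any call may be followed by more text, so the end of the document has to be signalled explicitly.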
Re: HTML::Parser
on 25.04.2006 23:57:29 by oliver.block
On Tuesday, 25 April 2006 22:09, you wrote:
> This is not supposed to happen with parse_file(). It would be helpful if
> you could produce a small test program that demonstrates this.
Yes, of course. I hope I didn't leave out any important lines of code.
Best regards,
Oliver Block
######## output ###########
start_document 23:43:45.188784
########################
With $p->eof():
######## output ###########
start_document 23:51:25.93499
end_document 23:51:25.96407
########################
####### calling script ########
...
my $p = MyParser->new;
$p->parse("localfile.html"); # local html file
#$p->eof();
...
########################
####### MyParser.pm ########
package MyParser;
use strict;
use HTML::Parser;
use Time::HiRes qw( gettimeofday );
my $class;   # class name, set in new() and used by the handlers
my $p;       # the wrapped HTML::Parser instance
...
sub new {
    $class = shift;
    my $self = { };
    bless $self, $class;
    $self->_init();
    return $self;
}
sub _init {
    $p = HTML::Parser->new( api_version => 3,
        start_h          => [ \&_start_handler,          'tagname, attr' ],
        end_h            => [ \&_end_handler,            'tagname' ],
        text_h           => [ \&_text_handler,           'self' ],
        start_document_h => [ \&_start_document_handler, 'self' ],
        end_document_h   => [ \&_end_document_handler,   'self' ] );
}
sub parse {
    my ($class, $s, $uri) = @_;   # local $class
    ...
    $p->parse($s);
}
sub _start_document_handler {
    printTimestamp($class, "start_document");
}
sub _end_document_handler {
    printTimestamp($class, "end_document");
}
# Signals the end of input to HTML::Parser.
sub eof {
    return $p->eof();
}
sub printTimestamp {
    my ($class, $caller, $comment) = @_;
    my ($sec, $min, $hour) = localtime(time);
    my ($hsec, $msec) = gettimeofday;   # $hsec not used; $msec holds microseconds
    printf("%s %s %2d:%2d:%2d.%d\n", $caller, $comment, $hour, $min, $sec,
        $msec);
}
1;
__END__
Re: HTML::Parser
on 26.04.2006 06:52:54 by gisle
Oliver Block writes:
> On Tuesday, 25 April 2006 22:09, you wrote:
> > This is not supposed to happen with parse_file(). It would be helpful if
> > you could produce a small test program that demonstrates this.
>
> Yes, of course. I hope I didn't leave out any important lines of code.
You forgot to call parse_file().
--Gisle
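To illustrate the point: HTML::Parser's parse() expects HTML text, not a file name. In this sketch (variable names are hypothetical; assumes HTML::Parser 3.x), the string 'localfile.html' is simply parsed as a tiny document whose only content is that text. parse_file('localfile.html') would instead open and read the file, and it calls eof() itself when it reaches the end of the input.

```perl
use strict;
use warnings;
use HTML::Parser;

# Collect the decoded text the parser reports, plus document-level events.
my $text = '';
my @events;
my $p = HTML::Parser->new(
    api_version      => 3,
    text_h           => [ sub { $text .= shift }, 'dtext' ],
    start_document_h => [ sub { push @events, shift }, 'event' ],
    end_document_h   => [ sub { push @events, shift }, 'event' ],
);

# The argument is treated as document content: no file is opened here.
$p->parse('localfile.html');
$p->eof;

print "text seen by the parser: '$text'\n";
print "events: @events\n";
```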
Re: HTML::Parser
on 26.04.2006 10:40:10 by gisle
I can't find any issues with 'end_document', but I discovered that
'start_document' didn't fire for empty documents and could even fire
multiple times if parse() was called repeatedly with empty chunks.
I've now uploaded HTML-Parser-3.52, which makes sure 'start_document'
fires exactly once per document.
--Gisle
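A small check for the fix described above might look like the following sketch. It assumes HTML-Parser 3.52 or later; per the message above, older versions could report a different count. Calling parse() three times with an empty string is an arbitrary choice to exercise the "repeated empty chunks" case.

```perl
use strict;
use warnings;
use HTML::Parser;

# Count how often start_document fires for one (empty) document.
my $starts = 0;
my $p = HTML::Parser->new(
    api_version      => 3,
    start_document_h => [ sub { $starts++ }, 'event' ],
);

$p->parse('') for 1 .. 3;   # several empty chunks
$p->eof;                    # end of this document

print "start_document fired $starts time(s)\n";
```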
Re: HTML::Parser
on 26.04.2006 11:52:05 by oliver.block
On Wednesday, 26 April 2006 06:52, Gisle Aas wrote:
> You forgot to call parse_file().
That's true, but only in my email. I parsed the file again with
$p->parse_file("localfile.html");
both with and without
$p->eof();
and got the same results on my machine with my program.
Best Regards,
Oliver Block
Re: HTML::Parser
on 26.04.2006 12:58:50 by gisle
Oliver Block writes:
> On Wednesday, 26 April 2006 06:52, Gisle Aas wrote:
> > You forgot to call parse_file().
>
> That's true, but just in my email. I parsed the file once again with
>
> $p->parse_file("localfile.html");
>
> with and without
>
> $p->eof();
>
> I got the same results on my machine with my program.
The program you provided was not runnable, so I can't reproduce the
error you see with it. Please produce a small _runnable_ program
(single file preferably) that demonstrates the error. Remove all code
that is not relevant to trigger the error. If the error depends on
the content of "localfile.html" then trim it down to the minimum that
still fails and pass it on as well.
--Gisle
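A reproduction along the lines requested could look like the sketch below: a single runnable file that writes its own tiny input (a stand-in for "localfile.html", created with File::Temp), calls parse_file() without an explicit eof(), and prints which document events fired. The HTML content and variable names are placeholders; if the problem depends on the real file, its trimmed-down content would replace the sample string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Parser;

# Placeholder input file, removed automatically when the script exits.
my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<html><body><p>Hello</p></body></html>\n";
close $fh or die "close: $!";

my @events;
my $p = HTML::Parser->new(
    api_version      => 3,
    start_document_h => [ sub { push @events, shift }, 'event' ],
    end_document_h   => [ sub { push @events, shift }, 'event' ],
);

# No explicit $p->eof() here: parse_file() is expected to call it.
$p->parse_file($filename) or die "Can't parse $filename: $!";
print "$_\n" for @events;
```

If end_document were missing from the output, that would demonstrate the reported problem; since Gisle could not find an end_document issue, both events are the expected result.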