completely rewind an HTML::TokeParser (PullParser)

on 21.12.2005 20:23:51 by Ashley

I want to parse some HTML to count the text characters, then rewind
to the beginning of the parser, step through the tags, and excerpt up
to an argument-supplied character count or percentage point in the
*text* chars, keeping track of which tags are open so that the close
tags can be added automatically.

To be used in a TT2 filter like "truncate" but for HTML instead of
plain text. So that...

[% html = "

this is something to truncate

" %]
[% html | truncate_html(10) %]

Would output

this is so...


^^^^^^12345^^^67890
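
For illustration, a minimal sketch of the truncation described above, assuming a fixed character limit. The tags in the sample input are an assumption (the markup in the message did not survive archiving), as are the plain function truncate_html, the partial %VOID list, and the decision to count raw text characters without decoding entities. It is one possible approach, not the poster's implementation.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Elements that take no closing tag (partial list, assumed for this sketch).
my %VOID = map { $_ => 1 } qw(br hr img input meta link);

# Hypothetical helper: keep the first $limit text characters of $html,
# then close whatever tags are still open.
sub truncate_html {
    my ($html, $limit) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my ($out, $seen, @open) = ('', 0);

    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S') {                  # ["S", $tag, $attr, $attrseq, $text]
            push @open, $t->[1] unless $VOID{ $t->[1] };
            $out .= $t->[4];
        }
        elsif ($type eq 'E') {               # ["E", $tag, $text]
            pop @open if @open && $open[-1] eq $t->[1];
            $out .= $t->[2];
        }
        elsif ($type eq 'T') {               # ["T", $text, $is_data]
            my $text = $t->[1];
            my $room = $limit - $seen;
            if (length($text) > $room) {     # the limit falls inside this text
                $out .= substr($text, 0, $room) . '...';
                last;
            }
            $out  .= $text;
            $seen += length $text;
        }
        # comments, declarations and processing instructions are dropped
    }

    # Close the still-open tags, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}

# Sample input with assumed tags.
print truncate_html('<p><i>this <b>is something</b> to truncate</i></p>', 10), "\n";
```

With the assumed input this prints `<p><i>this <b>is so...</b></i></p>`, which lines up with the caret ruler above: six characters of markup, five of text, three of markup, five of text.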

Anyway, the trouble I'm trying to address is rewinding the parser for
the second walkthrough once I've counted text characters.

I want to do this: 1 while $p->unget_token()

But it doesn't work because HTML::PullParser->unget_token returns the
parser object itself.
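
A small sketch of what unget_token does in practice, under the assumption of a sample string with made-up tags. It pushes back tokens that were already read and handed to it as arguments; called with no arguments it pushes nothing and still returns the (true) parser object, so the loop above would never terminate.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup is an assumption made for this sketch.
my $html = '<p>this is <b>something</b> to truncate</p>';
my $p    = HTML::TokeParser->new(\$html) or die "cannot create parser";

my $first = $p->get_token;            # start-tag token for <p>
my $ret   = $p->unget_token($first);  # push it back; the return value is the parser
my $again = $p->get_token;            # the pushed-back token is delivered again

print "unget_token returned the parser object\n" if $ret == $p;
print "re-read token: $again->[0] $again->[1]\n";
```

Rewinding this way only works for tokens the caller has kept around, which is why it does not amount to a general "rewind to start".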

To make a long story short, and it's not too late for that -- would
changing the return be reasonable or would it break code?

sub unget_token
{
    my $self = shift;
    unshift @{$self->{pullparser_accum}}, @_;
    # $self; <-- change, don't return $self anymore
}

If not, does anyone have a smart idea for how to rewind it while
respecting the interface (i.e., not testing $self->{pullparser_accum})?

Thanks!
-Ashley

Re: completely rewind an HTML::TokeParser (PullParser)

on 21.12.2005 20:34:49 by apv

D'oh, sorry, just realized that proposed sub change is useless. So,
please ignore that part. Still looking for ideas to make the rewind
work though.

Re: completely rewind an HTML::TokeParser (PullParser)

on 21.12.2005 22:24:41 by gisle

apv writes:

> D'oh, sorry, just realized that proposed sub change is useless. So,
> please ignore that part. Still looking for ideas to make the rewind
> work though.

Why not just create a new parser object for the same string?

--Gisle
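
A hedged sketch of that suggestion: one parser object per pass over the same string. The helper names count_text_chars and limit_from_percent, the sample markup, and the rounding with int() are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Pass 1: count text characters with a throwaway parser.
# Text is counted as it appears in the source (entities are not decoded).
sub count_text_chars {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my $count = 0;
    while (my $t = $p->get_token) {
        $count += length $t->[1] if $t->[0] eq 'T';
    }
    return $count;
}

# Turn a percentage of the text into a character limit (rounded down).
sub limit_from_percent {
    my ($html, $percent) = @_;
    return int(count_text_chars($html) * $percent / 100);
}

# Sample markup is an assumption made for this sketch.
my $html  = '<p>this is <b>something</b> to truncate</p>';
my $limit = limit_from_percent($html, 50);

# Pass 2: a second, independent parser on the same string starts again
# at the first token, so no rewinding is needed.
my $p2    = HTML::TokeParser->new(\$html) or die "cannot create parser";
my $first = $p2->get_token;

print "limit: $limit, second pass starts at <$first->[1]>\n";
```

Constructing the second object only re-parses the same in-memory string, and it stays within the documented interface instead of reaching into the parser's internals.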

Re: completely rewind an HTML::TokeParser (PullParser)

on 21.12.2005 22:30:05 by apv

Yeah, I think this is probably the right solution after playing some
more myself. I think I was suffering from premature optimization,
trying to avoid creating another object.

Thanks!
-Ashley
