Retrieving text without dtext option

on 14.02.2007 23:10:03 by kevinphilp

I have a program which uses HTML::TokeParser to split apart web pages.
Is there a way of making $stream->get_text() return the text without the
entities decoded? I can see options in get_token but is there an
equivalent method with get_text or get_trimmed_text?

Thanks

Kevin.

Re: Retrieving text without dtext option

on 15.02.2007 09:39:54 by gisle

kevin writes:

> I have a program which uses HTML::TokeParser to split apart web
> pages. Is there a way of making $stream->get_text() return the text
> without the entities decoded?

No. Why do you want that?

It is trivial to reimplement a version of get_text that does what you
want, based on get_token(). You even have the existing get_text to use
as a starting point, and you can simply insert your version into the
HTML::TokeParser namespace.

sub HTML::TokeParser::get_undecoded_text {
...
}
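
For illustration, here is a minimal sketch of how that stub might be filled in. It is not the module's own code: the name get_undecoded_text comes from the stub above and is not part of the HTML::TokeParser distribution. The sketch assumes that text tokens from get_token() carry the raw text in their second element, and it leaves out the special handling the real get_text has for tags such as <img> (the "textify" feature).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical method added to the HTML::TokeParser namespace.
# Collects text up to the next tag, or up to one of the given end tags
# (for example "/p"), without decoding entities.
sub HTML::TokeParser::get_undecoded_text {
    my ($self, @end_tags) = @_;
    my %stop = map { lc($_) => 1 } @end_tags;
    my @text;

    while (my $token = $self->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {
            # Text token: keep the raw text, entities left as written.
            push @text, $token->[1];
        }
        elsif ($type eq 'S' || $type eq 'E') {
            # Start tags are named "tag", end tags "/tag".
            my $tag = $type eq 'S' ? $token->[1] : "/$token->[1]";
            if (!@end_tags || $stop{$tag}) {
                # Put the tag back so the caller can still read it.
                $self->unget_token($token);
                last;
            }
        }
        # Comments, declarations and processing instructions are skipped.
    }
    return join '', @text;
}

my $html   = '<p>Fish &amp; chips &lt;fresh&gt;</p>';
my $stream = HTML::TokeParser->new(\$html) or die "Cannot create parser";
$stream->get_tag('p');
print $stream->get_undecoded_text('/p'), "\n";
# prints: Fish &amp; chips &lt;fresh&gt;
```

Called with no argument, the sketch stops at the first start or end tag it meets. Called with one or more tag names, it skips over other tags and keeps collecting text until one of the named tags appears.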

> I can see options in get_token but is there an equivalent method
> with get_text or get_trimmed_text?

get_token always returns the raw undecoded text. There isn't an
option to make it do otherwise.

--Gisle