Re: Retrieving text without dtext option
on 15.02.2007 09:39:54 by gisle
kevin writes:
> I have a program which uses HTML::TokeParser to split apart web
> pages. Is there a way of making $stream->get_text() return the text
> without the entities decoded?
No. Why do you want that?
It is trivial to reimplement a version of get_text that does what you
want based on get_token(). You even have the old get_text that you can
use as a starting point and just insert your version into the
HTML::TokeParser namespace.
sub HTML::TokeParser::get_undecoded_text {
...
}
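For illustration, here is a minimal sketch of how such a method might look. It is not the module's own code: the name get_undecoded_text comes from the stub above, the $endat argument is modelled on how get_text takes an optional tag name, and the sample HTML is made up. It also skips the "textify" handling that the real get_text does for tags such as img, so treat it as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical method (not shipped with HTML::TokeParser): collect text
# tokens as-is, without running decode_entities() on them.
sub HTML::TokeParser::get_undecoded_text {
    my ($self, $endat) = @_;    # $endat: optional tag name such as "a" or "/a"
    my @text;
    while (my $token = $self->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {
            # Text token: element 1 holds the raw text, entities still encoded.
            push @text, $token->[1];
        }
        elsif ($type eq 'S' || $type eq 'E') {
            # End tags are matched with a leading slash, as get_text does.
            my $tag = $type eq 'E' ? "/$token->[1]" : $token->[1];
            if (!defined $endat || $endat eq $tag) {
                # Put the tag back so the caller can still read it.
                $self->unget_token($token);
                last;
            }
        }
        # Comments, declarations and processing instructions are skipped.
    }
    return join '', @text;
}

# Made-up sample input for the sketch.
my $html   = '<p>Fish &amp; chips &lt;hot&gt;</p>';
my $stream = HTML::TokeParser->new(\$html) or die "cannot create parser";
$stream->get_tag('p');
my $raw = $stream->get_undecoded_text('/p');
print "$raw\n";
```

Called without an argument, the sketch stops at the next start or end tag; called with "/p" it keeps reading across nested tags until the closing p tag is seen.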
> I can see options in get_token but is there an equivalent method
> with get_text or get_trimmed_text?
get_token always returns the raw undecoded text. There isn't an
option to make it do otherwise.
--Gisle