Re: HTML::Parser how to remove "//<![CDATA[ ... //]]>" ?
on 31.01.2007 20:15:15 by Andy
A CDATA section can be treated as a comment in HTML.
According to the documentation at http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm
the strict_comment switch has to be disabled (which is the default) for
the parser to recognize such tags as comments, so that they can be stripped:
$p->strict_comment( $bool )
By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera
and MSIE), but it is not correct according to the official HTML
standard. Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".
The official behaviour is enabled by enabling this attribute.
Enabling of 'strict_comment' also disables recognizing these forms as
comments:
  </ comment>
  <! comment>
Notice how the second form is similar to a CDATA section: "<!" and ">"
are the first two and the last characters of "<![CDATA[ ... ]]>".
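
To make this concrete, here is a minimal sketch of stripping comments
with HTML::Parser, adapted from the comment-stripping example in the
module's own documentation. The helper name strip_comments and the
sample input are only for illustration. It assumes the parser reports
the offending tag as a comment event in the first place, which is worth
checking against your actual input.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return $html without anything that the parser
# reports as a comment event.
sub strip_comments {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Copy every event we do not handle explicitly, verbatim.
        default_h => [ sub { $out .= shift }, 'text' ],
        # An empty handler makes the parser ignore comment events.
        comment_h => [ '' ],
    );

    # Lenient comment parsing. This is already the default, it is
    # only spelled out here to match the discussion above.
    $p->strict_comment(0);

    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $out;
}

print strip_comments("<p>before<!-- a note -->after</p>"), "\n";
```

Since everything except comments goes through the default handler
unchanged, the rest of the markup should come out as it went in.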