HTML::Parser: how to remove "//<![CDATA[ ... //]]>" ?

on 31.01.2007 12:00:00 by Gerwin

Hi,

I'm using HTML::Parser to strip HTML tags from my files. I noticed
that "//<![CDATA[ ... //]]>" and the JavaScript between those markers
are not stripped. Any idea how to do this?

-Gerwin

Re: HTML::Parser: how to remove "//<![CDATA[ ... //]]>" ?

on 31.01.2007 20:15:15 by Andy

The CDATA marker can be regarded as a kind of comment in HTML.

According to the documentation at http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm
the strict_comment switch has to be disabled for such tags to be stripped:

$p->strict_comment( $bool )
By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera
and MSIE), but it is not correct according to the official HTML
standard. Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".

The official behaviour is enabled by enabling this attribute.

Enabling of 'strict_comment' also disables recognizing these forms as
comments:

  </ comment>
  <! comment>

Notice how the second form is similar to the first two and last
characters of "<![CDATA[ ... ]]>".
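
For the original question, here is a minimal sketch of another route. It assumes the "//<![CDATA[ ... //]]>" wrapper sits inside a <script> element, which is the usual XHTML idiom. HTML::Parser's ignore_elements method suppresses every event between the start and end tag of the named elements, so the script body should never reach the text handler. The helper name strip_html and the sample input are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return only the text content of an HTML string,
# skipping everything inside <script> and <style> elements.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Collect decoded text; tags produce no output because
        # no start/end handlers are registered.
        text_h => [ sub { $text .= $_[0] }, 'dtext' ],
    );

    # Suppress all events for these elements, including the
    # JavaScript wrapped in //<![CDATA[ ... //]]>.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;

    return $text;
}

# Illustrative input only.
my $sample = join "\n",
    '<p>Hello</p>',
    '<script type="text/javascript">',
    '//<![CDATA[',
    'alert(1);',
    '//]]>',
    '</script>',
    '<p>World</p>';

print strip_html($sample), "\n";
```

If the CDATA section appears outside a script element, this sketch does not cover it; the marked_sections option described in the same documentation may be worth a look in that case.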