HTML::Parser & <plaintext> tag
on 02.11.2004 19:57:49 by kappa
Good day to all!
As far as I can understand, HTML::Parser simply ignores the closing
</plaintext> tag. I read the tests and Changes, so I see that this is
intended behaviour and that <plaintext> is special-cased among all CDATA
elements.
Does someone know the reasoning of this decision? :) It is just plain
interesting. Does HTML::Parser imitate some old browser here? It
results in weird effects for me as I am writing an HTML sanitizer for
WebMail.
--
Alex Kapranoff,
#!/usr/bin/perl -w
$SIG{__WARN__}=sub{print substr("@_",-43+ord$_,1)for
'6.823O1US90:350:739OJ;0:*'=~m}.}g},$}='PJlshrk';reset$}+43;
Re: HTML::Parser & <plaintext> tag
on 10.11.2004 19:25:47 by gisle
Alex Kapranoff writes:
> As far as I can understand, HTML::Parser simply ignores the closing
> </plaintext> tag. I read the tests and Changes, so I see that this is
> intended behaviour and that <plaintext> is special-cased among all CDATA
> elements.
>
> Does someone know the reasoning of this decision? :) It is just plain
> interesting.
A long time ago the HTTP protocol did not have MIME-like headers. The
client sent a "GET foo" line and the server responded with HTML and
then closed the connection. Since there was no way for the server to
indicate any other Content-Type than text/html, the <plaintext> tag was
introduced so that text files could be served by just prefixing the
file content with this tag.
This was before the tag was invented so luckily we don't have a
similar unclosed tag :)
> Does HTML::Parser imitate some old browser here?
Yes, it was there in the beginning but still seems well supported. Of
my current browsers both Konqueror and MSIE support this. Firefox
supports it in the same way as , i.e. it allows you to escape out
of it with </plaintext>.
The <plaintext> tag is described in this historic document:
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html#7
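The difference between the two behaviours discussed in this thread can be sketched in a few lines of plain Perl. This is only a rough model, not HTML::Parser's actual code: `tokenize` and its `$allow_close` flag are hypothetical names invented for the illustration, and the tag matching is deliberately naive. With the flag off, everything after <plaintext> is reported as text (the historic behaviour); with it on, a literal </plaintext> ends the text run (the Firefox behaviour as described in this thread).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Parser: splits $html into
# [type, value] events. $allow_close selects how <plaintext> is treated.
sub tokenize {
    my ($html, $allow_close) = @_;
    my @events;
    pos($html) = 0;
    while (pos($html) < length $html) {
        if ($html =~ /\G<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/gc) {
            my ($slash, $name) = ($1, lc $2);
            push @events, [ $slash ? 'end' : 'start', $name ];
            next if $slash or $name ne 'plaintext';

            # Inside <plaintext>: markup is no longer recognised.
            my $rest = substr $html, pos($html);
            if ($allow_close and $rest =~ /<\/plaintext\s*>/i) {
                my $len = $-[0];    # offset of the closing tag in $rest
                push @events, [ text => substr($rest, 0, $len) ] if $len > 0;
                pos($html) = pos($html) + $len;   # resume at the closing tag
            }
            else {
                # Historic behaviour: the rest of the document is text.
                push @events, [ text => $rest ] if length $rest;
                last;
            }
        }
        elsif ($html =~ /\G([^<]+|<)/gc) {
            # A run of text, or a stray "<" that does not start a tag.
            push @events, [ text => $1 ];
        }
    }
    return @events;
}

my $html = '<p>hi</p><plaintext>raw</plaintext><script>alert(1)</script>';
for my $mode (0, 1) {
    print $mode ? "closable:\n" : "historic:\n";
    print "  $_->[0]: $_->[1]\n" for tokenize($html, $mode);
}
```

In the first mode the <script> after </plaintext> only ever shows up inside a text event, while in the second mode it is reported as a start tag. A sanitizer that filters on tag events therefore sees different documents depending on which behaviour it models.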
> It results in weird effects for me as I am writing an HTML sanitizer for
> WebMail.
How come? Do you have a need to suppress this behaviour in HTML::Parser?
Regards,
Gisle
Re: HTML::Parser & <plaintext> tag
on 11.11.2004 09:11:14 by kappa
* Gisle Aas [November 10 2004, 21:25]:
> then closed the connection. Since there was no way for the server to
> indicate any other Content-Type than text/html, the <plaintext> tag was
> introduced so that text files could be served by just prefixing the
> file content with this tag.
>
> This was before the tag was invented so luckily we don't have a
> similar unclosed tag :)
Thank you very much for this enlightenment! It explains everything!
BTW, by that time I had even seen computers once or twice from far
away :)
> my current browsers both Konqueror and MSIE support this. Firefox
> supports it in the same way as , i.e. it allows you to escape out
> of it with </plaintext>.
This Firefox behaviour is probably what confused me. Look, what if
I've got such a html: `'?
HTML::Parser stops parsing after `<plaintext>' so that no interesting
event is triggered on `