Encoding/decoding problems in TreeBuilder
on 13.06.2006 21:48:05 by gil.vidals

I'm running web pages through HTML::TreeBuilder and certain characters,
namely <, >, ', and &, are being encoded as entities. For example, I ran
the following text through TreeBuilder:
The dog's collar 4 > 2 and 2 < 4 & amper
And the output is:
The dog&#39;s collar 4 &gt; 2 and 2 &lt; 4 &amp; amper
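
For reference, here is a minimal sketch of how a run like this might look. It assumes the text is wrapped in a <p> element and that the output comes from as_HTML; neither detail is stated above, and the variable names are only illustrative. As far as I understand the HTML::Element documentation, as_HTML called without arguments entity-encodes every character HTML::Entities considers unsafe, a character list such as '<>&' restricts encoding to those characters, and as_text returns the decoded text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample text from the post, wrapped in a <p> (the wrapper is an assumption).
my $text = "The dog's collar 4 > 2 and 2 < 4 & amper";

my $tree = HTML::TreeBuilder->new;
$tree->parse("<p>$text</p>");
$tree->eof;

# First <p> element in the parsed tree.
my $p = $tree->look_down(_tag => 'p');

# Decoded character data, with no entities.
my $plain = $p->as_text;

# No arguments: all "unsafe" characters are entity-encoded.
my $full = $p->as_HTML;

# Character list: only <, > and & are entity-encoded.
my $limited = $p->as_HTML('<>&');

print "as_text:        $plain\n";
print "as_HTML:        $full\n";
print "as_HTML('<>&'): $limited\n";

# Older HTML::Tree releases need an explicit delete to free the tree.
$tree->delete;
```

If this is what is happening, the parsed tree itself holds the plain characters and the entities only appear when the tree is serialized back to HTML.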
My question, then: is this the expected and acceptable behavior for
TreeBuilder? According to the W3C
(http://www.w3.org/TR/html4/charset.html), a user agent (browser) will
translate character sets, but TreeBuilder isn't purporting to be a
user agent as LWP would be, or is it?
I'm confused and need clarification.
I am using TreeBuilder 3.13 and I have HTML::Tree 3.20 installed on
Perl 5.8.7.
--
Gil.Vidals@PositionResearch.com
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com