Encoding Decoding problems in TreeBuilder

on 13.06.2006 21:48:05 by gil.vidals

I'm running web pages through HTML::TreeBuilder, and certain characters,
namely <, >, ', and &, are being encoded. For example, I ran the
following text through TreeBuilder:

The dog's collar 4 > 2 and 2 < 4 & amper

And the output is:

The dog&#39;s collar 4 &gt; 2 and 2 &lt; 4 &amp; amper

My question, then: is this the expected and acceptable behavior for
TreeBuilder? According to W3.org
(http://www.w3.org/TR/html4/charset.html), a user agent (browser) will
translate character sets, but TreeBuilder doesn't purport to be a
user agent the way LWP would be. Or does it?

I'm confused and need clarification.

I am using TreeBuilder 3.13 and I have HTML::Tree 3.20 installed on
Perl 5.8.7.
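A minimal sketch of how this behavior can be reproduced and controlled, assuming the documented HTML::Element methods `as_HTML($entities)` and `as_text` (inherited by HTML::TreeBuilder). It has not been checked against HTML::Tree 3.20 specifically, and the variable names are only for illustration:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# The sample text from the post above.
my $input = q{The dog's collar 4 > 2 and 2 < 4 & amper};

# Parse the fragment; TreeBuilder adds implicit <html>/<body> tags around it.
my $tree = HTML::TreeBuilder->new;
$tree->parse($input);
$tree->eof;

# Default: "unsafe" characters (<, >, &, quotes, non-ASCII) are written as
# entities, e.g. "4 &gt; 2", because as_HTML produces HTML source.
my $default_html = $tree->as_HTML;

# An explicit character set limits encoding to just those characters,
# so the apostrophe is left alone while <, > and & are still escaped.
my $minimal_html = $tree->as_HTML('<>&');

# as_text returns the decoded character data, with no entities at all.
my $plain_text = $tree->as_text;

print "$default_html\n$minimal_html\n$plain_text\n";

# Break the tree's circular parent/child references so memory is freed.
$tree->delete;
```

In short, `as_HTML` escapes because its output is meant to be re-parsed as HTML, where a bare `<` or `&` could be ambiguous; a browser decodes those entities back when rendering. When the goal is the text itself rather than markup, `as_text` is the call to use.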


--
Gil.Vidals@PositionResearch.com
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com