XML::Twig produces double encoded UTF-8

on 04.01.2007 21:36:30 by Josef Feit

Hi,

I have a problem with XML::Twig (Fedora 6).

When parsing a UTF-8 encoded XML file, I get double-encoded
UTF-8 in the output.
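For readers unfamiliar with the term, here is a minimal core-Perl illustration (using the standard Encode module) of what "double encoded" UTF-8 looks like. It is not a reconstruction of what XML::Twig did internally; it only shows the common pattern in which UTF-8 bytes are misread as Latin-1 characters and then encoded to UTF-8 a second time. The Czech letter "č" (U+010D) is an arbitrary example character.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "č" (U+010D) as a Perl character string
my $char = "\x{10D}";

# Correct: encoding once gives the two UTF-8 bytes C4 8D
my $once = encode('UTF-8', $char);

# Double encoding: the UTF-8 bytes are taken as two Latin-1 characters
# (U+00C4, U+008D) and encoded to UTF-8 again, giving four bytes
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

printf "once:  %s\n", unpack('H*', $once);    # once:  c48d
printf "twice: %s\n", unpack('H*', $twice);   # twice: c384c28d
```

Viewed as UTF-8, the doubled bytes show up as "Ä" followed by an invisible control character where "č" should be, which is the usual symptom.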

The header is

and the XML::Twig object is created with

use XML::Twig;

my $t = XML::Twig->new(
    keep_encoding => 1,
    twig_handlers => {
        PICTURE => \&picture,
    },
);
$t->parsefile($xmlobr);
$t->purge;

------------------

The picture sub uses print to output some parts of the
"picture" element (an XML structure, not binary data) to STDOUT.

Regardless of the keep_encoding option (even if it is left
out), the problem persists.

Is there any other way to tell the parser to leave the
encoding untouched?

Thanks,
Josef

Re: XML::Twig produces double encoded UTF-8

on 10.01.2007 21:07:01 by mirod

On Jan 4, 3:36 pm, Josef Feit wrote:
> I have a problem with XML::Twig (Fedora 6).
>
> When parsing a UTF-8 encoded XML file, I get double-encoded
> UTF-8 in the output.

Hi,

That's unusual. Encoding problems are always a pain. The problem could
come from the LANG environment variable; what is it set to? Did you try
opening the output file as utf8 (see perldoc perlunicode)? What's your
version of perl, BTW?
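A minimal sketch of the suggestion to open the output file as UTF-8, using the standard :encoding(UTF-8) I/O layer described in perlunicode and PerlIO. The file name output.xml is a placeholder, and the printed string merely stands in for whatever the twig handlers would write.

```perl
use strict;
use warnings;

my $out_file = 'output.xml';   # placeholder output file name

# The :encoding(UTF-8) layer encodes Perl character strings exactly once
# on the way out, so the file receives plain UTF-8 bytes.
open(my $out, '>:encoding(UTF-8)', $out_file)
    or die "Cannot open $out_file: $!";

print {$out} "<p>\x{10D}</p>\n";   # "č" as a single character (U+010D)

close($out) or die "Cannot close $out_file: $!";
```

This assumes the strings being printed are character strings; printing bytes that are already UTF-8 through such a layer is itself a way to produce double encoding.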

--
mirod

Re: XML::Twig produces double encoded UTF-8

on 13.01.2007 18:51:36 by Josef Feit

mirod wrote:
>
> On Jan 4, 3:36 pm, Josef Feit wrote:
>> I have a problem with XML::Twig (Fedora 6).
>>
>> When parsing a UTF-8 encoded XML file, I get double-encoded
>> UTF-8 in the output.
>
> Hi,
>
> That's unusual. Encoding problems are always a pain. The problem could
> come from the LANG environment variable; what is it set to? Did you try
> opening the output file as utf8 (see perldoc perlunicode)? What's your
> version of perl, BTW?
>

It is v5.8.8
echo $LANG: cs_CZ.UTF-8
echo $PERL_UNICODE: 1
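PERL_UNICODE is a bitmask whose bits are the numeric forms of the -C switch documented in perlrun. The small sketch below decodes such a mask; the helper name describe_unicode_flags is hypothetical. With the reported value 1, only STDIN is treated as UTF-8, so this setting gives STDOUT no UTF-8 layer. That may be related to an explicit binmode on STDOUT making a difference, although the thread does not establish it. The special variable ${^UNICODE} holds the mask that the running perl actually uses.

```perl
use strict;
use warnings;

# Bits of PERL_UNICODE / the -C switch, as described under -C in perlrun.
my @unicode_flags = (
    [  1, 'STDIN is assumed to be in UTF-8' ],
    [  2, 'STDOUT will be in UTF-8' ],
    [  4, 'STDERR will be in UTF-8' ],
    [  8, 'UTF-8 is the default PerlIO layer for input streams' ],
    [ 16, 'UTF-8 is the default PerlIO layer for output streams' ],
    [ 32, '@ARGV elements are expected to be UTF-8 strings' ],
    [ 64, 'the flags above apply only under a UTF-8 locale' ],
);

# Hypothetical helper: list the descriptions of all bits set in $mask.
sub describe_unicode_flags {
    my ($mask) = @_;
    return map { $_->[1] } grep { $mask & $_->[0] } @unicode_flags;
}

# The value reported in the thread (PERL_UNICODE=1)
print "PERL_UNICODE=1: $_\n" for describe_unicode_flags(1);

# The mask the running perl actually uses
print "this perl: $_\n" for describe_unicode_flags(${^UNICODE});
```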

I still do not know why perl double-encoded the strings.
Maybe I have some leftover ISO-8859-2 characters in the input.

Adding

binmode(STDOUT, ":utf8");

seems to work, however.
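A minimal standalone sketch of this fix: the UTF-8 layer has to be put on STDOUT before anything is printed. The printed character is an arbitrary example standing in for text taken from the parsed XML.

```perl
use strict;
use warnings;

# Put a UTF-8 layer on STDOUT before printing character strings, such
# as text taken from parsed XML. Without a layer, perl writes characters
# U+0080..U+00FF as single Latin-1 bytes and warns "Wide character in
# print" for strings containing characters above U+00FF.
binmode(STDOUT, ':utf8');

print "\x{10D}\n";   # "č" (U+010D) is written as the two bytes C4 8D
```

For output, ':encoding(UTF-8)' produces the same bytes for valid text; on input it also checks that the data really is valid UTF-8, which ':utf8' does not.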

Sometimes I have observed a similar problem (not XML::Twig
specific) when a perl script was used as a filter from vi
(gvim). Working with the file from the shell was OK, but from
vi, double-encoded characters were returned. Again, binmode
seems to remove the problem.

I will read perldoc perlunicode.

Thank you
Josef

Re: XML::Twig produces double encoded UTF-8

on 15.01.2007 02:11:58 by Big and Blue

Josef Feit wrote:
> Hi,
>
> I have a problem with XML::Twig (Fedora 6).

Which version of XML::Twig?

3.28 was released on 05 Jan 2007. I seem to recall seeing some encoding
issues a few months ago that were fixed by a patch supplied in the bug
database on CPAN, but I can't see it there now, so it may have been fixed.
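A sketch for checking the installed XML::Twig version against 3.28, the release mentioned here. It uses only require and the standard UNIVERSAL VERSION method, so it runs whether or not XML::Twig is installed; the helper name twig_status is hypothetical.

```perl
use strict;
use warnings;

# The release mentioned in the thread.
my $min_version = '3.28';

# Hypothetical helper: returns a status message and never dies,
# even if XML::Twig is not installed.
sub twig_status {
    my $installed = eval { require XML::Twig; 1 };
    return 'XML::Twig is not installed' unless $installed;

    my $version = XML::Twig->VERSION;

    # VERSION($min) dies if the installed version is older than $min.
    return eval { XML::Twig->VERSION($min_version); 1 }
        ? "XML::Twig $version is $min_version or newer"
        : "XML::Twig $version is older than $min_version; consider upgrading";
}

print twig_status(), "\n";
```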


--
Just because I've written it doesn't mean that
either you or I have to believe it.

Re: XML::Twig produces double encoded UTF-8

on 15.01.2007 19:18:29 by Josef Feit

Big and Blue wrote:
> Josef Feit wrote:
>> Hi,
>>
>> I have a problem with XML::Twig (Fedora 6).
>
> Which version of XML::Twig?
>
> 3.28 was released on 05 Jan 2007. I seem to recall seeing some

- It was 3.23.
- I upgraded to 3.28 (the docs say the set_keep_encoding bug was fixed).
- The output seems to be OK now.

Thank you
Josef