XML::Twig produces double encoded UTF-8
on 04.01.2007 21:36:30 by Josef Feit
Hi,
I have a problem with XML::Twig (Fedora 6).
When parsing a UTF-8 encoded XML file, I am getting a double-encoded
file in the output.
The header is
and the XML::Twig object is built by
my $t = XML::Twig->new(
    keep_encoding => 1,
    twig_handlers => {
        PICTURE => \&picture,
    },
);
$t->parsefile($xmlobr);
$t->purge;
------------------
The picture sub uses print to output some parts of the
"picture" (which is an XML structure, not binary) to STDOUT.
Regardless of the keep_encoding option (even if it is left out), the
problem persists.
Is there any other way to tell the parser to leave the
encoding untouched?
Thanks,
Josef
Re: XML::Twig produces double encoded UTF-8
on 10.01.2007 21:07:01 by mirod
On Jan 4, 3:36 pm, Josef Feit wrote:
> I have a problem with XML::Twig (Fedora 6).
>
> When parsing a UTF-8 encoded XML file, I am getting a double-encoded
> file in the output.
Hi,
That's unusual. Encoding problems are always a pain. The problem could
come from the LANG environment variable; what is it set to? Did you try
opening the output file as utf8 (see perldoc perlunicode)? What's your
version of perl, BTW?
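
For illustration, here is a minimal sketch of what opening the output as UTF-8 can look like. The file name out.xml and the sample character are made up for this example and are not taken from the original script; the :encoding(UTF-8) layer is standard Perl I/O (see perldoc PerlIO).

```perl
use strict;
use warnings;

# Hypothetical output file name, used only for this sketch.
my $path = 'out.xml';

# A character string holding U+0159 (LATIN SMALL LETTER R WITH CARON).
my $text = "\x{159}";

# The :encoding(UTF-8) layer encodes character strings as they are written.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
print {$out} $text;
close($out) or die "Cannot close $path: $!";

# Read the file back as raw bytes to see exactly what was written.
open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
my $bytes = do { local $/; <$in> };
close($in);
unlink $path;

# Show the bytes in hex; a single correct UTF-8 encoding of U+0159 is C5 99.
printf "%d bytes: %s\n", length($bytes),
    join(' ', map { sprintf '%02X', ord($_) } split //, $bytes);
```

If the file contains four bytes for this character instead of two, the string was encoded twice somewhere along the way.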
--
mirod
Re: XML::Twig produces double encoded UTF-8
on 13.01.2007 18:51:36 by Josef Feit
mirod wrote:
>
> On Jan 4, 3:36 pm, Josef Feit wrote:
>> I have a problem with XML::Twig (Fedora 6).
>>
>> When parsing a UTF-8 encoded XML file, I am getting a double-encoded
>> file in the output.
>
> Hi,
>
> That's unusual. Encoding problems are always a pain. The problem could
> come from the LANG environment variable; what is it set to? Did you try
> opening the output file as utf8 (see perldoc perlunicode)? What's your
> version of perl, BTW?
>
It is v5.8.8
echo $LANG: cs_CZ.UTF-8
echo $PERL_UNICODE: 1
I still do not know why perl double-encoded the strings.
Maybe I have some leftover ISO-8859-2 characters in the input.
The
binmode(STDOUT, ":utf8");
seems to work, however.
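
To make the symptom concrete, here is a small sketch of one common way double encoding arises in Perl. It is an assumption for illustration, not a diagnosis of this script: a string that already holds UTF-8 bytes gets encoded a second time. The helper hex_bytes and the sample character are invented for the example; an in-memory filehandle stands in for STDOUT so the written bytes can be inspected.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show the bytes of a string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

my $char  = "\x{159}";               # character string: U+0159
my $once  = encode('UTF-8', $char);  # correct UTF-8: C5 99
my $twice = encode('UTF-8', $once);  # each byte is treated as a Latin-1
                                     # character and encoded again: C3 85 C2 99

# Writing the character string through a :utf8 layer encodes it exactly once.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";
binmode($fh, ':utf8');
print {$fh} $char;
close($fh);

print "encoded once:   ", hex_bytes($once),   "\n";
print "encoded twice:  ", hex_bytes($twice),  "\n";
print "via :utf8 layer: ", hex_bytes($buffer), "\n";
```

The layer gives the right result only when the data being printed is a character string. If it is already a UTF-8 byte string, adding the layer encodes it a second time.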
I have sometimes observed a similar problem (which was not
XML::Twig specific) when a perl script was used as a
filter from vi (gvim). Working with the file from the shell
was OK, but from vi double-encoded characters were returned.
Again, the binmode seems to remove the problem.
I will read perldoc perlunicode.
Thank you
Josef
Re: XML::Twig produces double encoded UTF-8
on 15.01.2007 02:11:58 by Big and Blue
Josef Feit wrote:
> Hi,
>
> I have a problem with XML::Twig (Fedora 6).
Which version of XML::Twig?
3.28 was released on 05 Jan 2007. I seem to recall seeing some encoding
issues a few months ago which were fixed by a patch supplied in the bug
database on CPAN, but can't see it there now so it may have been fixed.
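
A quick way to answer the version question is to ask the module itself. This is a minimal sketch; the helper name module_version is made up for the example, and it relies only on the standard VERSION method that every Perl package inherits.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::Twig -> XML/Twig.pm
    return eval { require $file; $module->VERSION };
}

my $version = module_version('XML::Twig');
print defined $version
    ? "XML::Twig version $version\n"
    : "XML::Twig is not installed\n";
```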
--
Just because I've written it doesn't mean that
either you or I have to believe it.
Re: XML::Twig produces double encoded UTF-8
on 15.01.2007 19:18:29 by Josef Feit
Big and Blue wrote:
> Josef Feit wrote:
>> Hi,
>>
>> I have a problem with XML::Twig (Fedora 6).
>
> Which version of XML::Twig?
>
> 3.28 was released on 05 Jan 2007. I seem to recall seeing some
- It was 3.23.
- I upgraded to 3.28 (the doc says a set_keep_encoding bug was fixed).
- The output seems to be OK now.
Thank you
Josef