East european characters from LaTex to UTF8

on 30.11.2007 17:11:12 by RAPPAZ Francois

Hi
With the modules TeX::Encode and Encode, I convert characters from
LaTeX to UTF-8. It works great except for characters used in Slovakia,
for example c or z with caron: č ž

TeX::Encode uses the following modules:
use Encode::Encoding;
use Pod::LaTeX;
use HTML::Entities;

and from the comments in TeX::Encode: "It uses the mapping from
Pod::LaTeX, but we use HTML::Entities
to get the Unicode character".
Is there another module I should install to convert these East
European characters?
Thanks for any advice !

Francois

Re: East european characters from LaTex to UTF8

on 30.11.2007 20:33:01 by Joost Diepenmaat

On Fri, 30 Nov 2007 08:11:12 -0800, Francois wrote:

> Hi
> With the modules TeX::Encode and Encode, I convert characters from LaTeX
> to UTF-8. It works great except for characters used in Slovakia, for
> example c or z with caron: č ž

Which encoding are your original LaTeX files in? Plain 7-bit ASCII or
ISO-8859-1 with LaTeX markup for the special characters, or something else?

If something else, it may help to open/read the LaTeX files using the
right "lower level" encoding layer. For example, if you're using cp1250
for the LaTeX files:

open my $fh,"<:encoding(cp1250)","/my/latex/file.tex" or die $!;

print decode('latex',<$fh>);

See also the manpages for perlio and Encode

Joost.

Re: East european characters from LaTex to UTF8

on 30.11.2007 20:34:24 by Joost Diepenmaat

On Fri, 30 Nov 2007 19:33:01 +0000, Joost Diepenmaat wrote:
> print decode('latex',<$fh>);

Oops. That should probably be

print decode('latex',join('',<$fh>))

or something similar - decode accepts only a single input string.
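
A minimal sketch of the same idea, reading the whole file as one string through an encoding layer. The helper name slurp_with_encoding and the temporary demo file are assumptions made for illustration; only core modules are used, and the TeX::Encode step is left out here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as ONE string through an encoding layer.
sub slurp_with_encoding {
    my ($path, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open $path: $!";
    local $/;              # undefined record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demo: write the single cp1250 byte 0xE8 (c with caron) to a temp file.
my ($out, $tmp) = tempfile(UNLINK => 1);
binmode $out;              # raw bytes, no encoding layer on output
print {$out} "\xE8\n";
close $out;

# Read it back as characters and show the code point of the first one.
my $text = slurp_with_encoding($tmp, 'cp1250');
printf "U+%04X\n", ord($text);    # prints U+010D
```

The returned string is a single scalar, so it can be handed to decode('latex', ...) directly, without the join.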

Joost.

Re: East european characters from LaTex to UTF8

on 04.12.2007 08:28:33 by RAPPAZ Francois

On Nov 30, 8:33 pm, Joost Diepenmaat wrote:
> On Fri, 30 Nov 2007 08:11:12 -0800, Francois wrote:

>
> Which encoding are your original LaTeX files in? Plain 7-bit ASCII or
> ISO-8859-1 with LaTeX markup for the special characters, or something else?
>

The file is ASCII: it's from Google Scholar with the Import BibTeX
option on:

@article{fedor2007dea,
title={{Dissociative electron attachment to HBr: A temperature
effect}},
author={Fedor, J. and Cingel, M. and Skaln{\`y}, JD and Scheier, P.
and M{\"a}rk, TD and {\v{C}}{\'\i}{\v{z}}ek, M. and Koloren{\v{c}}, P.
and Hor{\'a}{\v{c}}ek, J.},
journal={Physical Review A},
volume={75},
number={2},
pages={22703},
year={2007},
publisher={APS}
}

Re: East european characters from LaTex to UTF8

on 04.12.2007 12:55:30 by Joost Diepenmaat

On Mon, 03 Dec 2007 23:28:33 -0800, Francois wrote:

> The file is ASCII: it's from Google Scholar with the Import BibTeX
> option on

Hmm... Looks like Pod::LaTeX only handles ISO 8859-1 characters.
You will probably have to add the extra characters you're using to
TeX::Encode yourself, or find some other way of converting LaTeX to text.
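
One possible "other way", sketched under assumptions: handle the caron macro separately with a small substitution, using only the core module Unicode::Normalize. The helper name decode_caron is hypothetical and is not part of TeX::Encode; it only covers the forms {\v{X}} and \v{X} with a single ASCII letter, as they appear in the BibTeX entry above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of TeX::Encode): replace {\v{X}} and \v{X}
# with the letter X carrying a caron.
sub decode_caron {
    my ($text) = @_;
    my $caron = "\x{030C}";    # COMBINING CARON
    # NFC composes letter + combining caron into the precomposed
    # character where Unicode defines one (for example c -> U+010D).
    $text =~ s/\{\\v\{([A-Za-z])\}\}/NFC($1 . $caron)/ge;    # {\v{c}}
    $text =~ s/\\v\{([A-Za-z])\}/NFC($1 . $caron)/ge;        # \v{c}
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_caron('Koloren{\v{c}}, P.'), "\n";    # prints: Kolorenč, P.
```

The other accents would still need decode('latex', ...); whether that call leaves already converted non-ASCII characters untouched is an assumption worth checking on a sample entry.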

Joost.