East european characters from LaTex to UTF8
on 30.11.2007 17:11:12 by RAPPAZ Francois
Hi
With the modules TeX::Encode and Encode, I convert characters from
LaTeX to UTF-8. It works great except for characters used in Slovakia,
for example c or z with caron: č ž
TeX::Encode uses the following modules:
use Encode::Encoding;
use Pod::LaTeX;
use HTML::Entities;
and from the comments in TeX::Encode: "It uses the mapping from
Pod::LaTeX, but we use HTML::Entities to get the Unicode character".
Is there another module I should install to convert these East
European characters?
Thanks for any advice!
Francois
Re: East european characters from LaTex to UTF8
on 30.11.2007 20:33:01 by Joost Diepenmaat
On Fri, 30 Nov 2007 08:11:12 -0800, Francois wrote:
> Hi
> With the modules TeX::Encode and Encode, I convert characters from LaTeX
> to UTF-8. It works great except for characters used in Slovakia, for
> example c or z with caron: č ž
Which encoding are your original LaTeX files in? Plain 7-bit ASCII,
ISO-8859-1 with LaTeX markup for the special characters, or something else?
If something else, it may help to open/read the latex files using the
right "lower level" encoding layer, for example, if you're using cp1250
for the latex files:
open my $fh,"<:encoding(cp1250)","/my/latex/file.tex" or die $!;
print decode('latex',<$fh>);
See also the man pages for perlio and Encode.
Joost.
Re: East european characters from LaTex to UTF8
on 30.11.2007 20:34:24 by Joost Diepenmaat
On Fri, 30 Nov 2007 19:33:01 +0000, Joost Diepenmaat wrote:
> print decode('latex',<$fh>);
Oops. That should probably be
print decode('latex',join('',<$fh>))
or something similar - decode accepts only a single input string.
Joost.
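The "single input string" point above can be sketched as follows. This is a minimal sketch, not code from the thread: slurp_decoded is a hypothetical helper name, and cp1250 is only the example encoding mentioned earlier. It uses core Perl only; the TeX::Encode call is left as a comment because it assumes that module is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file through an encoding layer
# and return its contents as ONE string of characters.
sub slurp_decoded {
    my ($path, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open $path: $!";
    local $/;            # no record separator: <$fh> returns the whole file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Assuming TeX::Encode is installed, the single string can then be decoded
# as in the post above:
#   use Encode;
#   use TeX::Encode;
#   print decode('latex', slurp_decoded('/my/latex/file.tex', 'cp1250'));
```

Setting $/ to undef has the same effect as join('', <$fh>) but avoids building an intermediate list of lines.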
Re: East european characters from LaTex to UTF8
on 04.12.2007 08:28:33 by RAPPAZ Francois
On Nov 30, 8:33 pm, Joost Diepenmaat wrote:
> On Fri, 30 Nov 2007 08:11:12 -0800, Francois wrote:
>
> Which encoding are your original LaTeX files in? Plain 7-bit ASCII,
> ISO-8859-1 with LaTeX markup for the special characters, or something else?
>
The file is ASCII: it's from Google Scholar with the Import BibTeX
option on:
@article{fedor2007dea,
title={{Dissociative electron attachment to HBr: A temperature effect}},
author={Fedor, J. and Cingel, M. and Skaln{\`y}, JD and Scheier, P. and M{\"a}rk, TD and {\v{C}}{\'\i}{\v{z}}ek, M. and Koloren{\v{c}}, P. and Hor{\'a}{\v{c}}ek, J.},
journal={Physical Review A},
volume={75},
number={2},
pages={22703},
year={2007},
publisher={APS}
}
Re: East european characters from LaTex to UTF8
on 04.12.2007 12:55:30 by Joost Diepenmaat
On Mon, 03 Dec 2007 23:28:33 -0800, Francois wrote:
> The file is ASCII: it's from Google Scholar with the Import BibTeX
> option on
Hmm... Looks like Pod::LaTeX only handles ISO 8859-1 characters.
You will probably have to add the extra characters you're using to
TeX::Encode yourself, or find some other way of converting LaTeX to text.
Joost.
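One "other way" could look like the sketch below. It is a standalone workaround under stated assumptions, not part of TeX::Encode or Pod::LaTeX: the function name latex_accents_to_unicode and the table %COMBINING are made up for illustration, and only the four accent commands that appear in the BibTeX entry above (\v, \', \` and \") are handled. Each accent is turned into the matching Unicode combining character, and Unicode::Normalize (a core module) composes the result.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical table: LaTeX accent command => Unicode combining character.
my %COMBINING = (
    'v' => "\x{030C}",   # caron
    "'" => "\x{0301}",   # acute
    '`' => "\x{0300}",   # grave
    '"' => "\x{0308}",   # diaeresis
);

# Hypothetical helper: convert \v{c}, \'\i, \"a and similar to Unicode.
sub latex_accents_to_unicode {
    my ($text) = @_;
    $text =~ s{
        \\ ( v (?![A-Za-z]) | ['`"] )      # accent; a lone v only, so longer macro names are skipped
        \s*
        (?: \{ (\\i|[A-Za-z]) \}           # base letter in braces
          |    (\\i|[A-Za-z]) )            # or bare base letter
    }{
        my $base = defined $2 ? $2 : $3;
        $base = 'i' if $base eq '\\i';     # dotless i takes the accent as plain i
        $base . $COMBINING{$1};
    }gex;
    $text =~ tr/{}//d;                     # drop the remaining grouping braces
    return NFC($text);                     # compose base letter + accent
}

binmode STDOUT, ':encoding(UTF-8)';
print latex_accents_to_unicode(q({\v{C}}{\'\i}{\v{z}}ek)), "\n";
```

Going through combining characters means new base letters need no table entries of their own; only a new accent command requires one more line in %COMBINING. Note that the sketch strips every brace, so it suits single field values such as an author name rather than a whole .bib file.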