CJK Unified unicode translator

on 23.03.2006 00:17:35 by avidfan

Does anyone know of a translator, preferably one implemented in Perl,
that will translate CJK Unified code points to their respective
language-specific code points? As I understand the concept, CJK Unified
code points render as glyphs that are basically the same in Chinese,
Japanese, or Korean.

My problem is that the target application I'm working with can't render
the CJK Unified code point because it is expecting, and can only handle,
JIS code points, but some of the data being fed to it is in CJK Unified
code points.

My search through CPAN didn't show anything obvious, at least to me.
Any pointers or suggestions would be appreciated.

Dennis
d underscore roesler at agilent dot com

Re: CJK Unified unicode translator

on 23.03.2006 05:33:25 by unknown

Dennis Roesler wrote:
> Does any one know of a translator, preferably one implemented in perl,
> that will translate CJK Unified code points to their respective language
> code points. If I understand the concept of the CJK Unified code
> points, these code points render as glyphs that are basically the same
> in Chinese, Japanese or Korean.
>
> My problem is that the target application I'm working with can't render
> the CJK Unified code point because it is expecting, and can only handle,
> JIS code points, but some of the data being fed to it is in CJK Unified
> code points.
>
> My search through CPAN didn't show anything obvious, at least to me. Any
> pointers or suggestions would be appreciated.
>
> Dennis
> d underscore roesler at agilent dot com

Have you looked at the Encode module? It might be as simple as opening
an input file specifying the CJK encoding, an output file specifying
JIS, and reading and writing. See "Encoding via PerlIO" in the Encode
documentation for this particular slant on things.

Tom Wyant

Re: CJK Unified unicode translator

on 23.03.2006 13:51:32 by avidfan

harryfmudd [AT] comcast [DOT] net wrote:
> Dennis Roesler wrote:
>>
>> My search through CPAN didn't show anything obvious, at least to me.
>> Any pointers or suggestions would be appreciated.

I found Unicode::Unihan after more research introduced the Unihan term :-(.

>
> Have you looked at the Encode module? It might be as simple as opening
> an input file specifying the CJK encoding, an output file specifying
> JIS, and reading and writing. See "Encoding via PerlIO" for this
> particular slant on things.

I've looked at this, but there doesn't seem to be an encoding that is
specific to CJK Unified.

I've tried the following, based on the example from the Encode docs, but
when I write the data out it complains about the CJK characters that
aren't in shiftjis. I drop the XML declaration line and rewrite it with
the encoding set to shiftjis, but besides the above errors, XML::Simple
complains that it can't find the Shift_JIS encoding and won't parse the
file.

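use Encode;

# $infile and $outfile are set earlier in the script.
open my $in, "<:encoding(utf8)", $infile or die "In $infile: $!";
open my $out, ">:encoding(shiftjis)", $outfile or die "Out $outfile: $!";

# Read and discard the original XML declaration line.
my $fline = <$in>;

# Write a replacement declaration naming the new encoding
# (the version attribute here is assumed to be 1.0).
print $out qq~<?xml version="1.0" encoding="Shift_JIS"?>\n~;

# Copy the remaining lines; the output layer re-encodes them.
while (<$in>) { print $out $_; }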

I could change the workflow and have XML::Simple handle the UTF-8 file
and then use Encode's from_to function, or Unicode::Unihan, to do the
conversion.
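
A minimal sketch of that second workflow, assuming the text has already
been decoded into Perl character strings (for example, values pulled out
of the parsed XML). CJK Unified Ideographs are ordinary Unicode code
points, so the normal Encode path applies; the open question is what to
do with characters that have no Shift_JIS mapping. The helper name
to_shiftjis and the '?' placeholder are illustrative choices, not
anything the application requires. It relies on Encode accepting a code
reference as the CHECK argument, which is called with the code point of
each character that cannot be encoded.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Convert a decoded (character) string to Shift_JIS bytes and collect
# every character that has no Shift_JIS mapping.
# to_shiftjis is a hypothetical helper name used for this sketch.
sub to_shiftjis {
    my ($text) = @_;
    my @unmapped;
    my $bytes = encode(
        'shiftjis', $text,
        sub {
            my ($ord) = @_;    # code point that could not be encoded
            push @unmapped, sprintf 'U+%04X', $ord;
            return '?';        # placeholder written in its place
        },
    );
    return ($bytes, \@unmapped);
}

# U+65E5 and U+672C are CJK Unified Ideographs that exist in JIS X 0208;
# U+20AC (euro sign) is not part of Shift_JIS.
my ($bytes, $unmapped) = to_shiftjis("\x{65E5}\x{672C}\x{20AC}");
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
print "unmapped: @$unmapped\n";
```

Collecting the unmapped code points, rather than dying on the first one,
makes it possible to see how much of the data falls outside the JIS
repertoire before deciding whether something like Unicode::Unihan is
needed to pick substitutes.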

Dennis
d underscore roesler at agilent dot com