Spreadsheet::Read special characters handling

on 20.11.2006 08:33:56 by anevare2

I'm using the Spreadsheet::Read module (which generally works quite
well). I have some spreadsheets with special characters like an
accented e (é), and I'm having some trouble processing these characters.
I haven't dealt much with these types of characters in this context in
the past. The accented e's are coming out as "?".

My spreadsheet has cells A1, A2 and A3 set to Cafe as follows:
in A1, Excel automatically made the accented e; in A2, I pressed
Option-e e for the accented e; and in A3, I undid the special e to make
it a regular e.

A1:A3
---------
Café
Café
Cafe

This is my program
------------------------------
use Spreadsheet::Read;

my $ref = ReadData('special_char_test.xls');

my $cell1 = $ref->[1]{A1};
my $cell2 = $ref->[1]{cell}[1][2]; # try a different way
my $cell3 = $ref->[1]{cell}[1][3];

print "Cell A1: $cell1\n";
print "Cell A2: $cell2\n";
print "Cell A3: $cell3\n";

Output (standard out):
-------------------
Cell A1: Caf?
Cell A2: Caf?
Cell A3: Cafe

What can I do so that the accented e prints correctly or so the correct
format can be saved to a csv file?
Thanks.
A

Re: Spreadsheet::Read special characters handling

on 20.11.2006 13:02:10 by h.m.brand

On Mon, 20 Nov 2006 08:33:56 +0100, wrote:

> I'm using the Spreadsheet::Read module (which generally works quite
> well). I have some spreadsheets with special characters like an
> accented e (é), and I'm having some trouble processing these characters.
> I haven't dealt much with these types of characters in this context in
> the past. The accented e's are coming out as "?".

That depends on both the encoding and the font your terminal uses.
By default, Spreadsheet::Read does not change the encoding, which means
that if the fields are encoded in Unicode (UTF-8), your script has to
take action to output Unicode.

Read http://search.cpan.org/~rgarcia/perl-5.9.4/pod/perlunitut.pod

In summary, if your terminal can handle UTF-8 (like a recent X11R6
xterm with utf8 enabled and a *-iso10646-1 font), then adding

binmode STDOUT, ":utf8";

will probably suffice. If your terminal is iso8859-*, which also
supports the e-acute, then you will have to encode your output for
that character set instead.
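
To make the difference between the two kinds of terminal concrete, here
is a minimal sketch using only the core Encode module. The helper name
bytes_for_terminal is made up for this illustration; it simply shows
which bytes each encoding produces for "Café".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes a terminal would receive
# for $text when output is encoded as $encoding.
sub bytes_for_terminal {
    my ($text, $encoding) = @_;
    return encode($encoding, $text);
}

my $cafe = "Caf\x{e9}";    # "Café" as a Perl character string

# Hex dump of the bytes for a UTF-8 and for a Latin-1 terminal.
for my $encoding ('UTF-8', 'iso-8859-1') {
    my $bytes = bytes_for_terminal($cafe, $encoding);
    my $hex   = join ' ', map { sprintf '%02x', ord } split //, $bytes;
    print "$encoding: $hex\n";
}

# In a real script you would normally set one layer on STDOUT instead:
#   binmode STDOUT, ':encoding(UTF-8)';        # UTF-8 terminal
#   binmode STDOUT, ':encoding(iso-8859-1)';   # Latin-1 terminal
```

The e-acute is two bytes (c3 a9) in UTF-8 and a single byte (e9) in
iso-8859-1, so a terminal expecting one encoding will typically show
"?" or garbage when it is sent the other.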

I think that the CSV file is OK already. Try opening it in any
Unicode-enabled editor (I think both M$Word and M$Excel will do here)
and see how it looks.

Re: Spreadsheet::Read special characters handling

on 20.11.2006 16:04:56 by paduille.4060.mumia.w

I have no problems outputting accented characters from
Spreadsheet::Read. Either your Perl or your terminal is not able to deal
with the accented characters.

Try placing "use encoding 'iso-8859-1';" at the top of your program.

Recent versions of Perl (>= 5.8) should be able to handle character
encodings well, but you might have to set up your locale properly, and
you might have to configure your terminal to display those characters.
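
One possible way to check what the locale advertises, sketched with the
core modules I18N::Langinfo and Encode. The helper layer_for_codeset is
a made-up name for this example, and falling back to UTF-8 is an
assumption, not something either module does for you.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: build a PerlIO layer from a codeset name,
# falling back to UTF-8 when Encode does not recognise the name.
sub layer_for_codeset {
    my ($codeset) = @_;
    my $enc = defined $codeset && length $codeset
        ? find_encoding($codeset)
        : undef;
    return ':encoding(' . ($enc ? $enc->name : 'UTF-8') . ')';
}

# Ask the current locale which character set it uses.
my $codeset = langinfo(CODESET);
my $layer   = layer_for_codeset($codeset);

binmode STDOUT, $layer or die "Cannot apply $layer: $!";
print "locale codeset: $codeset, output layer: $layer\n";
```

If the codeset printed here does not match what the terminal is really
set to display, accented characters are likely to come out wrong even
though the script itself is fine.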


--
paduille.4060.mumia.w@earthlink.net

Re: Spreadsheet::Read special characters handling

on 20.11.2006 23:24:45 by anevare2

thanks guys.. very helpful. thanks also for referring me to that good
Unicode tutorial.

Adding this line to my perl program did the trick:
binmode STDOUT, ":utf8";

I had to do the same with the filehandle of the file I write to.
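
For anyone reading later, a minimal sketch of the filehandle part. The
file name, the sample row, and the helper names write_utf8_line and
slurp_raw are invented for this illustration; the point is only that
the encoding layer goes into the open mode.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one line of text to $path as UTF-8.
sub write_utf8_line {
    my ($path, $line) = @_;
    open my $out, '>:encoding(UTF-8)', $path
        or die "Cannot write $path: $!";
    print {$out} "$line\n";
    close $out or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read a file back as raw, undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Demo with a temporary file that is removed when the script exits.
my (undef, $path) = tempfile('cafe_XXXXXX', SUFFIX => '.csv',
                             TMPDIR => 1, UNLINK => 1);
write_utf8_line($path, "Caf\x{e9},Cafe");
printf "%d bytes written to %s\n", length(slurp_raw($path)), $path;
```

An already-open handle can be switched the same way as STDOUT, with
binmode($out, ':encoding(UTF-8)').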

Then, with TextWrangler, if I open that resulting file in UTF-8 mode,
it looks perfect, accent marks and all.

I'm using Perl 5.8.6 on Mac OS X

thanks so much!
A

Re: Spreadsheet::Read special characters handling

on 22.11.2006 01:11:47 by unknown

Maybe you should look into your Terminal window settings. Use menu
Terminal/Window Settings ... and select "Display".

Tom Wyant

Re: Spreadsheet::Read special characters handling

on 05.12.2006 19:36:15 by anevare2

Hi,
Any suggestions for handling Asian characters from the original Excel
file? Perl's binmode setting supports accented characters fine, but
once you go beyond the first 256 characters (8 bits), it seems that the
Spreadsheet::Read Perl module may have no way of knowing what Excel's
encoding is.

I'd like to read an Excel file that has Asian characters, process it
with Perl, and then write a CSV or XML file (UTF-8 encoded) with the
proper Asian content.

A

Re: Spreadsheet::Read special characters handling

on 07.12.2006 00:44:59 by unknown

I'm not an expert on non-ASCII character sets, so the following is
somewhat provisional. But the thread has been fallow for about a day and
a half, and I figure if I say something horribly wrong someone will jump
at the opportunity to correct me.

Anyhow, this is what I _think_ the situation is.

I've never used Spreadsheet::Read, but the docs look like it's an
umbrella module, and under the hood it selects the correct module to
read the spreadsheet you gave it. The docs also seem to say that for
Excel it's Spreadsheet::ParseExcel.

Spreadsheet::ParseExcel apparently will take a filehandle instead of a
spreadsheet name, giving you the opportunity to set the encoding you
want when you open the input file or when you binmode() it. See the docs
for Encode::PerlIO.

I could have sworn I saw documentation somewhere in the Encode-related
modules for a subroutine that would try to guess the encoding of a chunk
of text, but at the moment I can't find it.

Tom Wyant

Re: Spreadsheet::Read special characters handling

on 09.12.2006 20:43:18 by unknown

It's Encode::Guess. Duh.

Tom Wyant
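
A minimal sketch of what using Encode::Guess can look like. The sample
bytes, the list of suspect encodings, and the helper name
decode_by_guess are assumptions made for this illustration; nothing
here is specific to Spreadsheet::Read or to what Excel actually stores.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: decode $bytes by guessing among @suspects.
# Returns the decoded text and the name of the matching encoding;
# dies when the guess fails or is ambiguous.
sub decode_by_guess {
    my ($bytes, @suspects) = @_;
    my $decoder = guess_encoding($bytes, @suspects);
    # On failure guess_encoding returns an error string, not an object.
    ref $decoder or die "Cannot guess encoding: $decoder\n";
    return ($decoder->decode($bytes), $decoder->name);
}

# Demo data: three Japanese characters encoded as Shift_JIS.
my $bytes = encode('shiftjis', "\x{65e5}\x{672c}\x{8a9e}");
my ($text, $name) = decode_by_guess($bytes, qw(euc-jp shiftjis));

binmode STDOUT, ':encoding(UTF-8)';
print "guessed $name: $text\n";
```

Guessing is a heuristic: short or ambiguous input can match several
suspects, in which case guess_encoding returns an error message instead
of an encoding object, so it is worth checking the result as above.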