ANNOUNCE: Text::CSV_XS 0.32

on 24.10.2007 13:51:35 by h.m.brand

The following report has been written by the PAUSE namespace indexer.
Please contact modules@perl.org if there are any open questions.
Id: mldistwatch 925 2007-09-16 15:41:11Z k

User: HMBRAND (H.Merijn Brand)
Distribution file: Text-CSV_XS-0.32.tgz
Number of files: 28
*.pm files: 1
README: Text-CSV_XS-0.32/README
META.yml: Text-CSV_XS-0.32/META.yml
Timestamp of file: Wed Oct 24 11:26:57 2007 UTC
Time of this run: Wed Oct 24 11:28:25 2007 UTC

2007-10-24 0.32 - H.Merijn Brand

* Added $csv->error_diag () to SYNOPSIS
* Added need for diag when new () fails to TODO
* Fixed a sneaked-in defined-or (//) in examples/csv2xls
* Plugged a 32byte memory leak in the cache code (valgrind++)
* Some perlcritic level1 changes

2007-07-23 0.31 - H.Merijn Brand

* Removed prototypes in examples/csv2xls
* Improved usage for examples/csv2xls (Getopt::Long now does
  --help/-?)
* Extended examples/csv2xls to deal with Unicode (-u)
* Serious bug in Text::CSV_XS::NV () type setting, causing the
resulting field to be truncated to IV

2007-06-21 0.30 - H.Merijn Brand

* ,\rx, is definitely an error without binary (used to HANG!)
* Fixed bug in attribute caching for undefined eol
* Cleaned up some code after -W*** warnings
* Added verbatim.
* More tests to cover the really dark corners and edge cases
* Even more typo fixes in the docs
* Added error_diag ()
* Added t/80_diag.t - Will not be mirrored by Text::CSV_PP
* Added DIAGNOSTICS section to pod - Will grow
* Small pod nit (abeltje)
* Doc fix in TODO (Miller Hall)

Re: ANNOUNCE: Text::CSV_XS 0.32

on 25.10.2007 06:59:01 by Petr Vileta

H.Merijn Brand wrote:
> The following report has been written by the PAUSE namespace indexer.
> Please contact modules@perl.org if there are any open questions.
> Id: mldistwatch 925 2007-09-16 15:41:11Z k
>
> User: HMBRAND (H.Merijn Brand)
> Distribution file: Text-CSV_XS-0.32.tgz

Well, I'm pleased to see you here :-)
I tried to use your module Text::CSV_XS to store some data in a CSV file,
but without success. The problem is national characters. When I tried
$csv->combine('abc','áíá','def') I got "abc\n" only. Your module fails on
the first field that contains anything greater than \x7f. But no error,
no warning. Is this a bug or a feature?

--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)

Re: ANNOUNCE: Text::CSV_XS 0.32

on 26.10.2007 09:16:23 by paduille.4061.mumia.w+nospam

On 10/24/2007 11:59 PM, Petr Vileta wrote:
> H.Merijn Brand wrote:
>> The following report has been written by the PAUSE namespace indexer.
>> Please contact modules@perl.org if there are any open questions.
>> Id: mldistwatch 925 2007-09-16 15:41:11Z k
>>
>> User: HMBRAND (H.Merijn Brand)
>> Distribution file: Text-CSV_XS-0.32.tgz
>
> Well, I'm pleased to see you here :-)
> I tried to use your module Text::CSV_XS to store some data in a CSV
> file, but without success. The problem is national characters. When I
> tried $csv->combine('abc','áíá','def') I got "abc\n" only. Your module
> fails on the first field that contains anything greater than \x7f. But
> no error, no warning.
> Is this a bug or a feature?
>

This sort-of works for me:

#!/usr/bin/perl
use strict;
use warnings;
use encoding 'iso-8859-1';
use Text::CSV_XS 0.32;

print "Version = $Text::CSV_XS::VERSION\n";

my $csv = Text::CSV_XS->new({binary => 1});
$csv->combine('abc','áíá','def') or warn("problem: $!\n");
print $csv->string(), "\n";

__END__

However, the output seems to be forced to UTF-8:

Version = 0.32
abc,áíá,def

The above is properly interpreted in utf-8 as this:

Version = 0.32
abc,áíá,def

So Text::CSV_XS seems to ignore both the script encoding and the locale.
I had set LANG=en_US.ISO-8859-1 in Linux before running the script.

And no error message is placed into $! upon error. I know, this is in
the TODO section :-)

Re: ANNOUNCE: Text::CSV_XS 0.32

on 26.10.2007 21:13:05 by h.merijn

On Thu, 25 Oct 2007 06:59:01 +0200, Petr Vileta wrote:

> H.Merijn Brand wrote:
>> The following report has been written by the PAUSE namespace indexer.
>> Please contact modules@perl.org if there are any open questions.
>> Id: mldistwatch 925 2007-09-16 15:41:11Z k
>>
>> User: HMBRAND (H.Merijn Brand)
>> Distribution file: Text-CSV_XS-0.32.tgz
>
> Well, I'm pleased to see you here :-)

I've been here before, but I prefer private mail :)

> I tried to use your module Text::CSV_XS to store some data in a CSV
> file, but without success. The problem is national characters. When I
> tried $csv->combine('abc','áíá','def') I got "abc\n" only.

As both Mumia and the docs make (now) VERY clear, you need the binary
flag. This version has made that even more clear. You *do* read the
docs, right?
--8<---
Important Note: The default behavior is to only accept ascii
characters. This means that fields can not contain newlines. If your
data contains newlines embedded in fields, or characters above 0x7e
(tilde), or binary data, you *must* set "binary => 1" in the call to
"new ()". To cover the widest range of parsing options, you will
always want to set binary.
-->8---

> Your module fails on the first field that contains anything greater than \x7f.

My module doesn't fail here. It is the default, documented, and correct
behaviour :)

> But no error, no warning.
> Is this a bug or feature?

Feature, or documented behaviour. Whatever you prefer.

In the distribution, check out t/50_utf8.t to see how you should be
dealing with non-ASCII characters. Maybe I can put that example in
the documentation, as I keep referring to that file.
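
A minimal sketch of the difference the binary flag makes, assuming the
fields are plain byte strings (not UTF-8-flagged). try_combine () is a
hypothetical helper written for this illustration only, and the high bytes
are spelled as escapes so the encoding of the source file does not matter:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# try_combine () is a hypothetical helper for this illustration.
# It returns the CSV line on success and undef when combine () fails.
sub try_combine {
    my ($binary, @fields) = @_;
    my $csv = Text::CSV_XS->new ({ binary => $binary })
        or die "Cannot create Text::CSV_XS object\n";
    return $csv->combine (@fields) ? $csv->string : undef;
}

# Latin-1 bytes for "áíá"
my @fields = ("abc", "\xe1\xed\xe1", "def");

for my $binary (0, 1) {
    my $line = try_combine ($binary, @fields);
    printf "binary => %d: %s\n", $binary,
        defined $line
            ? "combined " . length ($line) . " bytes"
            : "combine () failed";
}
```

With binary off, combine () is expected to return false because of the
high bytes; with binary on, the same fields should combine. Whether the
field ends up quoted in the result may differ between releases.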

Re: ANNOUNCE: Text::CSV_XS 0.32

on 27.10.2007 06:12:47 by Petr Vileta

Mumia W. wrote:
> On 10/24/2007 11:59 PM, Petr Vileta wrote:
>> Well, I'm pleased to see you here :-)
>> I tried to use your module Text::CSV_XS to store some data in a CSV
>> file, but without success. The problem is national characters. When I
>> tried $csv->combine('abc','áíá','def') I got "abc\n" only. Your
>> module fails on the first field that contains anything greater than \x7f.
>> But no error, no warning.
>> Is this a bug or a feature?
>>
>
> This sort-of works for me:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use encoding 'iso-8859-1';
> use Text::CSV_XS 0.32;
>
> print "Version = $Text::CSV_XS::VERSION\n";
>
> my $csv = Text::CSV_XS->new({binary => 1});

I suppose that binary is intended for "unprintable" characters.

>
> However, the output seems to be forced to UTF-8:
>

I can't use utf-8; I must use iso-8859-1 for certain reasons.

> So Text::CSV_XS seems to ignore both the script encoding and the
> locale. I had set LANG=en_US.ISO-8859-1 in Linux before running the
> script.

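Hmm, it ignores them, but not thoroughly :-) I avoid the combine() function
by using this sub:

sub mycombine
{
    my @fields = @_;            # work on a copy of the arguments
    foreach (@fields)
    {
        s/"/""/g;               # escape embedded quotes by doubling them
        $_ = '"' . $_ . '"';    # quote every field
    }
    # separate the fields with commas and end the record with CR LF
    return join(',', @fields) . chr(13) . chr(10);
}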

Yes, of course, this does not look at the field type (number or string), but
for my purposes it is sufficient. It works with the program and locale settings.
Maybe it would be good to add some options to your module to set the input and
output code pages. Something like
$csv = Text::CSV_XS->new('input_charser' => 'utf-8', 'output_charset' => 'iso-8859-1');
But this is only my idea ;-)
--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)

Re: ANNOUNCE: Text::CSV_XS 0.32

on 27.10.2007 09:38:38 by paduille.4061.mumia.w+nospam

On 10/26/2007 11:12 PM, Petr Vileta wrote:
> [...]
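> I avoid the combine() function by using
> this sub:
>
> sub mycombine
> {
>     my @fields = @_;            # work on a copy of the arguments
>     foreach (@fields)
>     {
>         s/"/""/g;               # escape embedded quotes by doubling them
>         $_ = '"' . $_ . '"';    # quote every field
>     }
>     # separate the fields with commas and end the record with CR LF
>     return join(',', @fields) . chr(13) . chr(10);
> }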
>
> Yes, of course, this does not look at the field type (number or string),
> but for my purposes it is sufficient. It works with the program and locale
> settings. Maybe it would be good to add some options to your module to set
> the input and output code pages. Something like
> $csv = Text::CSV_XS->new('input_charser' => 'utf-8',
> 'output_charset' => 'iso-8859-1');
> But this is only my idea ;-)

I've just discovered that this works perfectly for me:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS 0.32;

print "Version = $Text::CSV_XS::VERSION\n";

my $csv = Text::CSV_XS->new({binary => 1});
$csv->combine('abc','áíá','def') or warn("problem: $!\n");
print $csv->string(), "\n";

__END__

The above code outputs latin1 characters as expected.

For some reason, Text::CSV_XS doesn't like the encoding pragma:

#!/usr/bin/perl
use strict;
use warnings;
use encoding 'latin1';
use Text::CSV_XS 0.32;

print "Version = $Text::CSV_XS::VERSION\n";

my $csv = Text::CSV_XS->new({binary => 1});
$csv->combine('abc','áíá','def') or warn("problem: $!\n");
print $csv->string(), "\n";

__END__

The above outputs utf8-data; "áíá" is converted into "áíá"

However, if "binmode(STDOUT, ':encoding(latin1)');" is placed before the
print commands, the output is correct. I don't know if this is a bug in
Text::CSV_XS or not.

This is with Perl 5.8.4 and Text::CSV_XS 0.32. I had set
LANG=en_US.ISO-8859-1 under Linux.
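
One way to sidestep the pragma question is to keep Text::CSV_XS out of the
encoding business altogether: encode every field to the target encoding
first and hand the module nothing but bytes. A minimal sketch, assuming
the fields start out as Perl character strings; csv_line_as () is a
hypothetical helper, not part of the module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use Text::CSV_XS;

# csv_line_as () is a hypothetical helper for this illustration.
# It encodes each field to $encoding and returns the combined CSV line
# as a byte string.
sub csv_line_as {
    my ($encoding, @fields) = @_;
    my @bytes = map { encode ($encoding, $_) } @fields;

    my $csv = Text::CSV_XS->new ({ binary => 1 })
        or die "Cannot create Text::CSV_XS object\n";
    $csv->combine (@bytes) or die "combine () failed\n";
    return $csv->string;
}

# "áíá" as characters, written as escapes so that the encoding of this
# source file does not matter.
my @fields = ("abc", "\x{e1}\x{ed}\x{e1}", "def");

binmode STDOUT;     # the lines below are already encoded bytes
print csv_line_as ("iso-8859-1", @fields), "\n";
print csv_line_as ("UTF-8",      @fields), "\n";
```

The same fields come out once as latin1 bytes and once as UTF-8 bytes,
independent of the locale and without the encoding pragma.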

Re: ANNOUNCE: Text::CSV_XS 0.32

on 12.11.2007 15:22:32 by h.m.brand

On Sat, 27 Oct 2007 06:12:47 +0200, Petr Vileta wrote:

> Mumia W. wrote:
>> On 10/24/2007 11:59 PM, Petr Vileta wrote:
>>> Well, I'm pleased to see you here :-)
>>> I tried to use your module Text::CSV_XS to store some data in a CSV
>>> file, but without success. The problem is national characters. When I
>>> tried $csv->combine('abc','áíá','def') I got "abc\n" only. Your
>>> module fails on the first field that contains anything greater than \x7f.
>>> But no error, no warning.
>>> Is this a bug or a feature?
>>
>> This sort-of works for me:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use encoding 'iso-8859-1';
>> use Text::CSV_XS 0.32;
>>
>> print "Version = $Text::CSV_XS::VERSION\n";
>>
>> my $csv = Text::CSV_XS->new({binary => 1});
>
> I suppose that binary is intended for "unprintable" characters.

Depends. Do you think \x{d7} is unprintable? Or \x{20ac}?

>> However, the output seems to be forced to UTF-8:

Text::CSV_XS doesn't know anything about encoding.

> [snip]

> Maybe it would be good to add some options to your module to set the input
> and output code pages. Something like
> $csv = Text::CSV_XS->new('input_charser' => 'utf-8',
> 'output_charset' => 'iso-8859-1');

That would of course be

my $csv = Text::CSV_XS->new ({
    input_charset  => "utf-8",
    output_charset => "iso-8859-1",
    });

1: s/charser/charset/
2: put in an anon-hash

The idea sounds nice, but would severely slow down all
scripts that use Text::CSV_XS in a transparent mode,
without Encoding/Decoding.

It is rather easy to do it right from the user point of view.
Here's the snippet used in the test suite to check if encoding
works (t/50_utf8.t):

my $csv = Text::CSV_XS->new ({ binary => 1, always_quote => 1 });

# Special characters to check:
# 0A = \n  2C = ,  20 =    22 = "
# 0D = \r  3B = ;
foreach my $test (
    # Space-like characters
    [ "\x{0000A0}", "U+0000A0 NO-BREAK SPACE" ],
    [ "\x{00200B}", "U+00200B ZERO WIDTH SPACE" ],
    # Some characters with possible problems in the code point
    [ "\x{000122}", "U+000122 LATIN CAPITAL LETTER G WITH CEDILLA" ],
    [ "\x{002C22}", "U+002C22 GLAGOLITIC CAPITAL LETTER SPIDERY HA" ],
    [ "\x{000A2C}", "U+000A2C GURMUKHI LETTER BA" ],
    [ "\x{000E2C}", "U+000E2C THAI CHARACTER LO CHULA" ],
    [ "\x{010A2C}", "U+010A2C KHAROSHTHI LETTER VA" ],
    # Characters with possible problems in the encoded representation
    #  Should not be possible. ASCII is coded in 000..127, all other
    #  characters in 128..255
    ) {
    my ($u, $msg) = @$test;
    utf8::encode ($u);
    my @in  = ("", " ", $u, "");
    my $exp = join ",", map { qq{"$_"} } @in;

    ok ($csv->combine (@in), "combine $msg");

    my $str = $csv->string;
    is_binary ($str, $exp, "string $msg");

    ok ($csv->parse ($str), "parse $msg");
    my @out = $csv->fields;
    # Cannot use is_deeply (), because of the binary content
    is (scalar @in, scalar @out, "fields $msg");
    for (0 .. $#in) {
        is_binary ($in[$_], $out[$_], "field $_ $msg");
        }
    }

> But this is my idea only ;-)

Re: ANNOUNCE: Text::CSV_XS 0.32

on 13.11.2007 04:20:58 by Petr Vileta

H.Merijn Brand wrote:
> On Sat, 27 Oct 2007 06:12:47 +0200, Petr Vileta wrote:

[snip]

>> I suppose that binary is intended for "unprintable" characters.
>
> Depends. Do you think \x{d7} is unprintable? Or \x{20ac}?
>

Ehm, yes ;-) I meant unprintable within the \x00 to \xff code range, so all
characters less than \x20 except \x0a, \x0d, \x09.

[snip]

> That would of course be
>
> my $csv = Text::CSV_XS->new ({
>     input_charset  => "utf-8",
>     output_charset => "iso-8859-1",
>     });
>
> The idea sounds nice, but would severely slow down all
> scripts that use Text::CSV_XS in a transparent mode,
> without Encoding/Decoding.
>

But you can check in the ->new() part of the module whether the programmer
set both charsets. If both are set, run in "translate" mode; if neither is
set, run in "transparent" mode; and if only one is set, return an error.
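
The both-or-neither rule can be sketched in user code today, without
touching the module. This is only an illustration under stated assumptions:
make_combiner () is a hypothetical wrapper, and input_charset and
output_charset are options of that wrapper, not of Text::CSV_XS itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Text::CSV_XS;

# make_combiner () is a hypothetical user-level wrapper. It returns a
# code reference that turns a list of fields into one CSV line, or
# undef if combine () fails.
sub make_combiner {
    my (%opt) = @_;
    my ($in, $out) = @opt{qw(input_charset output_charset)};

    # Only one of the two charsets set: that is an error.
    my $set = grep { defined } $in, $out;
    die "input_charset and output_charset must be set together\n"
        if $set == 1;

    my $csv = Text::CSV_XS->new ({ binary => 1 })
        or die "Cannot create Text::CSV_XS object\n";

    return sub {
        my @fields = @_;
        if ($set == 2) {
            # "translate" mode: input bytes -> characters -> output bytes
            @fields = map { encode ($out, decode ($in, $_)) } @fields;
        }
        # "transparent" mode hands the bytes over untouched
        $csv->combine (@fields) or return;
        return $csv->string;
    };
}

my $translate = make_combiner (
    input_charset  => "utf-8",
    output_charset => "iso-8859-1",
    );

binmode STDOUT;     # the line below is already encoded bytes
# UTF-8 bytes for "áíá" go in, latin1 bytes come out
print $translate->("abc", "\xc3\xa1\xc3\xad\xc3\xa1", "def"), "\n";
```

Since the check happens once, when the combiner is built, the transparent
path pays nothing more than a single test of $set per line.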

--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)