csv data file, but with a twist

csv data file, but with a twist

am 11.01.2008 22:27:05 von nun

I have a comma-delimited text file (example data below). The problem is
that *some* of the lines contain commas within the second field (named
DESC) which is obviously a problem.

I'd like to replace any such commas with a space followed by a hyphen.

Actual sample data:
SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001


What I'd like to and up with:

SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001

There *may* be lines which have more than one comma in the DESCR field.
but luckily the 3rd field always has the format XXXX.XX (four digits,
decimal, 2 digits)

Can any suggest a hunk of code that will accomplish this? I've seen
some "sed one liners" online that come close but so far no success.
Thanks for any help!

DB

Re: csv data file, but with a twist

am 11.01.2008 22:52:00 von glex_no-spam

nun wrote:
> I have a comma-delimited text file (example data below). The problem is
> that *some* of the lines contain commas within the second field (named
> DESC) which is obviously a problem.
>
> I'd like to replace any such commas with a space followed by a hyphen.
>
> Actual sample data:
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
>
>
> What I'd like to and up with:
>
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001
>
> There *may* be lines which have more than one comma in the DESCR field.
> but luckily the 3rd field always has the format XXXX.XX (four digits,
> decimal, 2 digits)
>
> Can any suggest a hunk of code that will accomplish this? I've seen
> some "sed one liners" online that come close but so far no success.
> Thanks for any help!

This can be a little shorter, but since you didn't show any code I'll
keep it pretty simple.

while( )
{
my @field = split /,/;
if ( $field[2] !~ /\d\d\d\d\.\d\d/ )
{
$field[1] .= " -$field[2]";
splice( @field, 2, 1 );
}
print join( ',', @field );
}
__DATA__
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001

The real issue though is the program that is creating this data.

Re: csv data file, but with a twist

am 11.01.2008 23:39:30 von Martien Verbruggen

On Fri, 11 Jan 2008 16:27:05 -0500,
nun wrote:
> I have a comma-delimited text file (example data below). The problem is
> that *some* of the lines contain commas within the second field (named
> DESC) which is obviously a problem.

A "real" CSV file would have had quotes around that field, so this is,
and I suppose that's the problem, a broken CSV file. I'm assuming it's
not possible to fix whatever created the file in the first place?

> Actual sample data:
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
>
>
> What I'd like to and up with:
>
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001

> There *may* be lines which have more than one comma in the DESCR field.
> but luckily the 3rd field always has the format XXXX.XX (four digits,
> decimal, 2 digits)

It would have been nice if you actually included that in the sample
file.

Actually, you HAVE a line with the third field in the example file in
that format. In fact, you have two. However, I don't think that you mean
you want those joined up as well. Given that that field always has that
format, it's hardly going to be possible to use that fact. I've added an
example record with that extra third line.

So, maybe it's better to simply trigger on the number of fields. I can
assume that the target number of fields is always the same? 9?

The following simply concantenates fields 2 and up to field 1 if there
are too many fields, in such a way that you end up with 9 fields.

#!/usr/bin/perl
use warnings;
use strict;

while ()
{
print, next if $. == 1;
my @l = split /,/;
if (@l != 9)
{
$l[1] = join " - ", @l[1 .. @l - 8];
splice @l, 2, @l - 9;
print join ",", @l;
}
else
{
print;
}
}

__DATA__
SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
0130 ,HOOK, BLUE, 9999.99 ,0003.11,0002.33,N, , ,254-79 ,001

I would probably also leave the description field as it was, rather than
rewriting it.

$l[1] = '"' . join(",", @l[1 .. @l - 8]) . '"';

Also, if you want the leading and trailing spaces to be preserved, you
should also enclose those with quotes. You might as well enclose all of
them with quotes, which would also make the code a bit simpler:

#!/usr/bin/perl
use warnings;
use strict;

while ()
{
my @l = split /,/;
if (@l != 9)
{
$l[1] = join " - ", @l[1 .. @l - 8];
splice @l, 2, @l - 9;
}
s/(.*)/"$1"/ for @l;
print join ",", @l;
}

__DATA__
SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
0130 ,HOOK, BLUE, 9999.99 ,0003.11,0002.33,N, , ,254-79 ,001

If you don't blindly want to put quotes around everything, you could use
Text::CSV to create the CSV record for you (make sure to remove the
newline first or to set the correct options for CSV::Text).

Regards,
Martien
--
|
Martien Verbruggen | "In a world without fences,
| who needs Gates?"
|

Re: csv data file, but with a twist

am 12.01.2008 00:06:48 von someone

nun wrote:
> I have a comma-delimited text file (example data below). The problem is
> that *some* of the lines contain commas within the second field (named
> DESC) which is obviously a problem.
>
> I'd like to replace any such commas with a space followed by a hyphen.
>
> Actual sample data:
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
>
>
> What I'd like to and up with:
>
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001
>
> There *may* be lines which have more than one comma in the DESCR field.
> but luckily the 3rd field always has the format XXXX.XX (four digits,
> decimal, 2 digits)
>
> Can any suggest a hunk of code that will accomplish this? I've seen
> some "sed one liners" online that come close but so far no success.
> Thanks for any help!

my @headers;
while ( <> ) {
if ( $. == 1 ) {
@headers = split /,/, $_, -1;
print;
next;
}
my @fields = split /,/, $_, -1;
if ( @fields > @headers ) {
my $offset = @fields - @headers + 1;
splice @fields, 1, $offset, join ' - ', @fields[ 1 .. $offset ];
}
print join ',', @fields;
}



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: csv data file, but with a twist

am 12.01.2008 00:32:35 von t.x.michaels

Try the following (I know, it's ugly and verbose, but it's me!)

grab everything between "==========" lines

============================================================ ============

#!/usr/bin/perl

use strict;
use warnings;

my $line;
my @fields;
my $desired_number_of_fields;
my $number_of_fields_read;
my $number_of_excess_fields;

while ( $line = ) {
chomp $line;
@fields = split /,/, $line;

if ( $line =~ /^SKU/ ) {
$desired_number_of_fields = scalar @fields;
print "$line\n";
next;
}

$number_of_fields_read = scalar @fields;

if ( $number_of_fields_read == $desired_number_of_fields ) {
#No processing necessary
print "$line\n";
}
else {
$number_of_excess_fields
= $number_of_fields_read - $desired_number_of_fields + 1;
#Need to merge a few fields together, beginning with field 2
my @new_second_field
= splice @fields, 1, $number_of_excess_fields;
#Replace commas with space dash
my $new_second_field = join " -", @new_second_field;
#Put this new second field back into array
splice @fields, 1, 0, $new_second_field;
$line = join ",", @fields;
print "$line\n";
}
}
__DATA__
SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001

============================================================ ============

your output will be:

SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001

is this what you want???




On Jan 11, 3:27 pm, nun wrote:
> I have a comma-delimited text file (example data below). The problem is
> that *some* of the lines contain commas within the second field (named
> DESC) which is obviously a problem.
>
> I'd like to replace any such commas with a space followed by a hyphen.
>
> Actual sample data:
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001
>
> What I'd like to and up with:
>
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
> 0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
> 0130 ,HOOK - RED ,0003.11,0002.33,N, , ,254-79 ,001
>
> There *may* be lines which have more than one comma in the DESCR field.
> but luckily the 3rd field always has the format XXXX.XX (four digits,
> decimal, 2 digits)
>
> Can any suggest a hunk of code that will accomplish this? I've seen
> some "sed one liners" online that come close but so far no success.
> Thanks for any help!
>
> DB

Re: csv data file, but with a twist

am 12.01.2008 01:47:23 von jurgenex

nun wrote:
>I have a comma-delimited text file (example data below). The problem is
>that *some* of the lines contain commas within the second field (named
>DESC) which is obviously a problem.

Indeed, because that's not a CSV file any more. CSV allows enclosing those
fields with quotes, then you can have commata as data.

>I'd like to replace any such commas with a space followed by a hyphen.
>
>Actual sample data:
>SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
>0090 ,CUP-HOOK ,0012.34,0007.40,N,O, ,254-61,001
>0110 ,HOOK ,0008.71,0006.53,Y,O, , ,001
>0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
>0130 ,HOOK, RED ,0003.11,0002.33,N, , ,254-79 ,001

Just for fun because TIMTOWTDI:
- Extract the first field,
- reverse the remaining string,
- extract fields 7 to 3,
- whatever is left is field 2.
- reverse content of fields 2 to 7,
- process field 2,
- recombine fields 1-7

jue

Re: csv data file, but with a twist

am 12.01.2008 18:52:33 von rvtol+news

nun schreef:

> Actual sample data:
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
>
> What I'd like to and up with:
> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
>
> There *may* be lines which have more than one comma in the DESCR
> field. but luckily the 3rd field always has the format XXXX.XX (four
> digits, decimal, 2 digits)


To start you off:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @data = split /((?<=,)[0-9]{4}[.][0-9]{2},)/, , 2;
print Dumper \@data;

$data[0] = \@{[ split /\s*,/, $data[0], 2 ]};
$data[2] = \@{[ split /\s*,/, $data[2] ]};
print Dumper \@data;

__DATA__
0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001

--
Affijn, Ruud

"Gewoon is een tijger."

Re: csv data file, but with a twist

am 12.01.2008 22:46:22 von someone

Dr.Ruud wrote:
> nun schreef:
>
>> Actual sample data:
>> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
>>
>> What I'd like to and up with:
>> 0120 ,HOOK - TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001
>>
>> There *may* be lines which have more than one comma in the DESCR
>> field. but luckily the 3rd field always has the format XXXX.XX (four
>> digits, decimal, 2 digits)
>
>
> To start you off:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
>
> my @data = split /((?<=,)[0-9]{4}[.][0-9]{2},)/, , 2;
> print Dumper \@data;
>
> $data[0] = \@{[ split /\s*,/, $data[0], 2 ]};
> $data[2] = \@{[ split /\s*,/, $data[2] ]};

Why create an anonymous array reference, and then dereference it, and
then reference that dereference?


> print Dumper \@data;
>
> __DATA__
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: csv data file, but with a twist

am 13.01.2008 01:17:36 von rvtol+news

John W. Krahn schreef:
> Dr.Ruud:

>> $data[0] = \@{[ split /\s*,/, $data[0], 2 ]};
>
> Why create an anonymous array reference, and then dereference it, and
> then reference that dereference?

Oops, was playing away with the code and just never went back.

$data[0] = [ split /\s*,/, $data[0], 2 ];

--
Affijn, Ruud

"Gewoon is een tijger."

Re: csv data file, but with a twist

am 13.01.2008 13:13:43 von rvtol+news

Martien Verbruggen schreef:

> while ()
> {
> print, next if $. == 1;
> my @l = split /,/;
> if (@l != 9)
> {
> $l[1] = join " - ", @l[1 .. @l - 8];
> splice @l, 2, @l - 9;
> print join ",", @l;
> }
> else
> [...]
>
> __DATA__
> SKU,DESC,LIST,COST,FLAG1,FLAG2,FLAG3,RELATED,FLAG4
> 0120 ,HOOK, TAPERED ,0004.57,0002.74,N,O, ,254-72 ,001

I really like this approach, but some people wil not appreciate the
magical "9" in the code. It could be derived from the header of course.
We were told that the "3rd field always has the format XXXX.XX", so I
wrote a variant that uses that:

while () {
print and next if 1 == $.; # skip header
chomp;
my @data = split /,/;
while ($data[2] !~ /^\s*[0-9]{4}[.][0-9]{2}\s*$/) {
@data = ($data[0], "$data[1] - $data[2]", @data[3..$#data]);
}
/^\s|\s$/ and s/(.*)/"$1"/ for @data; # protect extreme whitespace
print join(",", @data), "\n";
}


A shorter variant, that removes extreme whitespace early:

while () {
print and next if 1 == $.; # skip header
my @data = split /\s*,\s*/;
while ($data[2] !~ /^[0-9]{4}[.][0-9]{2}$/) {
@data = ($data[0], "$data[1] - $data[2]", @data[3..$#data]);
}
print join ",", @data;
}

--
Affijn, Ruud

"Gewoon is een tijger."