regexp help needed

on 15.09.2005 22:26:38 by Chuck Carson

I need to parse a long string of values that are always separated by
commas with no white space anywhere in the stream. Any given line may
contain null values (thus you see two commas in a row).

Here is a sample input line:
140,0,3,0,adcbkp20_os,full,adcbkp20,adcbkp20,1126489655,0000000207,1126489862,ADCBKP20,1,,5640064,117305,,100,14035,root,1,0,0,90,,adcbkp20,2,2,0,,,0,0,0,43570,,,,,,14044,,,,,0,,,1,0,0,adcbkp20_1126489655,,

I have a regexp that accounts for every possible character type in each
field, but my match never succeeds so I am thinking that the null
fields are causing the problem.

Here is my regexp:
m/(\d+),(\d+),(\d+),(\d+),([a-zA-Z0-9_\-.]+),(\w+),([a-zA-Z0 -9_\-.]+),([a-zA-Z0-9_\-.]+),(\d+),(\d+),(\d+),([a-zA-Z0-
9_\-.]+),(\d+),(\d+),(\d+),(\d+),([a-zA-Z0-9_\-\/.]+),(\d+), (\d+),(\w+),(\d+),(\d+),(\d+),(\d+),(\w+),([a-zA-Z0-9_\-]+), (\d+),(\d+),(\d+),(\d+)
,(\d+),(\d+),(\d+),(\d+),(\d+),(\d+),([a-zA-Z0-9\(\)]+),([a- zA-Z0-9_\-.]+),([a-zA-Z0-9_\-.]+),(\d+),(\d+),([a-zA-Z0-9_\- .]+),([a-zA-Z0-9_\-.]+)
,([a-zA-Z0-9_\-.]+),([a-zA-Z0-9_\-.]+),(\d+),(\d+),(\d+),(\d +),(\d+),(\d+),([a-zA-Z0-9_\-.]+),(\d+),([a-zA-Z0-9_\-.]+)/

My question I guess is how do I get this to match?

$var = "100,,adcbkp20-b";

if ($var =~ m/(\d+),(\d+),([a-zA-Z0-9_\-]+)/ ) { print "match.\n"; }

Thus, the second field is null. Wouldn't $2 be undefined while the entire
string still matches (so that $1 and $3 have values)? If not, how can I
achieve this?

Thanks,
CC

Re: regexp help needed

on 16.09.2005 18:39:30 by Jim Gibson

In article <1126815998.471234.215520@g49g2000cwa.googlegroups.com>,
chuck.carson@gmail.com wrote:

> I need to parse a long string of values that are always separated by
> commas with no white space anywhere in the stream. Any given line may
> contain null values (thus you see two commas in a row).
>
[humongous regex snipped]

>
> My question I guess is how do I get this to match?
>
> $var = "100,,adcbkp20-b";
>
> if ($var =~ m/(\d+),(\d+),([a-zA-Z0-9_\-]+)/ ) { print "match.\n"; }
>
> Thus, the second field is null. Wouldn't $2 be undefined while the entire
> string still matches (so that $1 and $3 have values)? If not, how can I
> achieve this?

The '+' character means 'match 1 or more times', so $var will not
match. Use the '*' character, which means 'match 0 or more times.'
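
To illustrate the point about quantifiers, here is a minimal sketch using the
three-field test string from the question. The helper name parse_three and the
^ and $ anchors are assumptions added for the example, not part of the original
pattern. Note that with \d* an empty field is captured as the empty string, so
$2 is defined but has length zero.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three captured fields, or an empty
# list when the line does not match.
sub parse_three {
    my ($line) = @_;
    # \d* (zero or more digits) allows the middle field to be empty.
    # The ^ and $ anchors are an addition for this sketch.
    if ($line =~ m/^(\d+),(\d*),([a-zA-Z0-9_\-]+)$/) {
        return ($1, $2, $3);
    }
    return;
}

my @fields = parse_three("100,,adcbkp20-b");
printf "match: [%s] [%s] [%s]\n", @fields if @fields;
# prints: match: [100] [] [adcbkp20-b]
```

Only the fields that are allowed to be empty need the * quantifier; fields that
must always carry a value can keep +.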

Re: regexp help needed

on 16.09.2005 19:02:39 by Paul Lalli

chuck.carson@gmail.com wrote:
> I need to parse a long string of values that are always separated by
> commas with no white space anywhere in the stream. Any given line may
> contain null values (thus you see two commas in a row).
>
> Here is a sample input line:
>

< absurdly long CSV line snipped>

> I have a regexp that accounts for every possible character type in each
> field

This is a poor algorithm choice. You should use matching when you know
exactly what you want to capture. You should use split when you know
exactly what you want to throw away (that is, when you know what
separates the things you do want).

my @values = split /,/, $absurdly_long_csv_string;
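
One detail worth knowing when the line ends in empty fields, as the sample
line does: by default split discards trailing empty fields, and a negative
limit keeps them. The sketch below uses a short made-up line rather than the
real data, and split_fields is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, keeping trailing empty fields.
sub split_fields {
    my ($line) = @_;
    # A negative LIMIT tells split not to strip trailing empty fields.
    return split /,/, $line, -1;
}

my $line = "100,,adcbkp20-b,,";   # made-up line with two trailing empty fields

my @default = split /,/, $line;   # trailing empty fields are dropped
my @all     = split_fields($line);

printf "default:  %d fields\n", scalar @default;   # prints: default:  3 fields
printf "limit -1: %d fields\n", scalar @all;       # prints: limit -1: 5 fields
```

Either way, empty fields in the middle of the line come back as empty strings,
so the positions of the remaining fields are preserved.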

> , but my match never succeeds so I am thinking that the null
> fields are causing the problem.
>
> Here is my regexp:

< equally absurdly long regexp snipped>

> My question I guess is how do I get this to match?
>
> $var = "100,,adcbkp20-b";
>
> if ($var =~ m/(\d+),(\d+),([a-zA-Z0-9_\-]+)/ ) { print "match.\n"; }

You need to review the Regexp basics. The + quantifier means "one or
more". Since 0 digits is a valid option, you should be using the *
quantifier, which means "0 or more".

> Thus, the second field is null. Wouldn't $2 be undefined while the entire
> string still matches (so that $1 and $3 have values)? If not, how can I
> achieve this?

Again, you shouldn't be using regexps for this in the first place. For
the simple CSVs you've shown (no spaces, no commas embedded in quotes),
simply use a split. For anything more complicated, try the Text::CSV
module.
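
For the "more complicated" case, a rough sketch with Text::CSV might look like
the following. It assumes the module is installed from CPAN (it is not part of
core Perl); parse_csv_line and the sample line with a quoted comma are made up
for illustration.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, assumed to be installed

# Hypothetical helper: returns the list of fields, or dies on malformed input.
sub parse_csv_line {
    my ($line) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();
    $csv->parse($line)
        or die "Parse failed: " . $csv->error_diag();
    return $csv->fields();
}

# Made-up line: the third field contains a comma inside quotes,
# which a plain split /,/ would break into two pieces.
my @fields = parse_csv_line('100,,"Smith, John",adcbkp20-b');
printf "%d fields, third is [%s]\n", scalar @fields, $fields[2];
# prints: 4 fields, third is [Smith, John]
```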

Paul Lalli