Regex help

on 17.05.2011 00:44:12 by Owen

I am trying to get all the six-letter names in the second field in DATA
below, e.g.

BARTON
DARWIN
DARWIN

But the script below gives me every entry with six or more letters.

What I read says {6} means exactly 6.

What is the correct RE?

I have solved the problem by using if (length($line[1]) == 6) but
would love to know the correct syntax for the RE.


TIA


Owen


=================================================================

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    my $line = $_;

    my @line = split /,/;
    $line[1] =~ s /\"//g;

    print "$line[1]\n" if $line[1] =~ /\S{6}/;
}

__DATA__
"0200","AUSTRALIAN NATIONAL UNIVERSITY","ACT","PO Boxes"
"0221","BARTON","ACT","LVR Special Mailing"
"0800","DARWIN","NT",,"DARWIN DELIVERY CENTRE"
"0801","DARWIN","NT","GPO Boxes","DARWIN GPO DELIVERY ANNEXE"
"0804","PARAP","NT","PO Boxes","PARAP LPO"
"0810","ALAWA","NT",,"DARWIN DELIVERY CENTRE"
"0810","BRINKIN","NT",,"DARWIN DELIVERY CENTRE"
"0810","CASUARINA","NT",,"DARWIN DELIVERY CENTRE"
"0810","COCONUT GROVE","NT",,"DARWIN DELIVERY CENTRE"

===============================================================

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Regex help

on 17.05.2011 01:00:23 by Jim Gibson

On 5/16/11 Mon May 16, 2011 3:44 PM, "Owen" scribbled:

> I am trying to get all the 6 letter names in the second field in DATA
> below, eg
>
> BARTON
> DARWIN
> DARWIN
>
> But the script below gives me all 6 letter and more entries.
>
> What I read says {6} means exactly 6.

\S{6} will match any string containing 6 consecutive non-whitespace
characters. It will also match any string containing more than 6 such
characters, because any such string contains within it a substring of
exactly six characters. Perl matches do not have to match the entire string.
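A minimal sketch (not part of the original reply) that makes this visible by capturing what the unanchored /\S{6}/ actually matched in a few of the sample values. The sub name first_six_run is purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first run of six non-space characters in a string, or undef.
# The pattern is unanchored, so it can match inside a longer run.
sub first_six_run {
    my ($s) = @_;
    return $s =~ /(\S{6})/ ? $1 : undef;
}

for my $name ('BARTON', 'CASUARINA', 'COCONUT GROVE', 'PARAP') {
    my $hit = first_six_run($name);
    print defined $hit ? "$name: matched '$hit'\n" : "$name: no match\n";
}
```

CASUARINA reports 'CASUAR' and COCONUT GROVE reports 'COCONU', which is why the original script printed those names as well.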

>
> What is the correct RE?

If you want exactly six characters, then you need to specify that any
characters before or after the wanted six are not also members of the
desired class. In your case, the easiest way is to anchor the match at the
beginning and the end:

$line[1] =~ /^\S{6}$/
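Applied to the second-field values from the sample data, the anchored pattern keeps only the three wanted names. The sketch below uses a hard-coded list standing in for the parsed DATA lines, and also shows one subtlety: $ can match just before a trailing newline, so \A and \z are the stricter anchors if the value might still end in "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Second-field values from the sample data, quotes already removed.
my @names = ('AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN', 'DARWIN',
             'PARAP', 'ALAWA', 'BRINKIN', 'CASUARINA', 'COCONUT GROVE');

# ^ and $ force the six non-space characters to span the whole value.
my @anchored = grep { /^\S{6}$/ } @names;
print "$_\n" for @anchored;    # prints BARTON, DARWIN, DARWIN (one per line)

# $ also matches just before a final newline, so /^\S{6}$/ accepts
# "BARTON\n"; \A and \z anchor to the absolute start and end instead.
my $with_newline = "BARTON\n";
print "^...\$ accepts the trailing newline\n" if $with_newline =~ /^\S{6}$/;
print "\\A...\\z rejects it\n" unless $with_newline =~ /\A\S{6}\z/;
```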

If you were looking for word characters, e.g. \w, you could use the word
boundary assertion metasymbol \b:

$line[1] =~ /\b\w{6}\b/

That will not work if your names contain punctuation characters, e.g.
O'Reilly, because \w does not match the apostrophe. More complex matches
can use the negative lookahead and lookbehind
constructs.
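A sketch comparing the word-boundary pattern with one possible lookaround version. (?<!\S) and (?!\S) require that the six characters are not immediately preceded or followed by another non-space character, so punctuation inside a name counts toward the six. "D'ARCY" is a hypothetical name added for illustration; it is not in the sample data. Neither pattern is anchored, so both still accept a six-character word inside a longer field such as DARWIN DELIVERY CENTRE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \b\w{6}\b : six word characters between word boundaries.
my $word6 = qr/\b\w{6}\b/;

# (?<!\S)\S{6}(?!\S) : six non-space characters with no non-space
# character directly before them (negative lookbehind) or after them
# (negative lookahead), so an apostrophe is part of the token.
my $token6 = qr/(?<!\S)\S{6}(?!\S)/;

# "D'ARCY" is a hypothetical name used only to show the apostrophe case.
for my $name ("D'ARCY", 'BARTON', 'CASUARINA', 'DARWIN DELIVERY CENTRE') {
    my $w = $name =~ $word6  ? 'yes' : 'no';
    my $t = $name =~ $token6 ? 'yes' : 'no';
    print "$name: word-boundary=$w, lookaround=$t\n";
}
```

To require that the whole field is one such token, the lookaround version would still need ^ and $ (or \A and \z) around it.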





Re: Regex help

on 17.05.2011 12:52:34 by Rob Dixon

On 16/05/2011 23:44, Owen wrote:
>
> I am trying to get all the 6 letter names in the second field in DATA
> below, eg
>
> BARTON
> DARWIN
> DARWIN
>
> But the script below gives me all 6 letter and more entries.
>
> What I read says {6} means exactly 6.
>
> What is the correct RE?
>
> I have solved the problem by using if (length($line[1]) == 6) but
> would love to know the correct syntax for the RE

Hi Owen.

Your test establishes only whether the pattern can be found somewhere
within the object string. A test like

"CASUARINA" =~ /\S{6}/;

finds the six non-space characters "CASUAR" and then returns success,
as the criterion has been satisfied.

To get it to match /only/ six-character non-space strings you can add
anchors at the beginning and end of the regex:

"CASUARINA" =~ /^\S{6}$/;

will fail because the sequence "beginning of string, six non-space
characters, end of string" does not appear in "CASUARINA".

But the proper way to do this is to forget about regular expressions and
treat the data as comma-separated fields. The module Text::CSV will do
this for you, as in the program below.

HTH,

Rob


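use strict;
use warnings;

use Text::CSV;

# binary => 1 allows embedded newlines and non-ASCII characters in
# quoted fields; the module documentation advises setting it.
my $csv = Text::CSV->new({ binary => 1 });

# getline reads and parses one record, removing the quotes, and returns
# an array reference (undef at end of input).
while (my $fields = $csv->getline(*DATA)) {
    my $suburb = $fields->[1];
    next unless $suburb and length $suburb == 6;
    print $suburb, "\n";
}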

__DATA__
"0200","AUSTRALIAN NATIONAL UNIVERSITY","ACT","PO Boxes"
"0221","BARTON","ACT","LVR Special Mailing"
"0800","DARWIN","NT",,"DARWIN DELIVERY CENTRE"
"0801","DARWIN","NT","GPO Boxes","DARWIN GPO DELIVERY ANNEXE"
"0804","PARAP","NT","PO Boxes","PARAP LPO"
"0810","ALAWA","NT",,"DARWIN DELIVERY CENTRE"
"0810","BRINKIN","NT",,"DARWIN DELIVERY CENTRE"
"0810","CASUARINA","NT",,"DARWIN DELIVERY CENTRE"
"0810","COCONUT GROVE","NT",,"DARWIN DELIVERY CENTRE"

**OUTPUT**

BARTON
DARWIN
DARWIN
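One difference between a length test (as used above and in Owen's workaround) and the anchored regex is worth knowing: length counts every character, spaces included, while /^\S{6}$/ rejects any value containing whitespace. None of the sample suburbs is affected, but a value like the hypothetical "AB CDE" below is treated differently. A small sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "AB CDE" is a hypothetical six-character value containing a space.
for my $value ('BARTON', 'AB CDE', 'BRINKIN') {
    my $by_length = length($value) == 6 ? 'yes' : 'no';
    my $by_regex  = $value =~ /^\S{6}$/ ? 'yes' : 'no';
    print "$value: length==6 $by_length, /^\\S{6}\$/ $by_regex\n";
}
```

Which behaviour is right depends on whether a multi-word six-character name should count.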

