RegEx, how to separate word from digits?

on 23.01.2008 21:21:29 by kirknew2Reg

I have a column that contains "suite 111" and "suite222". I need a $
variable containing the word part and another $ variable containing the
digit part. I have tried variations on this syntax:
(\w*)(\d*)(.*)
(\w*)(\s?)(\d*)(.*)
But nothing I have tried separates the word from the digits when there
is no space. How do I get 'suite222' to break into separate
variables?

Re: RegEx, how to separate word from digits?

on 23.01.2008 21:30:37 by smallpond

On Jan 23, 3:21 pm, kirknew2Reg wrote:
> I have a column that contains "suite 111" and "suite222". I need a $
> variable containing the word part and another $ variable containing the
> digit part. I have tried variations on this syntax:
> (\w*)(\d*)(.*)
> (\w*)(\s?)(\d*)(.*)
> But nothing I have tried separates the word from the digits when there
> is no space. How do I get 'suite222' to break into separate
> variables?


How about: /([[:alpha:]]*)\s*(\d*)/

Re: RegEx, how to separate word from digits?

on 23.01.2008 22:12:39 by Florian Kaufmann

Similar to smallpond's answer, but this one enforces that the word part and
the digit part are each at least 1 character long. Otherwise, strings like
"3", "x", or even the empty string "" also match.

my ($word,$digit) = /([[:alpha:]]+)\s*(\d+)/;
# do something with $word and $digit
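To make Florian's point concrete, here is a minimal sketch (a test harness of our own, not code from the thread; the sub names `loose_split` and `strict_split` are hypothetical) that runs smallpond's `*` pattern and Florian's `+` pattern against "suite222" and the edge cases Florian mentions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# smallpond's pattern: both parts use *, so either (or both) may be empty.
sub loose_split {
    my ($s) = @_;
    return $s =~ /([[:alpha:]]*)\s*(\d*)/ ? ($1, $2) : ();
}

# Florian's pattern: both parts use +, so each needs at least one character.
sub strict_split {
    my ($s) = @_;
    return $s =~ /([[:alpha:]]+)\s*(\d+)/ ? ($1, $2) : ();
}

for my $s ("suite222", "3", "x", "") {
    my @loose  = loose_split($s);
    my @strict = strict_split($s);
    printf "%-10s loose: %-12s strict: %s\n", "'$s'",
        (@loose  ? join("|", @loose)  : "no match"),
        (@strict ? join("|", @strict) : "no match");
}
```

Both patterns split "suite222" into "suite" and "222", but the `*` version also "succeeds" on "3", "x" and "" by capturing empty strings, while the `+` version rejects all three.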

Re: RegEx, how to separate word from digits?

on 23.01.2008 22:53:56 by kirknew2Reg

On Jan 23, 1:12 pm, Florian Kaufmann wrote:
> Similar to smallpond's answer, however enforces that the word- and the
> digitpart are at least 1 character long. Else, also things like "3"
> "x" or even the empty string "" are found.
>
> my ($word,$digit) = /([[:alpha:]]+)\s*(\d+)/;
> # do something with $word and $digit

Thanks for the help. The solution worked.

Re: RegEx, how to separate word from digits?

on 23.01.2008 23:30:31 by gbacon

kirknew2Reg wrote:

: I have a column that contains "suite 111" and "suite222". I need a $
: variable containing the word part and another $ variable containing
: the digit part. I have tried variations on this syntax:
:
: (\w*)(\d*)(.*)
: (\w*)(\s?)(\d*)(.*)
:
: But nothing I have tried separates the word from the digits when
: there is no space. How do I get 'suite222' to break into separate
: variables?

Another handy trick is the double-negative:

$ cat try
#! /usr/bin/perl

for ("suite 111", "suite222") {
    if (/^([^\W\d]+)\s*(\d+)$/) {
        print "$_: $1 - $2\n";
    }
    else {
        print "$_: no match\n";
    }
}

$ ./try
suite 111: suite - 111
suite222: suite - 222

It's easy to forget that \w matches both alphabetic characters
and numeric characters. (Don't forget about the poor underscore!)
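This is exactly why the OP's first attempt fails. A minimal sketch (our illustration, not code from the thread) running `(\w*)(\d*)(.*)` on "suite222": because `\w*` is greedy and also matches digits, group 1 takes the whole string and the other groups are left with nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "suite222";
my ($word, $digits, $rest);

# The OP's first pattern. \w* is greedy and \w includes digits (and "_"),
# so group 1 consumes "suite222" and groups 2 and 3 match the empty string.
if ($input =~ /(\w*)(\d*)(.*)/) {
    ($word, $digits, $rest) = ($1, $2, $3);
    print "word=[$word] digits=[$digits] rest=[$rest]\n";
}
```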

Written out longhand, the pattern [^\W\d] is

NOT [ (NOT a word character) OR (a digit) ]

Apply DeMorgan's theorem to see that this is equivalent to "a
word character that isn't a digit."
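The De Morgan argument can also be checked mechanically. A minimal sketch, assuming plain ASCII input (no `use utf8`, no locale), that compares `[^\W\d]` with the longhand "a word character and not a digit" for all 128 ASCII characters, and also shows the underscore caveat mentioned above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare the double-negative class [^\W\d] with its De Morgan longhand,
# "a word character AND NOT a digit", over every ASCII character.
my @mismatches = grep {
    my $double_negative = /^[^\W\d]\z/ ? 1 : 0;
    my $longhand        = (/^\w\z/ && !/^\d\z/) ? 1 : 0;
    $double_negative != $longhand;
} map { chr($_) } 0 .. 127;

print scalar(@mismatches), " mismatches\n";

# \w includes the underscore, so [^\W\d] still accepts "_";
# adding _ to the negated set ([^\W\d_]) excludes it as well.
my $underscore_ok = "_" =~ /^[^\W\d]\z/  ? 1 : 0;
my $underscore_no = "_" =~ /^[^\W\d_]\z/ ? 1 : 0;
print "[^\\W\\d] matches '_': $underscore_ok; [^\\W\\d_] matches '_': $underscore_no\n";
```

So `[^\W\d]` means "letter or underscore"; Ruud's `[^\W\d_]` further down the thread is the variant that means "letter" only.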

Maybe our English teachers didn't know so much after all!

Hope this helps,
Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll

Re: RegEx, how to separate word from digits?

on 24.01.2008 00:32:32 by Abigail

Greg Bacon (gbacon@hiwaay.net) wrote on VCCLVIII September MCMXCIII:
;; Another handy trick is the double-negative:
;;
;; $ cat try
;; #! /usr/bin/perl
;;
;; for ("suite 111", "suite222") {
;; if (/^([^\W\d]+)\s*(\d+)$/) {
;; print "$_: $1 - $2\n";
;; }
;; else {
;; print "$_: no match\n";
;; }
;; }
;;
;; $ ./try
;; suite 111: suite - 111
;; suite222: suite - 222
;;
;; It's easy to forget that \w matches both alphabetic characters
;; and numeric characters. (Don't forget about the poor underscore!)

Why so complicated with a double negative? Matching letters is easy:
\pL will match a letter. \d matches a digit. So, I'd use:

/(?<word>\pL+) \s* (?<number>\d+)/x;

Example code:

use 5.010;   # named captures and say() need Perl 5.10

for ("suite 111", "suite222") {
    if (/(?<word>\pL+) \s* (?<number>\d+)/x) {
        say $+{word}, " -- ", $+{number};
    }
}

__END__
suite -- 111
suite -- 222



Abigail
--
perl -we 'print q{print q{print q{print q{print q{print q{print q{print q{print
qq{Just Another Perl Hacker\n}}}}}}}}}' |\
perl -w | perl -w | perl -w | perl -w | perl -w | perl -w | perl -w | perl -w

Re: RegEx, how to separate word from digits?

on 24.01.2008 11:28:07 by rvtol+news

smallpond wrote:
> kirknew2Reg:

>> I have a column that contains "suite 111" and "suite222". I need a $
>> variable containing the word part and another $ variable containing
>> the digit part. I have tried variations on this syntax:
>> (\w*)(\d*)(.*)
>> (\w*)(\s?)(\d*)(.*)
>> But nothing I have tried separates the word from the digits when
>> there is no space. How do I get 'suite222' to break into separate
>> variables?
>
> How about: /([[:alpha:]]*)\s*(\d*)/

To keep up the POSIX-style:

/([[:alpha:]]+)[[:blank:]]*([[:digit:]]+)/

And [[:blank:]] matches fewer characters than \s: only horizontal
whitespace such as space and tab, not newlines.

And [[:alpha:]] can of course be obscured as [^\W\d_].
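A quick sketch of the practical difference between `\s` and `[[:blank:]]`, assuming a hypothetical value where the word and the digits are separated by a newline (not one of the OP's examples):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Word and digits separated by a newline instead of a space or tab.
my $input = "suite\n222";

# \s also matches newlines, so this pattern still splits the string.
my @with_s = $input =~ /([[:alpha:]]+)\s*(\d+)/;

# [[:blank:]] only covers horizontal whitespace, so the newline blocks the match.
my @with_blank = $input =~ /([[:alpha:]]+)[[:blank:]]*([[:digit:]]+)/;

print "\\s:          ", (@with_s     ? "@with_s"     : "no match"), "\n";
print "[[:blank:]]: ", (@with_blank ? "@with_blank" : "no match"), "\n";
```

Which behavior is right depends on whether a value like this should count as "word plus number" in the first place.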

--
Anyway, Ruud

"Ordinary is a tiger."

Re: RegEx, how to separate word from digits?

on 24.01.2008 16:41:06 by Ted Zlatanov

On Thu, 24 Jan 2008 11:28:07 +0100 "Dr.Ruud" wrote:

R> smallpond wrote:
>> How about: /([[:alpha:]]*)\s*(\d*)/

R> To keep up the POSIX-style:

R> /([[:alpha:]]+)[[:blank:]]*([[:digit:]]+)/

R> And [[:blank:]] contains fewer characters than \s.

R> And [[:alpha:]] can of course be obscured as [^\W\d_].

Just make the \w match non-greedy. The last test case below is
questionable, but the OP didn't specify what to do in that case.

There's also the "match against the reversed string and reverse the
matches" approach :)

Ted

for ("suite 111", "suite222", " 111", "222")
{
    if (/^(\w+?)\s*(\d+)$/) # match \w characters conservatively
    {
        print "$_: $1 - $2\n";
    }
    else
    {
        print "$_: no match\n";
    }
}

-->
suite 111: suite - 111
suite222: suite - 222
 111: no match
222: 2 - 22
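Ted only names the "reverse the string" approach. Here is one possible reading of it as a sketch (our interpretation, not his code; `split_reversed` is a hypothetical helper). Reversed, "suite222" becomes "222etius", so the digits come first and a greedy `\d+` takes all of them before `\w+`, which also matches digits, gets a chance.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse the input, match digits-then-word, then reverse each capture back.
sub split_reversed {
    my ($s) = @_;
    my $r = reverse $s;                    # scalar reverse: flip the characters
    return unless $r =~ /^(\d+)\s*(\w+)$/;
    return (scalar reverse($2), scalar reverse($1));   # (word, digits)
}

for my $s ("suite 111", "suite222", " 111", "222") {
    my ($word, $digits) = split_reversed($s);
    print defined $word ? "$s: $word - $digits\n" : "$s: no match\n";
}
```

On Ted's four test strings this gives the same results as his non-greedy version, including the questionable split of "222" into "2" and "22", since `\w+` still needs at least one character.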