How would I create a Regular Expression to check

on 03.01.2008 16:17:12 by Nathan

How would I create a regular expression to check whether a street
address starts with any of the items below?

If the first character is a P ...
p.o. box
po box
po. box
p.o box
post office box
POB
POX
PODRAWER
POSTOFFICE
PO BX
POBOX
P/O

If the first character is a B ...
BX
BOX
Buzon -- (Means 'Box' in Spanish)

If the first character is an A ...
Apartado -- (is 'PO Box' in Spanish)
Aptdo -- (is POB abbreviated in Spanish)



Thanks,
Nathan

Re: How would I create a Regular Expression to check

on 03.01.2008 17:22:00 by Christian Winter

Nathan wrote:
> How would I create a Regular Expression to check Street address for
> any of the below items:
>
> If the first character is a P ...
> p.o. box
> po box
> po. box
> p.o box
> post office box
> POB
> POX
> PODRAWER
> POSTOFFICE
> PO BX
> POBOX
> P/O
>
> If the first character is a B ...
> BX
> BOX
> Buzon -- (Means 'Box' in Spanish)
>
> If the first character is a A ...
> Apartado -- (is 'PO Box in Spanish)
> Aptdo -- (is POB abbreviated in Spanish)

The short answer: you can't. At least not one single, reasonably
short regex that can cover it in one go. I'd simply iterate
over all the possibilities and compare each one to the street address,
like:

use strict;
use warnings;

my @potokens   = ("p.o. box", "po box", "po. box", "P/O", "etc.");
my $streetaddr = "po box 12345";

# grep takes a block in braces; \Q...\E quotes the dots in each token
if ( grep { $streetaddr =~ /^\Q$_\E/ } @potokens )
{
    print "Is a PO address!\n";
}

# or completely without regex:
foreach ( @potokens )
{
    if ( substr( $streetaddr, 0, length($_) ) eq $_ )
    {
        print "Is a PO address!\n";
        last;    # one match is enough
    }
}
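
The token list can also be compiled into one generated alternation, so
the pattern never has to be written by hand. This is only a sketch: the
helper name is_po_address and the choice of /i are assumptions, not
something from the thread.

```perl
use strict;
use warnings;

# Tokens copied from the original question; extend as needed.
my @potokens = (
    'p.o. box', 'po box', 'po. box', 'p.o box', 'post office box',
    'POB', 'POX', 'PODRAWER', 'POSTOFFICE', 'PO BX', 'POBOX', 'P/O',
    'BX', 'BOX', 'Buzon', 'Apartado', 'Aptdo',
);

# Longest tokens first, so 'POBOX' is tried before 'POB'.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } @potokens;

# Anchor at the start of the address and ignore case.
my $po_re = qr/^(?:$alternation)/i;

# Hypothetical helper, named here for illustration only.
sub is_po_address {
    my ($addr) = @_;
    return $addr =~ $po_re ? 1 : 0;
}

for my $addr ('po box 12345', 'P.O. Box 7', 'Apartado 55', '12 Main Street') {
    printf "%-16s %s\n", $addr,
        is_po_address($addr) ? 'PO address' : 'street address';
}
```

Like the loops above, this is a pure prefix test, so an address such as
"Boxwood Lane" would also be flagged. Adding \b after the group is one
possible way to require that the token ends at a word boundary.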

-Chris

Re: How would I create a Regular Expression to check

on 03.01.2008 20:01:29 by Ted Zlatanov

On Thu, 03 Jan 2008 17:22:00 +0100 Christian Winter wrote:

CW> Nathan wrote:
>> How would I create a Regular Expression to check Street address for
>> any of the below items:

CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:

An alternate approach is to use Parse::RecDescent. It's really good in
my experience for parsing this kind of disparate input, and will
organize it for you (so you can tell that the street address was in
Spanish, for example).

Ted

Re: How would I create a Regular Expression to check

on 03.01.2008 23:26:26 by jjcassidy

On Jan 3, 10:17 am, Nathan wrote:
> How would I create a Regular Expression to check Street address for
> any of the below items:
>
> If the first character is a P ...
> p.o. box
> po box
> po. box
> p.o box
> post office box
> POB
> POX
> PODRAWER
> POSTOFFICE
> PO BX
> POBOX
> P/O
>
> If the first character is a B ...
> BX
> BOX
> Buzon -- (Means 'Box' in Spanish)
>
> If the first character is a A ...
> Apartado -- (is 'PO Box in Spanish)
> Aptdo -- (is POB abbreviated in Spanish)
>
> Thanks,
> Nathan

It feels like I'm doing your homework, but here:

(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>\/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))

It's just simple decomposition.
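
A quick way to check a decomposition like this is to run it against
every form listed in the question. The sketch below is an assumption
about how one might do that: it rewrites the pattern with /x and /i,
anchors it at the start, and uses plain (?:...) groups so that the order
of the alternatives does not matter.

```perl
use strict;
use warnings;

# Same decomposition, laid out with /x. [ ] is a literal space under /x.
my $po_re = qr{
    ^(?:
        Ap (?: artado | tdo )
      | B  (?: O?X | uzon )
      | p  (?: \.?o\.? | ost[ ]office ) [ ] box
      | P  (?: /O | O (?: BOX | B | X | DRAWER | STOFFICE | [ ]BX ) )
    )
}xi;

# Every form from the original question.
my @forms = (
    'p.o. box', 'po box', 'po. box', 'p.o box', 'post office box',
    'POB', 'POX', 'PODRAWER', 'POSTOFFICE', 'PO BX', 'POBOX', 'P/O',
    'BX', 'BOX', 'Buzon', 'Apartado', 'Aptdo',
);

# Collect the forms the pattern fails to recognise.
my @missed = grep { $_ !~ $po_re } @forms;

print @missed
    ? "Missed: @missed\n"
    : "All " . scalar(@forms) . " forms matched\n";
```

Keeping the list of forms next to the pattern makes it easy to rerun the
check whenever another term is added.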

Re: How would I create a Regular Expression to check

on 07.01.2008 22:28:00 by Nathan

You did not do my homework, but thanks. I will try yours as well.

Here is what I came up with, but I like yours better, so I might use
yours instead of mine:

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])



On Jan 3, 5:26 pm, jjcass...@gmail.com wrote:
> On Jan 3, 10:17 am, Nathan wrote:
>
> > How would I create a Regular Expression to check Street address for
> > any of the below items:
>
> > If the first character is a P ...
> > p.o. box
> > po box
> > po. box
> > p.o box
> > post office box
> > POB
> > POX
> > PODRAWER
> > POSTOFFICE
> > PO BX
> > POBOX
> > P/O
>
> > If the first character is a B ...
> > BX
> > BOX
> > Buzon      -- (Means 'Box' in Spanish)
>
> > If the first character is a A ...
> > Apartado   -- (is 'PO Box in Spanish)
> > Aptdo      -- (is POB abbreviated in Spanish)
>
> > Thanks,
> > Nathan
>
> It feels like I'm doing your homework, but here:
>
> (Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>\/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))
>
> It's just simple decomposition.

Re: How would I create a Regular Expression to check

on 07.01.2008 22:50:32 by glex_no-spam

Nathan wrote:
> You did not do my homework but thanks... I will try yours as well...
>
> Here is what I came up with but I like yours better I might try yours
> instead of mine....
>
> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Ever hear of case-insensitive pattern matching?

perldoc perlop

Search for "m/PATTERN/cgimosx".
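
To illustrate what the /i modifier saves, here is one branch of the
bracketed pattern next to a case-insensitive version of it. The variable
names and sample addresses are made up for this sketch.

```perl
use strict;
use warnings;

# One branch of the bracketed pattern, spelled out letter by letter ...
my $verbose = qr/^[Bb]([Xx]|[Oo][Xx]|[Uu][Zz][Oo][Nn])/;

# ... and the same branch with /i doing the case folding.
my $concise = qr/^b(?:x|ox|uzon)/i;

for my $addr ('BOX 12', 'bx 7', 'Buzon 3', 'bUzOn 3', 'Baker Street 221') {
    my $old = $addr =~ $verbose ? 1 : 0;
    my $new = $addr =~ $concise ? 1 : 0;
    print "$addr: verbose=$old concise=$new\n";
}
```

Both patterns should agree on every input; only the readability differs.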

Re: How would I create a Regular Expression to check

on 07.01.2008 23:01:37 by Uri Guttman

>>>>> "JG" == J Gleixner writes:

JG> Nathan wrote:
>> You did not do my homework but thanks... I will try yours as well...
>> Here is what I came up with but I like yours better I might try yours
>> instead of mine....
>> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
>> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
>> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
>> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

JG> Ever hear of case-insensitive pattern matching?

JG> perldoc perlop

beyond that, note the [.\s] which is just . with the /s modifier. and it
has * after it which may not be correct (or just slower than +). [/] is
noisy and will break it unless alternate delimiters are used. beyond
that it is impossible to read (and /i will help there). and the way the
words are jammed together makes no sense or is impossible to parse out
visually. altogether a most horrible regex. i will copy it for training
purposes. i don't expect its author to claim this is proprietary code
just out of embarrassment. :)
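
On the delimiter point: Perl finds the closing delimiter before it
parses the pattern, so a bare / ends m/.../ even inside brackets. A
minimal sketch with alternate delimiters, using a made-up variable name
and sample address:

```perl
use strict;
use warnings;

# With qr{...} (or m{...}) the slash needs neither a backslash
# nor a character class around it.
my $slash_re = qr{^p/o\b}i;

print "P/O form recognised\n" if 'P/O 1234' =~ $slash_re;
```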

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: How would I create a Regular Expression to check

on 08.01.2008 09:12:40 by hjp-usenet2

On 2008-01-07 22:01, Uri Guttman wrote:
>>>>>> "JG" == J Gleixner writes:
>
> JG> Nathan wrote:
> >> You did not do my homework but thanks... I will try yours as well...
> >> Here is what I came up with but I like yours better I might try yours
> >> instead of mine....
> >> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
> >> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
> >> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
> >> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])
>
> JG> Ever hear of case-insensitive pattern matching?
>
> JG> perldoc perlop
>
> beyond that, note the [.\s] which is just . with the /s modifier.

What? A "." in a character class matches only a ".", but a \s still
matches any whitespace character, so [.\s] matches a "." or a whitespace
character. A /s modifier won't change its meaning.
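
The difference is easy to see by testing single characters against the
class, against a bare dot, and against a dot with /s. The variable names
are chosen for this sketch only.

```perl
use strict;
use warnings;

my $class = qr/\A[.\s]\z/;   # a literal dot or one whitespace character
my $dot   = qr/\A.\z/;       # any character except newline
my $dot_s = qr/\A.\z/s;      # any character at all

for my $char ('.', ' ', "\t", "\n", 'x') {
    # Printable label for the control characters.
    my $label = $char eq "\n" ? '\n'
              : $char eq "\t" ? '\t'
              :                 "'$char'";
    printf "%-4s class=%d dot=%d dot_s=%d\n", $label,
        ($char =~ $class ? 1 : 0),
        ($char =~ $dot   ? 1 : 0),
        ($char =~ $dot_s ? 1 : 0);
}
```

Only the class rejects 'x', and only the bare dot rejects a newline, so
[.\s] is not equivalent to a dot with or without /s.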

hp

Re: How would I create a Regular Expression to check

on 08.01.2008 13:50:52 by jurgenex

[Please do not top-post, trying to correct]
Nathan wrote:
>> On Jan 3, 10:17 am, Nathan wrote:
>> > How would I create a Regular Expression to check Street address for
>> > any of the below items:
>>
>> > If the first character is a P ...
>> > p.o. box
>> > po box
>> > po. box
>> > p.o box
>> > post office box
>> > POB
>> > POX
>> > PODRAWER
>> > POSTOFFICE
>> > PO BX
>> > POBOX
>> > P/O
>>
>> > If the first character is a B ...
>> > BX
>> > BOX
>> > Buzon      -- (Means 'Box' in Spanish)
>>
>> > If the first character is a A ...
>> > Apartado   -- (is 'PO Box in Spanish)
>> > Aptdo      -- (is POB abbreviated in Spanish)
>Here is what I came up with but I like yours better I might try yours
>instead of mine....
>
>^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
>[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
>[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
>[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Sorry, but that's a great example of what not to do. Absolutely
unmaintainable. Within 4 weeks you will have no idea what that RE does and
how to modify it if you need to add another term.

IMO regular expressions are the wrong tool for the job. Far better would be
to put those terms in a hash (as keys), then extract the street name from
your address, and simply check if this street name exists() in the hash.
Or put the terms in an array and just loop through them.

Maybe that's not as smart as an RE approach, but it's much more intelligent.
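
A minimal sketch of the hash idea follows. How the leading term is
extracted is an assumption made here (everything before the first digit,
upper-cased, with dots, slashes and spaces removed); the helper names
leading_term and is_po_address are hypothetical.

```perl
use strict;
use warnings;

# Normalised forms of the terms from the original question.
# 'PO' is the normalised form of 'P/O'.
my %po_term = map { $_ => 1 } qw(
    POBOX POSTOFFICEBOX POB POX PODRAWER POSTOFFICE POBX PO
    BX BOX BUZON APARTADO APTDO
);

# Assumption: the term is whatever precedes the first digit.
# Real address data may need a smarter split.
sub leading_term {
    my ($addr) = @_;
    my ($lead) = $addr =~ /\A([^0-9]*)/;
    $lead = uc $lead;
    $lead =~ s{[.\s/]}{}g;    # "P.O. BOX " becomes "POBOX"
    return $lead;
}

sub is_po_address {
    my ($addr) = @_;
    return exists $po_term{ leading_term($addr) } ? 1 : 0;
}

for my $addr ('p.o. box 12345', 'P/O 7', 'Apartado 55', 'Boxwood Lane 5') {
    printf "%-16s %s\n", $addr,
        is_po_address($addr) ? 'PO address' : 'street address';
}
```

Adding another term is then a one-word change to the list, and because
the whole leading term is compared, "Boxwood Lane" is not mistaken for a
box. An address with no digits after the term would need different
splitting logic.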

jue

Re: How would I create a Regular Expression to check

on 08.01.2008 18:21:04 by Ted Zlatanov

On Mon, 7 Jan 2008 13:28:00 -0800 (PST) Nathan wrote:

N> You did not do my homework but thanks... I will try yours as well...
N> Here is what I came up with but I like yours better I might try yours
N> instead of mine....

N> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
N> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
N> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
N> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Good god, doesn't this bother you even a little bit? You should at
least submit it to the Daily WTF.

Ted

Re: How would I create a Regular Expression to check

on 31.01.2008 14:46:10 by dkcombs

In article <86lk76vily.fsf@lifelogs.com>,
Ted Zlatanov wrote:
>On Thu, 03 Jan 2008 17:22:00 +0100 Christian Winter wrote:
>
>CW> Nathan wrote:
>>> How would I create a Regular Expression to check Street address for
>>> any of the below items:
>
>CW> The short answer: you can't. At least not one single, reasonably
>CW> short regex that can cover it in one go. I'd simply iterate
>CW> over all the possibilities and compare each one to the street address,
>CW> like:
>
>An alternate approach is to use Parse::RecDescent. It's really good in
>my experience for parsing this kind of disparate input, and will
>organize it for you (so you can tell that the street address was in
>Spanish, for example).
>
>Ted

A late response/request. *If* you find doing that pretty easy and
quick to do, *please* show us how you'd do it.

I've read the doc on it, and come away with neither facility nor understanding
for actually being able to use it in a real problem.

THANKS MUCH (from all of us?)

David

Re: How would I create a Regular Expression to check

on 31.01.2008 14:49:26 by dkcombs

In article <47829ea8$0$3575$815e3792@news.qwest.net>,
J. Gleixner wrote:
>Nathan wrote:
>> You did not do my homework but thanks... I will try yours as well...
>>
>> Here is what I came up with but I like yours better I might try yours
>> instead of mine....
>>
>> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
>> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
>> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
>> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])
>
>Ever hear of case-insensitive pattern matching?

Without first going to perlop, I ask: even in *character classes*?!
>
>perldoc perlop
>
>Search for "m/PATTERN/cgimosx".

david

Re: How would I create a Regular Expression to check

on 31.01.2008 15:38:53 by Gunnar Hjalmarsson

David Combs wrote:
> In article <47829ea8$0$3575$815e3792@news.qwest.net>,
> J. Gleixner wrote:
>> Nathan wrote:
>>>
>>> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
>>> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
>>> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
>>> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])
>>
>> Ever hear of case-insensitive pattern matching?
>
> Without first going to perlop, I ask: even in *character classes*?!

You should have tried it instead of asking hundreds of people.

C:\home>type test.pl
$_ = 'abc';
print "Yes\n" if /[A-Z]/i;

C:\home>test.pl
Yes

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: How would I create a Regular Expression to check

on 31.01.2008 22:47:17 by Ted Zlatanov

On Thu, 31 Jan 2008 13:46:10 +0000 (UTC) dkcombs@panix.com (David Combs) wrote:

DC> In article <86lk76vily.fsf@lifelogs.com>,
DC> Ted Zlatanov wrote:
>> On Thu, 03 Jan 2008 17:22:00 +0100 Christian Winter wrote:
>>
CW> Nathan wrote:
>>>> How would I create a Regular Expression to check Street address for
>>>> any of the below items:
>>
CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:
>>
>> An alternate approach is to use Parse::RecDescent. It's really good in
>> my experience for parsing this kind of disparate input, and will
>> organize it for you (so you can tell that the street address was in
>> Spanish, for example).
>>
>> Ted

DC> A late response/request. *If* you find doing that pretty easy and
DC> quick to do, *please* show us how you'd do it.

DC> I've read the doc on it, and come away with neither facility nor understanding
DC> for actually being able to use it in a real problem.

I wrote a tutorial on P::RD a while ago, and it should still be valid.
IBM dW seems to be down at the moment, so use the Google cache if you
have to. The tutorial doesn't mention auto_tree, which is really handy
if you want to process the data yourself.

http://www.ibm.com/developerworks/library/l-perl-speak.html

Here's another good one (and many others will come up in a web search):

http://www.perl.com/pub/a/2001/06/13/recdecent.html

Are you asking specifically for the mailing address example originally
posted to be implemented in P::RD, or do you need more information on
how to use P::RD for your own applications? I can certainly give a
P::RD grammar for the full list of address rules, but it's tedious work
to implement every rule the OP wanted and I don't want to spend hours of
my time doing it just to prove it's easy.

Thanks
Ted