Regular expression help

on 19.11.2007 18:33:57 by Benedict White

I need a bit of help cleaning up a mess.

I need to write a regular expression that looks for invalid email
addresses. I know that the email addresses do not contain numerical
characters or unusual combinations of letters.

What I was hoping for was something that would locate all emails with
say 2 before the at addressed to the domain I am looking after,
example.com.

I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
The numbers could be anywhere in string before the @. It will miss
email addresses with other numbers in them, but will also pick up any
without any. I need to have all with numbers in them, to the
example.com domain.

Then I want to extend it to look for odd combinations of letters, like
xb, which would then have to appear together but anywhere in the
string.

Kind regards


Benedict White

Re: Regular expression help

on 19.11.2007 19:21:47 by Ben Morrow

Quoth Benedict White :
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> characters or unusual combinations of letters.

It would be better to use a module that knows how to parse email
addresses.

It may be better to start with a list of valid addresses and proceed from
there; however, this may not be possible.

> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without any. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.

Something like

#!/usr/bin/perl

use warnings;
use strict;

use Email::Address;

my $domain = 'example.com';

my $invalid = qr/ \d | xb /x;

my @addrs = qw{
    abc%
    a@a
    foo@bar.org
    one2three@example.com
    AAxbYY@example.com
    user@example.com
};

sub result {
    my ($reason) = @_;
    warn "$reason\n";
    no warnings 'exiting';
    next ADDR;
}

ADDR: for my $addr (@addrs) {
    my ($parsed) = Email::Address->parse($addr)
        or result "invalid address: $addr";

    $parsed->original eq $addr
        or result "extra gunk around address in '$addr'";

    $parsed->host eq $domain
        or result "'$addr' not at '$domain'";

    $parsed->user =~ $invalid
        and result "'$addr' contains a forbidden string";

    result "'$addr' is valid";
}

__END__

should work.

Ben

Re: Regular expression help

on 19.11.2007 20:18:45 by Jim Gibson

In article
<3bedd977-a9d6-45f3-a94d-68d9f2177c19@c29g2000hsa.googlegroups.com>,
Benedict White wrote:

> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> characters or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without any. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.

This type of question has been asked frequently in the past. See

perldoc -q address

"How do I check a valid mail address?"

--
Jim Gibson


Re: Regular expression help

on 20.11.2007 01:43:10 by Scott Bryce

Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses.

If I wanted to write code that determines if an email address is valid,
I would search CPAN for "email valid," and I would find this:

http://search.cpan.org/~rjbs/Email-Valid-0.179/lib/Email/Valid.pm

Re: Regular expression help

on 20.11.2007 02:20:48 by Benedict White

Many thanks.

Is there no simple regex for saying that a part of the text (the bit
before the @ in this case) can contain anything you like, as long as
it contains say the number 2?

Kind regards


Benedict White

Re: Regular expression help

on 20.11.2007 02:44:54 by Petr Vileta

Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> characters or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without any. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.
>
If I understand you correctly, this should solve it.

#!/usr/bin/perl
use strict;

my @mails = qw(abc@gmail.com abc2def@gmail.com abcxb@gmail.com
               goodmail@gmail.com);
my $notneed = join('|', qw(2 xb));
foreach my $mail (@mails)
{
    print "$mail - ", testmail($mail, $notneed), "\n";
}

sub testmail
{
    my ($email, $regex) = @_;
    return "bad" if ($email =~ m/^[^\@]*($regex).*\@.+$/);
    return "good";
}


--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)

Re: Regular expression help

on 20.11.2007 03:38:42 by Tad McClellan

Benedict White wrote:

> Many thanks.


To who?

For what?

Please quote some context in followups.


> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?


print "matched\n" if $text =~ /.*2.*\@/s;


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Regular expression help

on 20.11.2007 04:09:21 by Gunnar Hjalmarsson

Benedict White wrote:
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> characters

/\d[^@]*@/

> or unusual combinations of letters.



> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.

Seems like an unusual requirement... What if there is some Max Borke,
with the address maxborke@example.com ?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
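
A minimal sketch of how the /\d[^@]*@/ idea might be tied to the
example.com domain, and of the false positive raised above. The pattern
names $has_digit and $has_xb and the sample addresses are assumptions
made for illustration, not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: only the part before the @ is inspected, and the domain
# is the literal example.com used in the thread.
my $has_digit = qr/\d[^\@]*\@example\.com\z/;
my $has_xb    = qr/xb[^\@]*\@example\.com\z/i;

my @addrs = qw(
    user@example.com
    one2three@example.com
    2me@example.com
    one2three@example.org
    maxborke@example.com
);

for my $addr (@addrs) {
    my @why;
    push @why, 'digit' if $addr =~ $has_digit;
    push @why, 'xb'    if $addr =~ $has_xb;
    printf "%-24s %s\n", $addr, @why ? "flagged (@why)" : 'ok';
}
```

Note that maxborke@example.com is flagged by the xb rule even though it
looks like a plausible real address, which is the concern with
letter-pair rules.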

Re: Regular expression help

on 20.11.2007 06:46:02 by jurgenex

Benedict White wrote:

> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?

/.*2.*\@/

will do that.

jue

Re: Regular expression help

on 20.11.2007 13:44:39 by Paul Lalli

On Nov 20, 12:46 am, "Jürgen Exner" wrote:
> Benedict White wrote:
> > Is there no simple regex for saying that a part of the text (the bit
> > before the @ in this case) can contain anything you like, as long as
> > it contains say the number 2?
>
> /.*2.*\@/
>
> will do that.

Interesting that both you and Tad put the useless starting .* in your
regexp. It makes me wonder if it's not as useless as I think it is.
Is there any difference between that and
/2.*\@/
?

Paul Lalli

Re: Regular expression help

on 20.11.2007 16:58:21 by jurgenex

Paul Lalli wrote:
> On Nov 20, 12:46 am, "Jürgen Exner" wrote:
>> Benedict White wrote:
>>> Is there no simple regex for saying that a part of the text (the bit
>>> before the @ in this case) can contain anything you like, as long as
>>> it contains say the number 2?
>>
>> /.*2.*\@/
>>
>> will do that.
>
> Interesting that both you and Tad put the useless starting .* in your
> regexp. It makes me wonder if it's not as useless as I think it is.
> Is there any difference between that and
> /2.*\@/

Well, you are right. I can't see any reason to put it there, either.

jue

Re: Regular expression help

on 20.11.2007 17:20:01 by Josef Moellers

Jürgen Exner wrote:
> Paul Lalli wrote:
>> On Nov 20, 12:46 am, "Jürgen Exner" wrote:
>>> Benedict White wrote:
>>>> Is there no simple regex for saying that a part of the text (the bit
>>>> before the @ in this case) can contain anything you like, as long as
>>>> it contains say the number 2?
>>> /.*2.*\@/
>>>
>>> will do that.
>> Interesting that both you and Tad put the useless starting .* in your
>> regexp. It makes me wonder if it's not as useless as I think it is.
>> Is there any difference between that and
>> /2.*\@/
>
> Well, you are right. I can't see any reason to put it there, either.

I feel like carrying owls to Athens, but in principle there is a
difference between the two: In the former case (/.*2.*\@/), $PREMATCH
will be empty, in the latter case, (/2.*\@/) it won't.

--
Mails please to josef dot moellers
and I'm on gmx dot de.

Re: Regular expression help

on 20.11.2007 18:10:18 by Charlton Wilbur

>>>>> "PL" == Paul Lalli writes:

PL> Interesting that both you and Tad put the useless starting .*
PL> in your regexp. It makes me wonder if it's not as useless as
PL> I think it is. Is there any difference between [ /.*2.*\@/ ]
PL> and /2.*\@/ ?

To me, it indicates that the author of the regular expression is
thinking of the 2 as happening somewhere to the left of the @ sign in
the part of the string he cares about, as opposed to the 2 being at
the beginning of the part of the string he cares about.

Starting with the 2 is technically correct, and is only going to make
a difference if $& and company are involved somewhere.

Charlton


--
Charlton Wilbur
cwilbur@chromatico.net
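
The difference discussed above can be checked with a short sketch. It
assumes the English module for $PREMATCH and $MATCH; the helper name
match_parts and the sample address are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English;    # gives $PREMATCH and $MATCH as names for $` and $&

# Return the text before the match and the matched text, or an empty
# list when the pattern does not match.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my ($pre, $match) = ($PREMATCH, $MATCH);   # copy before leaving scope
    return ($pre, $match);
}

my $text = 'foo2bar@example.com';

for my $re (qr/.*2.*\@/, qr/2.*\@/) {
    my ($pre, $match) = match_parts($text, $re);
    print "pattern $re: prematch='$pre' match='$match'\n";
}
```

With the leading .* the match starts at the beginning of the string, so
the prematch is empty and the match is "foo2bar@". Without it the
prematch is "foo" and the match is "2bar@". Both patterns succeed on
the same strings; only the match variables differ.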

Re: Regular expression help

on 21.11.2007 01:47:33 by Petr Vileta

Josef Moellers wrote:
> I feel like carrying owls to Athens, but in principle there is a
> difference between the two: In the former case (/.*2.*\@/), $PREMATCH
> will be empty, in the latter case, (/2.*\@/) it won't.

Please don't forget that the mail 2me@mydomain.com is _valid_ mail.
--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)

Re: Regular expression help

on 21.11.2007 11:51:30 by Benedict White

On Nov 19, 5:33 pm, Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> characters or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+...@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without any. I need to have all with numbers in them, to the
> example.com domain.
>

I seem to have found a regex that works:
[A-Za-Z]?[0-9][A-Za-Z]?@example.com.com

Which matches emails to the example.com domain containing numbers.

Kind regards


Benedict White

Re: Regular expression help

on 21.11.2007 23:45:58 by Tad McClellan

Benedict White wrote:

> I seem to have found a regex that works:


It has at least 4 different problems.

What does "works" mean when you say it?


> [A-Za-Z]?[0-9][A-Za-Z]?@example.com.com
>
> Which matches emails to the example.com domain containing numbers.


a-Z is not a valid range.

at-signs need to be escaped in double-quotish contexts such
as a pattern.

There are 2 ".com" substrings required by the pattern.

It only allows a single character between the digit and
the at-sign. It doesn't match 'foo2bar@example.com.com' ...


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
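
A rough sketch of two of the points above, under stated assumptions:
the posted pattern is built from a string so that the compile failure
can be caught at run time, and $fixed is one possible repair (range
corrected, @ escaped, extra ".com" dropped, dots escaped as well). The
variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "a-Z" is a backwards range ('a' sorts after 'Z'), so Perl refuses to
# compile the pattern. eval traps the error instead of dying.
my $posted   = '[A-Za-Z]?[0-9][A-Za-Z]?@example.com.com';
my $compiled = eval { qr/$posted/ };
print "posted pattern ",
      (defined $compiled ? "compiles" : "fails to compile"), "\n";

# One possible repair. It still allows at most one letter between the
# digit and the @.
my $fixed = qr/[A-Za-z]?[0-9][A-Za-z]?\@example\.com/;

for my $addr ('a2b@example.com', 'foo2bar@example.com') {
    print "$addr: ", ($addr =~ $fixed ? "match" : "no match"), "\n";
}
```

Here a2b@example.com matches, while foo2bar@example.com does not,
because three letters sit between the 2 and the @.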

Re: Regular expression help

on 22.11.2007 11:12:38 by Benedict White

On Nov 21, 10:45 pm, Tad McClellan wrote:
> Benedict White wrote:
> > I seem to have found a regex that works:
>
> It has at least 4 different problems.
>
> What does "works" mean when you say it?
>
> > [A-Za-Z]?[0-9][A-Za-Z]?...@example.com.com
>
> > Which matches emails to the example.com domain containing numbers.
>
> a-Z is not a valid range.
>
> at-signs need to be escaped in double-quotish contexts such
> as a pattern.
>
> There are 2 ".com" substrings required by the pattern.
>
> It only allows a single character between the digit and
> the at-sign. It doesn't match 'foo2...@example.com.com' ...
>


Oops, that was a typo. It should have read:

[A-Za-z]?[0-9][A-Za-z]?@example.com

Which does work with egrep without escaping the @.


However, you are right: it will only match 1 letter after the last
number before the @. Adding a * makes it two, but making it totally
wild matches everything :(.




Kind regards

Benedict White

Re: Regular expression help

on 23.11.2007 02:43:14 by Tad McClellan

Benedict White wrote:
> On Nov 21, 10:45 pm, Tad McClellan wrote:
>> Benedict White wrote:
>> > I seem to have found a regex that works:
>>
>> It has at least 4 different problems.
>>
>> What does "works" mean when you say it?
>>
>> > [A-Za-Z]?[0-9][A-Za-Z]?...@example.com.com
>>
>> > Which matches emails to the example.com domain containing numbers.
>>
>> a-Z is not a valid range.
>>
>> at-signs need to be escaped in double-quotish contexts such
>> as a pattern.
>>
>> There are 2 ".com" substrings required by the pattern.
>>
>> It only allows a single character between the digit and
>> the at-sign. It doesn't match 'foo2...@example.com.com' ...
>>
>
>
> Oops, that was a typo.


Please copy/paste code rather than attempting to rekey it, and
introducing typos that are not in your actual code.


> It should have read:
>
> [A-Za-z]?[0-9][A-Za-z]?@example.com
>
> Which does work with egrep without escaping the @.


Silly me.

I thought we were talking Perl here in the Perl newsgroup.


> However, you are right: it will only match 1 letter after the last
> number before the @. Adding a * makes it two,


Adding an asterisk where?

Got code?

Did you mean to say "replacing the ? with *" like:

[A-Za-z]?[0-9][A-Za-z]*@example.com

??

That makes it both less and more than two.


> but making it totally
> wild matches everything :(.


What does "totally wild" mean when you say it?

Got code?


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
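
To make the ? versus * point concrete, here is a small comparison. It
is only a sketch: the names $with_q and $with_star and the sample
addresses are assumptions, and the @ and the dot are escaped for Perl
rather than egrep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ? allows zero or one letter after the digit; * allows zero or more.
my $with_q    = qr/[A-Za-z]?[0-9][A-Za-z]?\@example\.com/;
my $with_star = qr/[A-Za-z]?[0-9][A-Za-z]*\@example\.com/;

my @addrs = qw(
    foo2@example.com
    foo2b@example.com
    foo2bar@example.com
    foobar@example.com
);

for my $addr (@addrs) {
    printf "%-22s ?: %-3s *: %s\n", $addr,
        ($addr =~ $with_q    ? 'yes' : 'no'),
        ($addr =~ $with_star ? 'yes' : 'no');
}
```

Only foo2bar@example.com separates the two patterns: the * form matches
it and the ? form does not. Neither matches foobar@example.com, since
both still require a digit before the @.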