Regular expression help
on 19.11.2007 18:33:57 by Benedict White
I need a bit of help cleaning up a mess.
I need to write a regular expression that looks for invalid email
addresses. I know that the email addresses do not contain numeric
characters or unusual combinations of letters.
What I was hoping for was something that would locate all emails with,
say, a 2 before the @, addressed to the domain I am looking after,
example.com.
I tried ^[A-Za-z2._%+-]+@example.com; however, the 2 is not required.
The numbers could be anywhere in the string before the @. It will miss
email addresses with other numbers in them, but will also pick up any
without numbers. I need to find all addresses with numbers in them at the
example.com domain.
Then I want to extend it to look for odd combinations of letters, like
xb, which would then have to appear together but anywhere in the
string.
Kind regards
Benedict White
Re: Regular expression help
on 19.11.2007 19:21:47 by Ben Morrow
Quoth Benedict White :
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> charictors or unusual combinations of letters.
It would be better to use a module that knows how to parse email
addresses.
It may be better to start with a list of valid addresses, and proceed from
there; however, this may not be possible.
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without out. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.
Something like
#!/usr/bin/perl
use warnings;
use strict;
use Email::Address;

my $domain  = 'example.com';
my $invalid = qr/ \d | xb /x;
my @addrs   = qw{
    abc%
    a@a
    foo@bar.org
    one2three@example.com
    AAxbYY@example.com
    user@example.com
};

# Report a verdict and move on to the next address in the ADDR loop.
sub result {
    my ($reason) = @_;
    warn "$reason\n";
    no warnings 'exiting';
    next ADDR;
}

ADDR: for my $addr (@addrs) {
    my ($parsed) = Email::Address->parse($addr)
        or result "invalid address: $addr";
    $parsed->original eq $addr
        or result "extra gunk around address in '$addr'";
    $parsed->host eq $domain
        or result "'$addr' not at '$domain'";
    $parsed->user =~ $invalid
        and result "'$addr' contains a forbidden string";
    result "'$addr' is valid";
}

__END__
should work.
Ben
Re: Regular expression help
on 19.11.2007 20:18:45 by Jim Gibson
In article
<3bedd977-a9d6-45f3-a94d-68d9f2177c19@c29g2000hsa.googlegroups.com>,
Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> charictors or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without out. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.
This type of question has been asked frequently in the past. See
perldoc -q address
"How do I check a valid mail address?"
--
Jim Gibson
Re: Regular expression help
on 20.11.2007 01:43:10 by Scott Bryce
Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses.
If I wanted to write code that determines if an email address is valid,
I would search CPAN for "email valid," and I would find this:
http://search.cpan.org/~rjbs/Email-Valid-0.179/lib/Email/Valid.pm
Re: Regular expression help
on 20.11.2007 02:20:48 by Benedict White
Many thanks.
Is there no simple regex for saying that a part of the text (the bit
before the @ in this case) can contain anything you like, as long as
it contains say the number 2?
Kind regards
Benedict White
Re: Regular expression help
on 20.11.2007 02:44:54 by Petr Vileta
Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> charictors or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without out. I need to have all with numbers in them, to the
> example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.
>
If I understand you right this can solve it.
#!/usr/bin/perl
use strict;

my @mails = qw(abc@gmail.com abc2def@gmail.com abcxb@gmail.com
               goodmail@gmail.com);
my $notneed = join('|', qw(2 xb));

foreach my $mail (@mails)
{
    print "$mail - ", testmail($mail, $notneed), "\n";
}

sub testmail
{
    my ($email, $regex) = @_;
    return "bad" if ($email =~ m/^[^\@]*($regex).*\@.+$/);
    return "good";
}
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Re: Regular expression help
on 20.11.2007 03:38:42 by Tad McClellan
Benedict White wrote:
> Many thanks.
To who?
For what?
Please quote some context in followups.
> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?
print "matched\n" if $text =~ /.*2.*\@/s;
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Re: Regular expression help
on 20.11.2007 04:09:21 by Gunnar Hjalmarsson
Benedict White wrote:
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> charictors
/\d[^@]*@/
> or unusual combinations of letters.
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.
Seems like an unusual requirement... What if there is some Max Borke,
with the address maxborke@example.com ?
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
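A minimal sketch of the pattern Gunnar suggests, with two additions that
are not in his post: the match is tied to the example.com domain, and the
helper names has_digit_at_example and has_xb are made up for illustration.
It also shows the false positive he warns about, since "maxborke"
contains "xb".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a digit appears before the @ and the
# address ends in @example.com (the domain tie-in is an assumption).
sub has_digit_at_example {
    my ($addr) = @_;
    return $addr =~ /\d[^@]*\@example\.com\z/i ? 1 : 0;
}

# Hypothetical helper: true if "xb" appears somewhere before the @.
sub has_xb {
    my ($addr) = @_;
    return $addr =~ /xb[^@]*\@/i ? 1 : 0;
}

for my $addr (qw(one2three@example.com user@example.com maxborke@example.com)) {
    printf "%-25s digit=%d xb=%d\n",
        $addr, has_digit_at_example($addr), has_xb($addr);
}
```

Because [^@]* cannot cross the @, the digit or the "xb" has to sit in the
part before the @, which is the part the question cares about.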
Re: Regular expression help
on 20.11.2007 06:46:02 by jurgenex
Benedict White wrote:
> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?
/.*2.*\@/
will do that.
jue
Re: Regular expression help
on 20.11.2007 13:44:39 by Paul Lalli
On Nov 20, 12:46 am, "Jürgen Exner" wrote:
> Benedict White wrote:
> > Is there no simple regex for saying that a part of the text (the bit
> > before the @ in this case) can contain anything you like, as long as
> > it contains say the number 2?
>
> /.*2.*\@/
>
> will do that.
Interesting that both you and Tad put the useless starting .* in your
regexp. It makes me wonder if it's not as useless as I think it is.
Is there any difference between that and
/2.*\@/
?
Paul Lalli
Re: Regular expression help
on 20.11.2007 16:58:21 by jurgenex
Paul Lalli wrote:
> On Nov 20, 12:46 am, "Jürgen Exner" wrote:
>> Benedict White wrote:
>>> Is there no simple regex for saying that a part of the text (the bit
>>> before the @ in this case) can contain anything you like, as long as
>>> it contains say the number 2?
>>
>> /.*2.*\@/
>>
>> will do that.
>
> Interesting that both you and Tad put the useless starting .* in your
> regexp. It makes me wonder if it's not as useless as I think it is.
> Is there any difference between that and
> /2.*\@/
Well, you are right. I can't see any reason to put it there, either.
jue
Re: Regular expression help
on 20.11.2007 17:20:01 by Josef Moellers
Jürgen Exner wrote:
> Paul Lalli wrote:
>> On Nov 20, 12:46 am, "Jürgen Exner" wrote:
>>> Benedict White wrote:
>>>> Is there no simple regex for saying that a part of the text (the bit
>>>> before the @ in this case) can contain anything you like, as long as
>>>> it contains say the number 2?
>>> /.*2.*\@/
>>>
>>> will do that.
>> Interesting that both you and Tad put the useless starting .* in your
>> regexp. It makes me wonder if it's not as useless as I think it is.
>> Is there any difference between that and
>> /2.*\@/
>
> Well, you are right. I can't see any reason to put it there, either.
I feel like carrying owls to Athens, but in principle there is a
difference between the two: In the former case (/.*2.*\@/), $PREMATCH
will be empty, in the latter case, (/2.*\@/) it won't.
--
Mails please to josef dot moellers
and I'm on gmx dot de.
Re: Regular expression help
on 20.11.2007 18:10:18 by Charlton Wilbur
>>>>> "PL" == Paul Lalli writes:
PL> Interesting that both you and Tad put the useless starting .*
PL> in your regexp. It makes me wonder if it's not as useless as
PL> I think it is. Is there any difference between [ /.*2.*\@/ ]
PL> and /2.*\@/ ?
To me, it indicates that the author of the regular expression is
thinking of the 2 as happening somewhere to the left of the @ sign in
the part of the string he cares about, as opposed to the 2 being at
the beginning of the part of the string he cares about.
Starting with the 2 is technically correct, and is only going to make
a difference if $& and company are involved somewhere.
Charlton
--
Charlton Wilbur
cwilbur@chromatico.net
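The point about $& and company can be checked with a short sketch. The
helper name pre_and_match is hypothetical, and it rebuilds the prematch
and match from the @- and @+ offset arrays rather than reading $` and $&
directly; that is a choice made here, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match) for a pattern, or an
# empty list if the pattern does not match.
sub pre_and_match {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my $pre   = substr $text, 0, $-[0];              # text before the match
    my $match = substr $text, $-[0], $+[0] - $-[0];  # the matched text
    return ($pre, $match);
}

my $text = 'foo2bar@example.com';
for my $re (qr/.*2.*\@/, qr/2.*\@/) {
    my ($pre, $match) = pre_and_match($text, $re);
    printf "%-16s prematch='%s' match='%s'\n", $re, $pre, $match;
}
```

Both patterns succeed on the same strings; only what counts as "the
match" differs, which matters only if the code goes on to use it.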
Re: Regular expression help
on 21.11.2007 01:47:33 by Petr Vileta
Josef Moellers wrote:
> I feel like carrying owls to Athens, but in principle there is a
> difference between the two: In the former case (/.*2.*\@/), $PREMATCH
> will be empty, in the latter case, (/2.*\@/) it won't.
Please don't forget that the mail 2me@mydomain.com is _valid_ mail.
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Re: Regular expression help
on 21.11.2007 11:51:30 by Benedict White
On Nov 19, 5:33 pm, Benedict White wrote:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that the email addresses do not contain numerical
> charictors or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with
> say 2 before the at addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-]+...@example.com however the 2 is not required.
> The numbers could be anywhere in string before the @. It will miss
> email addresses with other numbers in them, but will also pick up any
> without out. I need to have all with numbers in them, to the
> example.com domain.
>
I seem to have found a regex that works:
[A-Za-Z]?[0-9][A-Za-Z]?@example.com.com
Which matches emails to the example.com domain containing numbers.
Kind regards
Benedict White
Re: Regular expression help
on 21.11.2007 23:45:58 by Tad McClellan
Benedict White wrote:
> I seem to have found a regex that works:
It has at least 4 different problems.
What does "works" mean when you say it?
> [A-Za-Z]?[0-9][A-Za-Z]?@example.com.com
>
> Which matches emails to the example.com domain containing numbers.
a-Z is not a valid range.
at-signs need to be escaped in double-quotish contexts such
as a pattern.
There are 2 ".com" substrings required by the pattern.
It only allows a single character between the digit and
the at-sign. It doesn't match 'foo2bar@example.com.com' ...
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
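One possible repair that addresses the four points listed above, offered
as a sketch and not as the poster's eventual solution. The helper name
flags_address is hypothetical, and using [^@]* on both sides of the digit
is an assumption about what "anywhere before the @" should mean.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the address is at example.com and has at
# least one digit before the @.
sub flags_address {
    my ($addr) = @_;
    return $addr =~ /\A [^@]* [0-9] [^@]*   # a digit anywhere before the @
                     \@ example \. com \z   # escaped @ and dots, one ".com"
                    /x ? 1 : 0;
}

for my $addr (qw(foo2bar@example.com 2me@example.com user@example.com
                 foo2bar@example.com.com foo2bar@other.org)) {
    printf "%-26s %s\n", $addr, flags_address($addr) ? 'flagged' : 'ok';
}
```

Anchoring with \A and \z keeps the pattern from matching a fragment in the
middle of a longer string.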
Re: Regular expression help
on 22.11.2007 11:12:38 by Benedict White
On Nov 21, 10:45 pm, Tad McClellan wrote:
> Benedict White wrote:
> > I seem to have found a regex that works:
>
> It has at least 4 different problems.
>
> What does "works" mean when you say it?
>
> > [A-Za-Z]?[0-9][A-Za-Z]?...@example.com.com
>
> > Which matches emails to the example.com domain containing numbers.
>
> a-Z is not a valid range.
>
> at-signs need to be escaped in double-quotish contexts such
> as a pattern.
>
> There are 2 ".com" substrings required by the pattern.
>
> It only allows a single character between the digit and
> the at-sign. It doens't match 'foo2...@example.com.com' ...
>
Oops, that was a typo. It should have read:
[A-Za-z]?[0-9][A-Za-z]?@example.com
Which does work with egrep without escaping the @.
However, you are right: it will only match 1 letter after the last
number before the at. Adding a * makes it two, but making it totally
wild matches everything :(.
Kind regards
Benedict White
Re: Regular expression help
on 23.11.2007 02:43:14 by Tad McClellan
Benedict White wrote:
> On Nov 21, 10:45 pm, Tad McClellan wrote:
>> Benedict White wrote:
>> > I seem to have found a regex that works:
>>
>> It has at least 4 different problems.
>>
>> What does "works" mean when you say it?
>>
>> > [A-Za-Z]?[0-9][A-Za-Z]?...@example.com.com
>>
>> > Which matches emails to the example.com domain containing numbers.
>>
>> a-Z is not a valid range.
>>
>> at-signs need to be escaped in double-quotish contexts such
>> as a pattern.
>>
>> There are 2 ".com" substrings required by the pattern.
>>
>> It only allows a single character between the digit and
>> the at-sign. It doens't match 'foo2...@example.com.com' ...
>>
>
>
> Oops, that was a typo.
Please copy/paste code rather than attempting to rekey it, and
introducing typos that are not in your actual code.
> It should have read:
>
> [A-Za-z]?[0-9][A-Za-z]?@example.com
>
> Which does work with egrep without escaping the @.
Silly me.
I thought we were talking Perl here in the Perl newsgroup.
> However you are right it will only match 1 letter after the last
> number before the att. Adding a * makes it two,
Adding an asterisk where?
Got code?
Did you mean to say "replacing the ? with *" like:
[A-Za-z]?[0-9][A-Za-z]*@example.com
??
That makes it both less and more than two.
> but making it totally
> wild matches everything :(.
What does "totally wild" mean when you say it?
Got code?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
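To see what swapping the trailing ? for * changes, here is a small
comparison sketch. The helper names one_letter_after and
any_letters_after are hypothetical, and the @ and the dot are escaped
here for Perl, unlike in the egrep version quoted above. Both patterns
are left unanchored, as they would be with egrep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: at most one letter between the digit and the @.
sub one_letter_after {
    my ($addr) = @_;
    return $addr =~ /[A-Za-z]?[0-9][A-Za-z]?\@example\.com/ ? 1 : 0;
}

# Hypothetical helper: zero or more letters between the digit and the @.
sub any_letters_after {
    my ($addr) = @_;
    return $addr =~ /[A-Za-z]?[0-9][A-Za-z]*\@example\.com/ ? 1 : 0;
}

for my $addr (qw(foo2@example.com foo2b@example.com foo2bar@example.com
                 foo2bar.baz@example.com user@example.com)) {
    printf "%-26s ?=%d *=%d\n",
        $addr, one_letter_after($addr), any_letters_after($addr);
}
```

The * version still misses foo2bar.baz@example.com, because a dot between
the digit and the @ is not in [A-Za-z].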