Regular expression to match only strings NOT containing particular words
on 19.10.2007 07:00:28 by Dylan Nicholson
I can write a regular expression that will only match strings that are
NOT the word apple:
^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
But is there a neater way, and how would I do it to match strings that
are NOT the word apple OR banana? Then what would be needed to match
only strings that do not CONTAIN the word "apple" or "banana" or
"cherry"?
I'd love it if the following worked:
^[^(apple)(banana)(cherry)]*$
But it appears the parentheses are ignored, as
^[(apple)(banana)(cherry)]*$
simply matches any string that consists entirely of the characters
a,b,c,e,h,l,n,r,p & y.
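To illustrate why the parentheses are ignored: inside square brackets, "(" and ")" are ordinary characters, so the bracket expression is just a set of single characters (which also includes the parentheses themselves). The following is a minimal sketch in Python, used here only as a neutral language; the helper name `matches` and the sample strings are made up for illustration.

```python
import re

# Inside [...], parentheses are literal characters, so this is simply
# "zero or more characters drawn from the set ( ) a b c e h l n p r y".
char_class = re.compile(r"^[(apple)(banana)(cherry)]*$")

# The negated form matches strings that avoid every one of those characters.
negated_class = re.compile(r"^[^(apple)(banana)(cherry)]*$")

def matches(pattern, text):
    """Hypothetical helper: True when the pattern matches the text."""
    return pattern.search(text) is not None

print(matches(char_class, "apple"))     # True
print(matches(char_class, "parcel"))    # True: only letters from the set
print(matches(negated_class, "kiwi"))   # True: no letter from the set
print(matches(negated_class, "grape"))  # False, although it contains no listed word
```

So the negated class rejects far more than the three words: any string containing even one of the listed letters fails.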
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 07:03:23 by jurgenex
Dylan Nicholson wrote:
> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana? Then what would be needed to match
> only strings that do not CONTAIN the word "apple" or "banana" or
> "cherry"?
!(/apple/ or /banana/ or /cherry/)
jue
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 10:03:59 by Stephane CHAZELAS
2007-10-18, 22:00(-07), Dylan Nicholson:
> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana? Then what would be needed to match
> only strings that do not CONTAIN the word "apple" or "banana" or
> "cherry"?
>
> I'd love it if the following worked:
>
> ^[^(apple)(banana)(cherry)]*$
>
> But it appears the parantheses are ignored, as
>
> ^[(apple)(banana)(cherry)]*$
>
> simply matches any string that consists entire of the characters
> a,b,c,e,h,l,n,r,p & y.
With perl regexps:
perl -ne 'print if /^(?:(?!apple|banana).)*$/'
or probably better:
perl -ne 'print if /^(?!.*(?:apple|banana))/'
But then, why not
perl -ne 'print if !/apple|banana/'
Note that vim's regexps have an equivalent negative look-ahead
operator.
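As a rough cross-check of the three one-liners above, here is a sketch in Python (an assumption on my part; the original uses perl, but Python's `re` module supports the same negative look-ahead syntax). The names `WORDS` and `allowed_by_all` are invented for the example.

```python
import re

WORDS = ["apple", "banana"]
alternation = "|".join(re.escape(w) for w in WORDS)

# 1. Tempered dot: no character may be the start of a forbidden word.
tempered = re.compile(rf"^(?:(?!{alternation}).)*$")

# 2. A single negative look-ahead at the start of the string.
single_lookahead = re.compile(rf"^(?!.*(?:{alternation}))")

# 3. Match the forbidden words and negate the result in the host language.
plain = re.compile(alternation)

def allowed_by_all(text):
    """Return the three verdicts; they should agree for single-line input."""
    return (
        tempered.search(text) is not None,
        single_lookahead.search(text) is not None,
        plain.search(text) is None,
    )

for sample in ["kiwi", "pineapple juice", "banana", ""]:
    print(repr(sample), allowed_by_all(sample))
```

For single-line input all three agree; the third form is the easiest to read whenever the calling code is free to negate the result itself.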
--
Stéphane
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 12:11:03 by RAD
On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson
wrote:
>I can write a regular expression that will only match strings that are
>NOT the word apple:
>
>^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
>But is there a neater way, and how would I do it to match strings that
>are NOT the word apple OR banana? Then what would be needed to match
>only strings that do not CONTAIN the word "apple" or "banana" or
>"cherry"?
>
>I'd love it if the following worked:
>
>^[^(apple)(banana)(cherry)]*$
>
>But it appears the parantheses are ignored, as
>
>^[(apple)(banana)(cherry)]*$
>
>simply matches any string that consists entire of the characters
>a,b,c,e,h,l,n,r,p & y.
A simple way is to write the regex to match apple or banana or cherry,
do the match and then check the Success property of the match object.
Run the following mini program:
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Plain alternation is enough: Match() searches anywhere in the
            // input, so no leading or trailing ".*" is needed.
            Regex r = new Regex("apple|banana|cherry");
            string[] strings =
                "apple,banana,cherry,applebanana,applebananacherry,fishapple,chips,chip and apple,apple pie".Split(',');
            foreach (string s in strings)
            {
                // Success is true when the string contains one of the words;
                // negate it to keep only strings that contain none of them.
                Console.WriteLine("{0} Match? {1}", s, r.Match(s).Success);
            }
            Console.ReadLine();
        }
    }
}
You should get this:
apple Match? True
banana Match? True
cherry Match? True
applebanana Match? True
applebananacherry Match? True
fishapple Match? True
chips Match? False
chip and apple Match? True
apple pie Match? True
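The program above reports which strings DO contain one of the words; the original question asks for the opposite. A small sketch of the inversion step, written in Python rather than C# purely for brevity (the variable names are illustrative):

```python
import re

forbidden = re.compile(r"apple|banana|cherry")

samples = ("apple,banana,cherry,applebanana,applebananacherry,"
           "fishapple,chips,chip and apple,apple pie").split(",")

# Keep only the strings in which the forbidden-word pattern does NOT match.
accepted = [s for s in samples if forbidden.search(s) is None]
print(accepted)  # ['chips']
```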
--
http://bytes.thinkersroom.com
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 13:11:29 by Michele Dondi
On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson
wrote:
>But is there a neater way, and how would I do it to match strings that
>are NOT the word apple OR banana? Then what would be needed to match
>only strings that do not CONTAIN the word "apple" or "banana" or
>"cherry"?
The general answer is that you should use separate regexen and logical
operators, or an explicit !~ but the subject of negating regexen is
discussed to some depth in the following thread @ PM:
http://perlmonks.org/?node_id=588315
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 16:28:38 by 1usa
Dylan Nicholson wrote in
news:1192770028.044399.164380@k35g2000prh.googlegroups.com:
[
newsgroup list trimmed, follow-ups set
There is no reason to cross-post to both c.l.p.misc and m.p.d.l.csharp
]
> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana?
When you say "are not" rather than "do not contain", it means you should
not be using regular expressions at all.
unless ( $s eq 'apple' or $s eq 'banana' or $s eq 'cherry' ) {
....
}
> Then what would be needed to match only strings that do not
> CONTAIN the word "apple" or "banana" or "cherry"?
unless (
index( $s, 'apple' ) > -1
index( $s, 'banana' ) > -1
index( $s, 'cherry' ) > -1
) {
....
}
If you have a long list of words, you could use
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw( first_index );
my $text = <<EO_TEXT;
Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium,
totam rem aperiam, eaque ipsa quae ab illo
inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam
voluptatem quia voluptas sit aspernatur aut odit
aut fugit, sed quia consequuntur magni dolores eos
qui ratione voluptatem sequi nesciunt. Neque porro
quisquam est, qui dolorem ipsum quia dolor sit
amet, consectetur, adipisci velit, sed quia non
numquam eius modi tempora incidunt ut labore et
dolore magnam aliquam quaerat voluptatem. Ut enim
ad minima veniam, quis nostrum exercitationem
ullam corporis suscipit laboriosam, nisi ut
aliquid ex ea commodi consequatur? Quis autem vel
eum iure reprehenderit qui in ea voluptate velit
esse quam nihil molestiae consequatur, vel illum
qui dolorem eum fugiat quo voluptas nulla pariatur
EO_TEXT
my @wordlist = qw( hello explicabo reprehenderit random );
unless ( -1 == first_index { index( $text, $_ ) > -1 } @wordlist ) {
print "One of the words in the word list appears in the text.\n";
}
__END__
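The same two checks (exact equality versus substring containment) can be sketched without any regex at all. This version is in Python rather than Perl, and the function names are made up for the example; `in` plays the role of Perl's index() here.

```python
def is_forbidden_word(text, words):
    """True when text is exactly one of the words (the "are NOT the word" case)."""
    return text in set(words)

def contains_forbidden(text, words):
    """True when any word occurs anywhere in text as a substring."""
    return any(word in text for word in words)

words = ["apple", "banana", "cherry"]
print(is_forbidden_word("apple", words))       # True
print(is_forbidden_word("apple pie", words))   # False
print(contains_forbidden("apple pie", words))  # True
print(contains_forbidden("kiwi", words))       # False
```

Negating either result gives the "NOT the word" and "does not CONTAIN the word" tests from the original question.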
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines:
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 16:33:32 by 1usa
"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in
news:Xns99CE6A93E8341asu1cornelledu@127.0.0.1:
>> Then what would be needed to match only strings that do not
>> CONTAIN the word "apple" or "banana" or "cherry"?
>
> unless (
> index( $s, 'apple' ) > -1
> index( $s, 'banana' ) > -1
> index( $s, 'cherry' ) > -1
> ) {
Oooops.
unless (
index( $s, 'apple' ) > -1
or index( $s, 'banana' ) > -1
or index( $s, 'cherry' ) > -1
) {
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines:
Re: Regular expression to match only strings NOT containing particular words
on 19.10.2007 18:40:25 by jurgenex
Jürgen Exner wrote:
> Dylan Nicholson wrote:
>> I can write a regular expression that will only match strings that
>> are NOT the word apple:
>>
>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>>
>> But is there a neater way, and how would I do it to match strings
>> that are NOT the word apple OR banana? Then what would be needed to
>> match only strings that do not CONTAIN the word "apple" or "banana"
>> or "cherry"?
>
> !(/apple/ or /banana/ or /cherry/)
Actually, coming to think of it: there is no good reason to use a RE in the
first place because you are looking for a literal substring only without any
of the meta-functionality of REs. The proper tool for that much simpler task
is index().
jue
Re: Regular expression to match only strings NOT containing particular words
on 26.10.2007 03:18:45 by Dylan Nicholson
On Oct 20, 2:40 am, "Jürgen Exner" wrote:
> Jürgen Exner wrote:
> > Dylan Nicholson wrote:
> >> I can write a regular expression that will only match strings that
> >> are NOT the word apple:
>
> >> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> >> But is there a neater way, and how would I do it to match strings
> >> that are NOT the word apple OR banana? Then what would be needed to
> >> match only strings that do not CONTAIN the word "apple" or "banana"
> >> or "cherry"?
>
> > !(/apple/ or /banana/ or /cherry/)
>
> Actually, coming to think of it: there is no good reason to use a RE in the
> first place because you are looking for a literal substring only without any
> of the meta-functionality of REs. The proper tool for that much simpler task
> is index().
>
> jue
Sure, except the regular expression mechanism is already in place as a
feature of the application. I was just curious if it could be used to
solve a particular problem.
Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
Microsoft's .NET regex library.
Thanks anyway,
Dylan
Re: Regular expression to match only strings NOT containing particular words
on 26.10.2007 04:18:55 by jurgenex
Dylan Nicholson wrote:
> On Oct 20, 2:40 am, "Jürgen Exner" wrote:
>> Actually, coming to think of it: there is no good reason to use a RE
>> in the first place because you are looking for a literal substring
>> only without any of the meta-functionality of REs. The proper tool
>> for that much simpler task is index().
>
> Sure, except the regular expression mechanism is already in place as a
> feature of the application.
And index() is a function of native Perl itself.
> Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
> Microsoft's .NET regex library.
Well, it is native Perl code, no need for some .Net regex library.
jue
Re: Regular expression to match only strings NOT containing particular words
on 27.10.2007 21:45:49 by Jesse Houwing
Hello Dylan,
> On Oct 20, 2:40 am, "Jürgen Exner" wrote:
>
>> Jürgen Exner wrote:
>>
>>> Dylan Nicholson wrote:
>>>
>>>> I can write a regular expression that will only match strings that
>>>> are NOT the word apple:
>>>>
>>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>>>>
>>>> But is there a neater way, and how would I do it to match strings
>>>> that are NOT the word apple OR banana? Then what would be needed to
>>>> match only strings that do not CONTAIN the word "apple" or "banana"
>>>> or "cherry"?
>>>>
>>> !(/apple/ or /banana/ or /cherry/)
>>>
>> Actually, coming to think of it: there is no good reason to use a RE
>> in the first place because you are looking for a literal substring
>> only without any of the meta-functionality of REs. The proper tool
>> for that much simpler task is index().
>>
>> jue
>>
> Sure, except the regular expression mechanism is already in place as a
> feature of the application. I was just curious if it could be used to
> solve a particular problem.
>
> Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
> Microsoft's .NET regex library.
It isn't ideal, but this will do the trick:
^((?!\b(cherry|banana|apple)\b).)*$
Make sure you set the option SingleLine and unset the option Multiline when
appropriate. If the application is under your control, it would probably
be easier to add a checkbox which will invert the match result from Success
to fail.
Though as Jue pointed out, it's probably faster and easier to maintain when
you implement a "bad words" list and use indexOf to see if the string is
in there somewhere. You might even use \bword\b in a regex for that.
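A sketch of how the word boundaries and the two options affect the result, written in Python on the assumption that `re.DOTALL` and `re.MULTILINE` behave closely enough to .NET's Singleline and Multiline options for this pattern (the helper `accepts` and the sample strings are illustrative only):

```python
import re

body = r"^((?!\b(cherry|banana|apple)\b).)*$"
whole_words = re.compile(body, re.DOTALL)    # "." also matches newlines
substrings = re.compile(r"^((?!cherry|banana|apple).)*$", re.DOTALL)
no_dotall = re.compile(body)                 # "." stops at a newline
multiline = re.compile(body, re.MULTILINE)   # ^ and $ match at every line

def accepts(pattern, text):
    """Hypothetical helper: True when the pattern matches the text."""
    return pattern.search(text) is not None

print(accepts(whole_words, "pineapple"))    # True: "apple" is not a separate word
print(accepts(substrings, "pineapple"))     # False: substring match is enough
print(accepts(whole_words, "kiwi\nplum"))   # True
print(accepts(no_dotall, "kiwi\nplum"))     # False: "." cannot cross the newline
print(accepts(whole_words, "apple\nkiwi"))  # False
print(accepts(multiline, "apple\nkiwi"))    # True: the "kiwi" line matches on its own
```

The last two lines show why the options matter: with per-line anchors a clean second line is enough to produce a match, even though the input as a whole contains a forbidden word.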
--
Jesse Houwing
jesse.houwing at sogeti.nl
Re: Regular expression to match only strings NOT containing particular words
on 27.10.2007 23:04:53 by Dylan Nicholson
On Oct 28, 6:45 am, Jesse Houwing
wrote:
> Hello Dylan,
>
> > On Oct 20, 2:40 am, "Jürgen Exner" wrote:
>
> >> Jürgen Exner wrote:
>
> >>> Dylan Nicholson wrote:
>
> >>>> I can write a regular expression that will only match strings that
> >>>> are NOT the word apple:
>
> >>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> >>>> But is there a neater way, and how would I do it to match strings
> >>>> that are NOT the word apple OR banana? Then what would be needed to
> >>>> match only strings that do not CONTAIN the word "apple" or "banana"
> >>>> or "cherry"?
>
> >>> !(/apple/ or /banana/ or /cherry/)
>
> >> Actually, coming to think of it: there is no good reason to use a RE
> >> in the first place because you are looking for a literal substring
> >> only without any of the meta-functionality of REs. The proper tool
> >> for that much simpler task is index().
>
> >> jue
>
> > Sure, except the regular expression mechanism is already in place as a
> > feature of the application. I was just curious if it could be used to
> > solve a particular problem.
>
> > Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
> > Microsoft's .NET regex library.
>
> It isn't ideal, but this will do the trick:
>
> ^((?!\b(cherry|banana|apple)\b).)*$
Thanks...works great...why do you say it's not ideal? I removed the
\b's though, as I need to exclude any string that contains "apple",
regardless of whether it's a separate word.
>
> Make sure you set the option SingleLine and unset the option Multiline when
> appropriate. If the application is under your control, it would probably
> be easier to add a checkbox which will invert the match result from Success
Yes, we'll probably do something similar for the next version.
>
> Though as Jue pointed out, it's probably faster and easier to maintain when
> you implement a "bad words" list and use indexOf to see if the string is
> in there somewhere. You might even use \bword\b in a regex for that.
>
If the regex does the job, it's more than adequate for now.
Re: Regular expression to match only strings NOT containing particular words
on 29.10.2007 01:02:42 by Jesse Houwing
Hello Dylan,
> On Oct 28, 6:45 am, Jesse Houwing
> wrote:
>
>> Hello Dylan,
>>
>>> On Oct 20, 2:40 am, "Jürgen Exner" wrote:
>>>
>>>> Jürgen Exner wrote:
>>>>
>>>>> Dylan Nicholson wrote:
>>>>>
>>>>>> I can write a regular expression that will only match strings
>>>>>> that are NOT the word apple:
>>>>>>
>>>>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>>>>>>
>>>>>> But is there a neater way, and how would I do it to match strings
>>>>>> that are NOT the word apple OR banana? Then what would be needed
>>>>>> to match only strings that do not CONTAIN the word "apple" or
>>>>>> "banana" or "cherry"?
>>>>>>
>>>>> !(/apple/ or /banana/ or /cherry/)
>>>>>
>>>> Actually, coming to think of it: there is no good reason to use a
>>>> RE in the first place because you are looking for a literal
>>>> substring only without any of the meta-functionality of REs. The
>>>> proper tool for that much simpler task is index().
>>>>
>>>> jue
>>>>
>>> Sure, except the regular expression mechanism is already in place as
>>> a feature of the application. I was just curious if it could be
>>> used to solve a particular problem.
>>>
>>> Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
>>> Microsoft's .NET regex library.
>>>
>> It isn't ideal, but this will do the trick:
>>
>> ^((?!\b(cherry|banana|apple)\b).)*$
>>
> Thanks...works great...why do you say it's not ideal?
My guess is that it isn't the fastest solution.
> I removed the
> \b's though, as I need to exclude any string that contains "apple",
> regardless of whether it's a separate word.
Ok, I didn't understand that from the original post. You can then also remove
the additional ()
^((?!cherry|banana|apple).)*$
>> Make sure you set the option SingleLine and unset the option
>> Multiline when
>> appropriate. If the application is under your control, it would
>> probably
>> be easier to add a checkbox which will invert the match result from
>> Success
> Yes, we'll probably do something similar for the next version.
>
>> Though as Jue pointed out, it's probably faster and easier to
>> maintain when
>> you implement a "bad words" list and use indexOf to see if the string
>> is
>> in there somewhere. You might even use \bword\b in a regex for that.
> If the regex does the job, it's more than adequate for now.
Good. Glad I was of help.
--
Jesse Houwing
jesse.houwing at sogeti.nl