#1: regular expression negate a word (not character)

Posted on 2008-01-26 02:16:31 by Summercoolness

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression


#2: Re: regular expression negate a word (not character)

Posted on 2008-01-26 03:15:36 by Summercoolness

On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?


#3: Re: regular expression negate a word (not character)

Posted on 2008-01-26 03:42:19 by paulaireilly

On Jan 25, 8:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
....
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

You might be looking for a negative lookahead assertion. Look that up in
a handy source. The syntax is approximately

(?!foo) --> will match at any place "between" chars not immediately
followed by "foo".

Now, you have to add in the "bar" afterwards. But remember that (?!foo)
takes up zero width. And be careful about /.*/ matching anything,
including zero chars.

I would say more but this looks sorta like a homework assignment to
me. So this is a "hint" post. I or someone else could do a "solution"
post later, but just having the phrase "negative lookahead assertion"
to look up on the Web or in a book index will probably answer all your
questions.

Note that what you *really* want is a "negative lookbehind assertion",
to put right in front of the "tire" in your example (or my "bar"), but
I think those won't be working until Perl 6.
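
To make the lookahead/lookbehind distinction concrete, here is a small Python sketch (Python's re module accepts the same (?!...) and (?<!...) syntax for these fixed-width cases). The sample strings are made up for illustration. A negative lookahead placed in front of "tire" only inspects the text that follows that position, so it never sees "snow"; a negative lookbehind inspects the text that precedes it.

```python
import re

# Negative lookahead: at the position where "tire" would start, assert that
# the text ahead does not begin with "snow". That is always true when "tire"
# follows, so this filters nothing out.
lookahead = re.compile(r"(?!snow)tire")

# Negative lookbehind: assert that the four characters before "tire"
# are not "snow".
lookbehind = re.compile(r"(?<!snow)tire")

for s in ["snowtire", "baldtire"]:
    print(s, bool(lookahead.search(s)), bool(lookbehind.search(s)))
```

Only the lookbehind version rejects "snowtire"; both accept "baldtire".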


#4: Re: regular expression negate a word (not character)

Posted on 2008-01-26 04:27:31 by Ben Morrow

Quoth paulaireilly <paulaireilly@gmail.com>:
>
> You might be looking for a negative lookahead assertion. Look
<snip>
>
> Note that what you *really* want is a "negative lookbehind assertion",
> to put right in front of the "tire" in your example (or my "bar"), but
> I think those won't be working until Perl 6.

No, they work perfectly well in Perl 5, at least for fixed-length
strings. Syntax is (?<= ) and (?<! ). In 5.10 you can get
variable-length positive (but not negative) lookbehind at the start of
the match using \K.

Ben


#5: Re: regular expression negate a word (not character)

Posted on 2008-01-26 04:37:53 by Ben Morrow

[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool <Summercoolness@gmail.com>:
> On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.
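
As a rough illustration of "generating it", the Python sketch below builds that kind of alternation for an arbitrary word. It is not code from this post: the function name not_preceded_by is hypothetical, and two details are assumptions on my part. Each alternative also allows the start of the string, and the class for the last letter excludes whitespace (as the later Perl version in this thread does), so that the space in "snow tire" cannot itself satisfy an alternative.

```python
import re

def not_preceded_by(word, target):
    """Build a lookaround-free pattern that matches `target` when the text
    before it (ignoring whitespace) does not end with `word`."""
    alternatives = []
    for i, ch in enumerate(word):
        suffix = re.escape(word[i + 1:])
        # Only the class for the last letter needs to exclude whitespace.
        extra = r"\s" if i == len(word) - 1 else ""
        # "Differs from `word` at letter i, then agrees with the rest."
        alternatives.append(r"(?:^|[^%s%s])%s" % (re.escape(ch), extra, suffix))
    return r"(?:%s)\s*%s" % ("|".join(alternatives), re.escape(target))

pattern = re.compile(not_preceded_by("snow", "tire"), re.IGNORECASE)

for s in ["winter tire", "retire", "snoo tire", "snow tire", "snowtire"]:
    print(repr(s), bool(pattern.search(s)))
```

The generated pattern grows linearly with the length of the word, which is the tedium mentioned above, but nobody has to type it by hand.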

Ben


#6: Re: regular expression negate a word (not character)

Posted on 2008-01-26 05:40:23 by Mark Tolonen

"Summercool" <Summercoolness@gmail.com> wrote in message
news:27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>

What you want is a negative lookbehind assertion:

>>> re.search(r'(?<!snow)tire','snowtire') # no match
>>> re.search(r'(?<!snow)tire','baldtire')
<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

>>> re.search(r'(?<!snow\s*)tire','snow tire')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>

Python doesn't support lookbehind assertions that can vary in size. This
doesn't work either:

>>> re.search(r'(?<!snow)\s*tire','snow tire')
<_sre.SRE_Match object at 0x00F93480>
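
That last pattern matches because the engine is free to start the match at the "t": there the four preceding characters are "now " rather than "snow", so the lookbehind succeeds and \s* matches nothing. One possible workaround, offered only as a sketch and not as part of the original post, is to add a second fixed-width lookbehind that forbids starting in the middle of a whitespace run, so the (?<!snow) check always applies at the start of that run. The name not_snow_tire and the sample strings are made up.

```python
import re

# Why (?<!snow)\s*tire still matches: the match starts at the "t".
m = re.search(r"(?<!snow)\s*tire", "snow tire")
print(m.span())  # (5, 9)

# Sketch of a workaround using only fixed-width lookbehinds:
# (?<!\s) forces the match to start where the whitespace run starts,
# so (?<!snow) inspects the word just before that run.
not_snow_tire = re.compile(r"(?<!\s)(?<!snow)\s*tire", re.IGNORECASE)

for s in ["winter tire", "retire", "snow tire", "snow   tire",
          "some snowtires", "snow tire and regular tire"]:
    print(repr(s), bool(not_snow_tire.search(s)))
```

This stays within what Python's re accepts, since each lookbehind has a fixed width, while still tolerating any amount of whitespace between "snow" and "tire".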

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern, notpattern, string):
    # Match either alternative, trying notpattern first, then discard
    # the matches that came from notpattern.
    for matchobj in re.finditer('(?:%s)|(?:%s)' % (notpattern, pattern), string):
        if not re.match(notpattern, matchobj.group()):
            yield matchobj

def markexcept(pattern, notpattern, string):
    # Return string with every surviving match wrapped in [brackets].
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern, notpattern, string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

>>> sample='''winter tire
... tire
... retire
... tired
... snow tire
... snow tire
... some snowtires
... '''
>>> print(markexcept('tire', r'snow\s*tire', sample))
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark


#7: Re: regular expression negate a word (not character)

Posted on 2008-01-26 10:53:25 by Summercoolness

to add to the test cases, the regular expression must be able to grep


snowbird tire
tired on a snow day
snow tire and regular tire


#8: Re: regular expression negate a word (not character)

Posted on 2008-01-26 11:46:59 by bearophileHUGS

Summercool:
> to add to the test cases, the regular expression must be able to grep
> snow tire and regular tire

I presume that only the second "tire" there has to be found.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

import re

def finder(text):
    # Capture the word (if any) right before "tire", then filter in Python.
    patt = re.compile(r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for start in finder(text):
    print(start)

The (lazy) output is the starting offset of each "tire" that matches:


1
11
28
43
63
73
120

Bye,
bearophile


#9: Re: regular expression negate a word (not character)

Posted on 2008-01-26 12:34:16 by Paddy

On Jan 26, 1:16 am, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

Try the answer here:
http://mail.python.org/pipermail/tutor/2003-August/024902.html


#10: Re: regular expression negate a word (not character)

Posted on 2008-01-26 12:53:42 by bearophileHUGS

Paddy:
> Try the answer here:
> http://mail.python.org/pipermail/tutor/2003-August/024902.html

But in the OP problem there can be variable-sized spaces in the
middle...

Bye,
bearophile


#11: Re: regular expression negate a word (not character)

Posted on 2008-01-26 22:39:02 by Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Summercool
<Summercoolness@gmail.com>], who wrote in article <27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com>:
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires

This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.
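
For readers who want to experiment, the Python sketch below runs the pattern as posted against the cases from this thread, next to one hypothetical modification. The modification, the variable names, and the choice of test strings are assumptions for illustration, not something stated in the post.

```python
import re

# The pattern as posted, written with re.VERBOSE so the spacing is ignored.
posted = re.compile(r"( ^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

# One hypothetical modification (an assumption, not from the post): only
# reject a word when "snow" is directly followed by optional non-word
# characters and "tire", so that "snowbird tire" is still accepted.
modified = re.compile(r"( ^ | \b(?!snow\W*tire) \w+ ) \W* tire", re.VERBOSE)

cases = ["winter tire", "tire", "retire", "tired", "snowbird tire",
         "tired on a snow day", "snow tire and regular tire",
         "snow tire", "some snowtires", "snow; tire", "thisnow tire"]

for s in cases:
    print("%-28s posted=%-5s modified=%s"
          % (s, bool(posted.search(s)), bool(modified.search(s))))
```

As posted, the pattern rejects "snowbird tire" because the word starts with "snow"; the modified lookahead accepts it while still rejecting "snow tire", "snow; tire" and "some snowtires". Whether "thisnow tire" should match is left open by the problem statement; both versions accept it.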

Hope this helps,
Ilya


#12: Re: regular expression negate a word (not character)

Posted on 2008-01-28 19:53:42 by gbacon

The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH    => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire"                => MATCH ],
    [ "tire"                       => MATCH ],
    [ "retire"                     => MATCH ],
    [ "tired"                      => MATCH ],
    [ "snowbird tire"              => MATCH ],
    [ "tired on a snow day"        => MATCH ],
    [ "snow tire and regular tire" => MATCH ],
    [ " tire"                      => MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "some snowtires"             => NO_MATCH ],
);

my $not_snow_tire = qr/
    ^ \s* tire |
    ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str =~ /$not_snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__

--
... all these cries of having 'abolished slavery,' of having 'preserved the
union,' of establishing a 'government by consent,' and of 'maintaining the
national honor' are all gross, shameless, transparent cheats -- so trans-
parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"


#13: Re: regular expression negate a word (not character)

Posted on 2008-01-28 21:00:55 by rvtol+news

Greg Bacon wrote:

> #! /usr/bin/perl
>
> use warnings;
> use strict;
>
> use constant {
>     MATCH    => 1,
>     NO_MATCH => 0,
> };
>
> my @tests = (
>     [ "winter tire"                => MATCH ],
>     [ "tire"                       => MATCH ],
>     [ "retire"                     => MATCH ],
>     [ "tired"                      => MATCH ],
>     [ "snowbird tire"              => MATCH ],
>     [ "tired on a snow day"        => MATCH ],
>     [ "snow tire and regular tire" => MATCH ],
>     [ " tire"                      => MATCH ],
>     [ "snow tire"                  => NO_MATCH ],
>     [ "snow tire"                  => NO_MATCH ],
>     [ "some snowtires"             => NO_MATCH ],
> );
> [...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
    snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str !~ /$snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
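
For the Python readers in this cross-posted thread, the same negated test might be sketched as follows. This is an approximate port, not code from the post: Python's re has no [[:blank:]] class, so [ \t] is assumed as a stand-in, and the helper name accepts is made up.

```python
import re

# Python's re has no [[:blank:]]; [ \t] (space or tab) stands in for it.
snow_tire = re.compile(r"snow[ \t]*tire(?!.*tire)")

def accepts(s):
    """Negated test: accept the string when the unwanted pattern is absent."""
    return snow_tire.search(s) is None

wanted = ["winter tire", "tire", "retire", "tired", "snowbird tire",
          "tired on a snow day", "snow tire and regular tire", " tire"]
unwanted = ["snow tire", "some snowtires"]

for s in wanted + unwanted:
    print("%-28r %s" % (s, "accepted" if accepts(s) else "rejected"))
```

Note that a negated test of this shape also accepts strings that contain no "tire" at all, such as "snow day", and rejects "regular tire and snow tire", so it fits the listed cases rather than the general problem.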

--
Affijn, Ruud

"Gewoon is een tijger."


#14: Re: regular expression negate a word (not character)

Posted on 2008-01-28 22:37:36 by Paul McGuire

On Jan 25, 7:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
>   tire
>
> but not
>
>   snow tire
>
> or
>
>   snowtire
>

Too bad pyparsing's not an option. Here's what it would look like:

data = """
Match:
> winter tire
> tire
> retire
> tired

But not match:
> snow tire
> snow tire
> some snowtires

snowbird tire
tired on a snow day
snow tire and regular tire

"""

from pyparsing import CaselessLiteral, Literal, line

# caseless wasn't really necessary but you never know
# when you'll run into a "Snow tire"
snow = CaselessLiteral("snow")
tire = Literal("tire")
tire.ignore(snow + tire)

for matchTokens, matchStart, matchEnd in tire.scanString(data):
    print(line(matchStart, data))


Prints:

> winter tire
> tire
> retire
> tired
snowbird tire
tired on a snow day
snow tire and regular tire

-- Paul


#15: Re: regular expression negate a word (not character)

Posted on 2008-01-29 18:12:09 by gbacon

In article <fnlfr0.1fk.1@news.isolution.nl>,
Dr.Ruud <rvtol+news@isolution.nl> wrote:

: I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

I dunno. Maybe it was the familiar compulsion with Perl to
attempt to cram everything into a single pattern.

Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll


#16: Re: regular expression negate a word (not character)

Posted on 2008-02-01 11:36:11 by rvtol+news

Greg Bacon wrote:
> Dr.Ruud:

>> I negated the test, to make the regex simpler: [...]
>
> Yes, your approach is simpler. I assumed from the "need it all
> in one pattern" constraint that the OP is feeding the regular
> expression to some other program that is looking for matches.

Yes, I assumed about the same, but thought it would be a nice
alternative anyways.
Happy Perling!

--
Affijn, Ruud

"Gewoon is een tijger."
