FAQ 6.16 How do I efficiently match many regular expressions at once?
on 14.08.2007 03:03:02 by PerlFAQ Server
This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
6.16: How do I efficiently match many regular expressions at once?
( contributed by brian d foy )
Avoid asking Perl to compile a regular expression every time you want to
match it. In this example, perl must recompile the regular expression
for every iteration of the foreach() loop since it has no way to know
what $pattern will be.
my @patterns = qw( foo bar baz );

LINE: while( <> )
{
    foreach my $pattern ( @patterns )
    {
        if( /\b$pattern\b/i )
        {
            print;
            next LINE;
        }
    }
}
The qr// operator showed up in perl 5.005. It compiles a regular
expression, but doesn't apply it. When you use the pre-compiled version
of the regex, perl does less work. In this example, I inserted a map()
to turn each pattern into its pre-compiled form. Since the \b anchors
and the /i flag are now part of each compiled pattern, the match inside
the loop uses $pattern on its own. The rest of the script is the same,
but faster.
my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

LINE: while( <> )
{
    foreach my $pattern ( @patterns )
    {
        if( /$pattern/ )
        {
            print;
            next LINE;
        }
    }
}
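To check the speed claim on your own perl, the two approaches can be compared with the core Benchmark module. This is a rough sketch under assumptions: the sample input in @lines and the sub names count_strings and count_compiled are made up for illustration, and the table cmpthese() prints will vary with perl version, input, and machine, so treat the numbers as indicative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical sample input: every tenth line mentions "Bar".
my @lines = map {
    $_ % 10 ? "nothing to see on line $_\n" : "line $_ has Bar in it\n"
} 1 .. 1_000;

my @words    = qw( foo bar baz );
my @compiled = map { qr/\b$_\b/i } @words;

# Interpolates a plain string into the match, so perl has to compile
# a new pattern each time that string changes from the previous match.
sub count_strings {
    my $count = 0;
    LINE: foreach my $line ( @lines ) {
        foreach my $word ( @words ) {
            if ( $line =~ /\b$word\b/i ) { $count++; next LINE }
        }
    }
    return $count;
}

# Uses the qr// objects that were compiled once, up front.
sub count_compiled {
    my $count = 0;
    LINE: foreach my $line ( @lines ) {
        foreach my $regex ( @compiled ) {
            if ( $line =~ $regex ) { $count++; next LINE }
        }
    }
    return $count;
}

# Both versions must agree before their speed is worth comparing.
die "results differ\n" unless count_strings() == count_compiled();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    strings  => \&count_strings,
    compiled => \&count_compiled,
} );
```

Expect the compiled version to come out ahead, but by how much depends on the input. Perl skips recompiling when an interpolated pattern string is identical to the one used last time at that match, which is why looping over several different patterns is the costly case.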
In some cases, you may be able to make several patterns into a single
regular expression. Beware of situations that require backtracking,
though.
my $regex = join '|', qw( foo bar baz );

LINE: while( <> )
{
    print if /\b(?:$regex)\b/i;
}
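A sketch of the single-regex approach that compiles the alternation once with qr// and also reports which word matched. The helper name matched_word and the sample lines are hypothetical. quotemeta guards against words that contain regex metacharacters, and the \b anchors stay outside the group: written as \bfoo|bar|baz\b, the first anchor would apply only to foo and the last only to baz.

```perl
use strict;
use warnings;

# Hypothetical word list; quotemeta escapes any regex metacharacters.
my @words       = qw( foo bar baz );
my $alternation = join '|', map { quotemeta } @words;

# Compile the combined pattern once. The group keeps both \b anchors
# applying to every alternative, and the capture records which one hit.
my $regex = qr/\b($alternation)\b/i;

# Returns the matched word as it appears in $line, or undef if none matched.
sub matched_word {
    my ( $line ) = @_;
    return $line =~ $regex ? $1 : undef;
}

foreach my $line ( 'Meet me at the BAR', 'a foolish idea', 'baz!' ) {
    my $word = matched_word( $line );
    printf "%-20s => %s\n", $line, defined $word ? $word : '(no match)';
}
```

When one listed word is a prefix of another, the engine may try the shorter alternative first and then backtrack into the longer one; that extra work is the kind of backtracking cost the FAQ warns about.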
For more details on regular expression efficiency, see Mastering Regular
Expressions by Jeffrey Friedl. He explains how regular expression
engines work and why some patterns are surprisingly inefficient. Once you
understand how perl applies regular expressions, you can tune them for
individual situations.
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
--
Posted via a free Usenet account from http://www.teranews.com
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 14.08.2007 21:31:09 by Jim Cochrane
On 2007-08-14, PerlFAQ Server wrote:
> This is an excerpt from the latest version perlfaq6.pod, which
> ...
>
> 6.16: How do I efficiently match many regular expressions at once?
>
> ...
>
> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>
> LINE: while( <> )
> {
> foreach $pattern ( @patterns )
> {
> print if /\b$pattern\b/i;
# Aren't the '\b's redundant here, or am I missing something? If so,
# does this slow processing down slightly?
> next LINE;
> }
> }
>
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 14.08.2007 22:01:30 by Gunnar Hjalmarsson
Jim Cochrane wrote:
> On 2007-08-14, PerlFAQ Server wrote:
>> This is an excerpt from the latest version perlfaq6.pod, which
>> ...
>>
>> 6.16: How do I efficiently match many regular expressions at once?
>>
>> ...
>>
>> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>>
>> LINE: while( <> )
>> {
>> foreach $pattern ( @patterns )
>> {
>> print if /\b$pattern\b/i;
>
> # Aren't the '\b's redundant here, or am I missing something?
Unless you mean they should match words such as 'foolish' and
'barrister', you are missing something.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
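Gunnar's point about the anchors is easy to see with a small grep over some made-up sample words: with \b, foo and bar match only as whole words; without it, any word that merely contains them matches too.

```perl
use strict;
use warnings;

# Hypothetical sample words, two of which only contain foo or bar.
my @samples = qw( foo foolish bar barrister );

# With \b on both sides, only whole words match.
my @whole_words = grep { /\b(?:foo|bar)\b/i } @samples;

# Without \b, any sample containing foo or bar matches.
my @substrings = grep { /(?:foo|bar)/i } @samples;

print "with \\b:    @whole_words\n";
print "without \\b: @substrings\n";
```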
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 14.08.2007 22:16:37 by Peter Makholm
Gunnar Hjalmarsson writes:
>>> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>>>
>>> LINE: while( <> )
>>> {
>>> foreach $pattern ( @patterns )
>>> {
>>> print if /\b$pattern\b/i;
>>
>> # Aren't the '\b's redundant here, or am I missing something?
>
> Unless you mean they should match words such as 'foolish' and
> 'barrister', you are missing something.
No, the \b's seem to get added twice. Once in the map statement and
once at the actual match statement. This should be redundant.
//Makholm
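Peter's reading can be checked by running some made-up strings through both forms of the match and confirming they agree: \b\b is the same zero-width assertion as \b, and a qr// object keeps its own /i when interpolated, so the outer anchors and flag change nothing about what matches. As for Jim's speed question, my understanding (worth confirming with use re 'debug') is that once a qr// object is embedded in a larger pattern, perl builds a new pattern from its string form, so the wrapped form can give back some of what the map() precompilation saved.

```perl
use strict;
use warnings;

my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

# Hypothetical sample strings covering whole words, substrings, and case.
my @samples = ( 'a foo here', 'foolish', 'BAR none', 'rebar', 'baz!', 'nothing' );

foreach my $sample ( @samples ) {
    # Form from the FAQ posting: extra \b anchors and /i around each qr// object.
    my $wrapped = grep { $sample =~ /\b$_\b/i } @patterns;

    # Form using each compiled pattern by itself.
    my $plain = grep { $sample =~ $_ } @patterns;

    die "forms disagree on '$sample'\n" unless $wrapped == $plain;
    printf "%-12s %s\n", $sample, $plain ? 'match' : 'no match';
}
```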
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 14.08.2007 22:25:32 by Gunnar Hjalmarsson
Peter Makholm wrote:
> Gunnar Hjalmarsson writes:
>
>>>> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>>>>
>>>> LINE: while( <> )
>>>> {
>>>> foreach $pattern ( @patterns )
>>>> {
>>>> print if /\b$pattern\b/i;
>>> # Aren't the '\b's redundant here, or am I missing something?
>> Unless you mean they should match words such as 'foolish' and
>> 'barrister', you are missing something.
>
> No, the \b's seem to get added twice. Once in the map statement and
> once at the actual match statement. This should be redundant.
Ouch! For some reason, I didn't notice them being added in map().
Sorry for the confusion.
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 16.08.2007 13:37:35 by Michele Dondi
On Tue, 14 Aug 2007 21:31:09 +0200 (CEST), Jim Cochrane
wrote:
>> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>>
>> LINE: while( <> )
>> {
>> foreach $pattern ( @patterns )
>> {
>> print if /\b$pattern\b/i;
>
># Aren't the '\b's redundant here, or am I missing something? If so,
># does this slow processing down slightly?
Yep, I guess that an original version of the entry didn't have the
map(), then it was added, and \b's were forgotten in the match.
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 17.08.2007 21:40:06 by Jim Cochrane
On 2007-08-16, Michele Dondi wrote:
> On Tue, 14 Aug 2007 21:31:09 +0200 (CEST), Jim Cochrane
> wrote:
>
>>> @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
>>>
>>> LINE: while( <> )
>>> {
>>> foreach $pattern ( @patterns )
>>> {
>>> print if /\b$pattern\b/i;
>>
>># Aren't the '\b's redundant here, or am I missing something? If so,
>># does this slow processing down slightly?
>
> Yep, I guess that an original version of the entry didn't have the
> map(), then it was added, and \b's were forgotten in the match.
>
>
> Michele
Yes, I thought it was something like that. Too bad there are no
lint-like tools to help keep documentation consistent. [English is a
language - shouldn't we be able to compile it? :=)]
--
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 18.08.2007 00:33:05 by Michele Dondi
On Fri, 17 Aug 2007 21:40:06 +0200 (CEST), Jim Cochrane
wrote:
>Yes, I thought it was something like that. Too bad there are no
>lint-like tools to help keep documentation consistent. [English is a
>language - shouldn't we be able to compile it? :=)]
I don't know, but I suppose that the following quotation, however
harsh, may shed some light and induce a meditation:
: The problem with defending the purity of the English language is that
: English is about as pure as a cribhouse whore. We don't just borrow
: words; on occasion, English has pursued other languages down alleyways
: to beat them unconscious and rifle their pockets for new vocabulary.
: - James Nicoll
Michele
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 18.08.2007 22:20:35 by brian d foy
In article , Jim
Cochrane wrote:
> On 2007-08-14, PerlFAQ Server wrote:
> > 6.16: How do I efficiently match many regular expressions at once?
> > @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
> >
> > LINE: while( <> )
> > {
> > foreach $pattern ( @patterns )
> > {
> > print if /\b$pattern\b/i;
>
> # Aren't the '\b's redundant here, or am I missing something? If so,
fixed, thanks.
Re: FAQ 6.16 How do I efficiently match many regular expressions at once?
on 25.08.2007 01:25:08 by Jim Cochrane
On 2007-08-17, Michele Dondi wrote:
> On Fri, 17 Aug 2007 21:40:06 +0200 (CEST), Jim Cochrane
> wrote:
>
>>Yes, I thought it was something like that. Too bad there are no
>>lint-like tools to help keep documentation consistent. [English is a
>>language - shouldn't we be able to compile it? :=)]
>
> I don't know, but I suppose that the following quotation, however
> harsh, may shed some light and induce a meditation:
>
>: The problem with defending the purity of the English language is that
>: English is about as pure as a cribhouse whore. We don't just borrow
>: words; on occasion, English has pursued other languages down alleyways
>: to beat them unconscious and rifle their pockets for new vocabulary.
>: - James Nicoll
>
>
> Michele
(Sorry, late response; I've been busy.)
That's a great quote. I hadn't heard of James Nicoll, but he sounds
like quite an interesting and creative character -
http://en.wikipedia.org/wiki/James_D._Nicoll
(Sorry for the OTP)