FAQ 6.16 How do I efficiently match many regular expressions at once?

on 17.10.2007 09:03:02 by PerlFAQ Server

This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.16: How do I efficiently match many regular expressions at once?



( contributed by brian d foy )

Avoid asking Perl to compile a regular expression every time you want to
match it. In this example, perl must recompile the regular expression
for every iteration of the foreach() loop since it has no way to know
what $pattern will be.

    @patterns = qw( foo bar baz );

    LINE: while( <> )
        {
        foreach $pattern ( @patterns )
            {
            # $pattern changes on each pass, so perl recompiles the regex
            if( /\b$pattern\b/i )
                {
                print;
                next LINE;
                }
            }
        }

The qr// operator showed up in perl 5.005. It compiles a regular
expression, but doesn't apply it. When you use the pre-compiled version
of the regex, perl does less work. In this example, I inserted a map()
to turn each pattern into its pre-compiled form. The rest of the script
is the same, but faster.

    @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

    LINE: while( <> )
        {
        foreach $pattern ( @patterns )
            {
            # the /i flag is already part of the compiled pattern
            if( /$pattern/ )
                {
                print;
                next LINE;
                }
            }
        }
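
As a self-contained illustration of the qr// approach, here is a
minimal sketch that runs against an in-memory list instead of <>. The
sample lines and the @matched array are made up for this example. It
also shows that qr// returns an ordinary value (a Regexp object) that
you can store in an array and bind directly with =~.

```perl
use strict;
use warnings;

# Compile each pattern once, up front.
my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

# Hypothetical sample input standing in for lines read from <>.
my @lines = (
    "Foo fighters",
    "a barn door",       # "bar" inside "barn" fails the trailing \b
    "the BAZ option",
    "nothing here",
    );

my @matched;
LINE: foreach my $line ( @lines )
    {
    foreach my $pattern ( @patterns )
        {
        # Binding to a qr// object reuses the compiled regex.
        if( $line =~ $pattern )
            {
            push @matched, $line;
            next LINE;
            }
        }
    }

print "$_\n" for @matched;
```

Only the lines containing foo, bar, or baz as whole words end up in
@matched.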

In some cases, you may be able to make several patterns into a single
regular expression. Beware of situations that require backtracking
though.

    $regex = join '|', qw( foo bar baz );

    LINE: while( <> )
        {
        print if /\b(?:$regex)\b/i;
        }
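
If the words you join might contain regex metacharacters, one possible
refinement is to escape each word with quotemeta before joining, and to
compile the combined pattern once with qr//. This is only a sketch: the
word list and sample lines are hypothetical, and it assumes you want
the words matched literally rather than as patterns.

```perl
use strict;
use warnings;

# Hypothetical word list; "a.b" contains a regex metacharacter.
my @words = ( 'foo', 'bar', 'a.b' );

# Escape each word so "." matches only a literal dot, then join.
my $alternation = join '|', map { quotemeta } @words;

# Compile the single combined pattern once.
my $regex = qr/\b(?:$alternation)\b/i;

# Hypothetical sample input standing in for lines read from <>.
my @lines = (
    "some FOO text",
    "axb is not a.b?",
    "axb only",          # would match if "." were left unescaped
    "rebar",             # no word boundary before "bar"
    );

my @matched = grep { $_ =~ $regex } @lines;

print "$_\n" for @matched;
```

Without quotemeta, the "a.b" alternative would also match "axb",
because an unescaped dot matches any character except a newline.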

For more details on regular expression efficiency, see Mastering Regular
Expressions by Jeffrey Friedl. He explains how regular expression
engines work and why some patterns are surprisingly inefficient. Once you
understand how perl applies regular expressions, you can tune them for
individual situations.
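
One way to check whether a change actually helps in your situation is
to time the alternatives with the core Benchmark module. The sketch
below compares the three approaches from this answer on made-up data.
The sub names and sample lines are hypothetical, and the relative
timings depend on your perl version, your patterns, and your input, so
treat any result as a measurement of this one case only.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @words = qw( foo bar baz );

# Hypothetical sample data: 300 lines, 200 of which match.
my @lines = ( "a line with foo", "nothing to see", "BAZ at the start" ) x 100;

my @compiled = map { qr/\b$_\b/i } @words;
my $joined   = join '|', @words;
my $single   = qr/\b(?:$joined)\b/i;

# Interpolate a changing variable into the pattern on every pass.
sub count_interpolated
    {
    my $count = 0;
    LINE: foreach my $line ( @lines )
        {
        foreach my $word ( @words )
            {
            if( $line =~ /\b$word\b/i ) { $count++; next LINE }
            }
        }
    return $count;
    }

# Loop over patterns pre-compiled with qr//.
sub count_compiled
    {
    my $count = 0;
    LINE: foreach my $line ( @lines )
        {
        foreach my $pattern ( @compiled )
            {
            if( $line =~ $pattern ) { $count++; next LINE }
            }
        }
    return $count;
    }

# Use one combined alternation.
sub count_single
    {
    my $count = 0;
    foreach my $line ( @lines )
        {
        $count++ if $line =~ $single;
        }
    return $count;
    }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    interpolated => \&count_interpolated,
    compiled     => \&count_compiled,
    single       => \&count_single,
    } );
```

All three subs count the same matching lines, so any difference in the
table comes from how the patterns are compiled and applied.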



--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.