FAQ 6.11 Can I use Perl regular expressions to match balanced text?
Posted on 15.10.2007 21:03:02 by PerlFAQ Server
This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
6.11: Can I use Perl regular expressions to match balanced text?
Historically, Perl regular expressions were not capable of matching
balanced text. As of more recent versions of perl, including 5.6.1,
experimental features have been added that make this possible. Look at
the documentation for the (??{ }) construct in recent perlre manual
pages to see an example of matching balanced parentheses. Be sure to
take special note of the warnings in the manual before making use of
this feature.
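A minimal sketch of the (??{ }) approach, modeled on the balanced-parentheses example in perlre. The variable name $re and the sample strings are illustrative only. Because the construct was experimental in the perls this FAQ describes, treat this as a sketch rather than production code. A package variable (declared with "our") is used so the deferred code block can refer to the pattern that contains it.

```perl
use strict;
use warnings;

# Package variable so the (??{ ... }) block can refer to the pattern itself.
our $re;
$re = qr{
    \(                      # an opening parenthesis
    (?:
        (?> [^()]+ )        # a run of non-parentheses, no backtracking
      |
        (??{ $re })         # or, recursively, a nested balanced group
    )*
    \)                      # the matching closing parenthesis
}x;

my $text = "foo(bar(baz)qux)end";
if ($text =~ /($re)/) {
    print "$1\n";           # prints (bar(baz)qux)
}

# Anchoring the pattern at both ends rejects unbalanced input.
for my $s ("(a(b)c)", "(a(b)c") {
    printf "%-8s %s\n", $s, ($s =~ /\A$re\z/ ? "balanced" : "unbalanced");
}
```

The atomic group (?> ... ) keeps the engine from retrying every way of splitting a run of ordinary characters, which would otherwise make failing matches very slow.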
CPAN contains many modules that can be useful for matching text
depending on the context. Damian Conway provides some useful patterns in
Regexp::Common. The module Text::Balanced provides a general solution to
this problem.
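A short sketch of Text::Balanced, which is included in the standard distribution as of Perl 5.8. It uses extract_bracketed with "()" as the bracket set; the input string is illustrative. In list context the function returns the extracted text, the remaining text, and the skipped prefix, and leaves its first argument unchanged.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b) c) rest";

# List context: (extracted, remainder, prefix); $text is not modified.
my ($extracted, $remainder) = extract_bracketed($text, '()');

if (defined $extracted) {
    print "extracted: $extracted\n";   # the balanced group (a (b) c)
    print "remainder: $remainder\n";   # whatever follows the group
}
else {
    print "no balanced text found\n";
}
```

Note that the default prefix only skips whitespace, so the text must begin (after optional whitespace) with the opening bracket.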
One of the common applications of balanced text matching is working with
XML and HTML. Many modules support these needs; two examples are
HTML::Parser and XML::Parser.
An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and
possibly nested single characters, like "`" and "'", "{" and "}", or "("
and ")", can be found in
http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz .
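The basic idea behind such a subroutine can be sketched with a depth counter. The sub first_balanced_span below is hypothetical, written only for illustration; it is not the code in pull_quotes.gz. It assumes the opening and closing characters differ, so it does not handle a single character used as both delimiters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first balanced span delimited by
# $open/$close in $text, or undef if no complete span exists.
sub first_balanced_span {
    my ($text, $open, $close) = @_;
    my $depth = 0;
    my $start;
    for my $i (0 .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if ($c eq $open) {
            $start = $i if $depth == 0;    # remember the outermost opener
            $depth++;
        }
        elsif ($c eq $close && $depth > 0) {   # stray closers are ignored
            $depth--;
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;    # ran out of text with an unclosed opener, or none at all
}

print first_balanced_span("x {a {b} c} y", "{", "}"), "\n";   # {a {b} c}
```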
The C::Scan module from CPAN also contains such subs for internal use,
but they are undocumented.
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much relevant information as possible in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.