Re: match nested tags

on 03.05.2006 23:20:19 by Jake Peavy

Hey guys, I'm not getting any responses over at perl.beginners, so I
thought I'd cross-post this here to see if anyone has any ideas.

Here's the original message:

FangQ wrote:
> hi
>
> is there a simple way using regular expression to find nested tags?
>
> for example, the string is:
>
> {{ {A} this is part A of the document
> {{ {A.1} this is part A1 }}
> }}
>
> I want to define a function findtag("A") to give me
>
> this is part A of the document
> {{ {A.1} this is part A1 }}
>
>
> and findtag("A.1") to give me
>
> this is part A1
>
> can anyone give some hint?
> thanks

I thought this sounded like a prime candidate for Parse::RecDescent,
but I can't get the nested nature of the part(s) to work.

Here's my first crack at it, but it doesn't parse. I monkeyed with it
for a while, but to no avail.

I did note, however, that in the Parse::RecDescent FAQ, Pastor Conway
suggests using Text::Balanced to extract nested parentheses. I tried
that too, but again, no luck.
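
For reference, here is a minimal sketch of what Text::Balanced's
extract_bracketed does with the sample input. The "trailing text" at the
end of the string is made up for illustration; since '{{' and '}}' are
just doubled braces, a plain '{}' delimiter spec appears to be enough to
pull out the whole outer block.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Sample from the thread, plus some invented trailing text.
my $text = "{{ {A} this is part A of the document\n"
         . "{{ {A.1} this is part A1 }}\n"
         . "}}\ntrailing text";

# In list context extract_bracketed returns
# (extracted block, remainder, skipped prefix) and leaves $text untouched.
my ($block, $rest) = extract_bracketed($text, '{}');

print "block: $block\n";
print "rest:  $rest\n";
```

The extracted block still carries its outer braces and the {A} label, so
some post-processing is needed before it looks like the requested output.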

I'd be interested to see if anyone here has a suggestion for this
problem. Thanks in advance.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use Parse::RecDescent;

my $grammar = <<'EO_GRAMMAR';


document : '{{' part(s) '}}'

part : part_id part_text part(s?)
part_id : '{' /[^}]+/ '}'
part_text : /.+/s

EO_GRAMMAR

my $parser = Parse::RecDescent->new($grammar)
    or die "Could not parse grammar: $@";

# Slurp the whole __DATA__ section into one string.
my $document = do { local $/; <DATA> };

my $doc_ref = $parser->document($document)
    or die "Invalid document";

print Dumper $doc_ref;


__DATA__
{{ {A} this is part A of the document
{{ {A.1} this is part A1 }}
}}

__END__

-jp

Re: match nested tags

on 04.05.2006 00:04:57 by Jake Peavy

DJ Stunks wrote:
> FangQ wrote:
> > is there a simple way using regular expression to find nested tags?
> >
> > for example, the string is:
> >
> > {{ {A} this is part A of the document
> > {{ {A.1} this is part A1 }}
> > }}

Allow me to just reply to myself here... :P

I repaired my crummy grammar and posting technique (who would have
thought __END__ would end up in __DATA__?). My grammar now parses and
is shown below (getting there!); now I need to concentrate on getting
the output hash right.
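
One possible shape for that output hash, sketched by hand rather than
with Parse::RecDescent: the hypothetical parse_part below mirrors the
part rule ('{{' part_id part_text part(s?) '}}') using \G and /gc, and
index_parts (also made up for this sketch) builds an id lookup table.
It has the same limitation as the grammar: no braces inside the text.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of Parse::RecDescent): parse one
# "{{ {ID} text nested-parts }}" block starting at pos($$doc_ref).
sub parse_part {
    my ($doc_ref) = @_;

    # '{{' part_id
    $$doc_ref =~ /\G\s*\{\{\s*\{([^}]+)\}\s*/gc or return;
    my %part = (id => $1, text => '', parts => []);

    # part_text: everything up to the next brace, trailing whitespace trimmed
    if ($$doc_ref =~ /\G([^{}]+)/gc) {
        ($part{text} = $1) =~ s/\s+\z//;
    }

    # part(s?): zero or more nested parts
    while (my $child = parse_part($doc_ref)) {
        push @{ $part{parts} }, $child;
    }

    # '}}'
    $$doc_ref =~ /\G\s*\}\}/gc
        or die "Missing closing }} for part $part{id}\n";
    return \%part;
}

# Hypothetical helper: flatten the tree into { id => part } for lookups.
sub index_parts {
    my ($part, $index) = @_;
    $index->{ $part->{id} } = $part;
    index_parts($_, $index) for @{ $part->{parts} };
    return $index;
}

my $document = "{{ {A} this is part A of the document\n"
             . "{{ {A.1} this is part A1 }}\n"
             . "}}\n";

my $tree  = parse_part(\$document) or die "Invalid document";
my $index = index_parts($tree, {});

print Dumper($tree);
print $index->{'A.1'}{text}, "\n";
```

Each part ends up as { id, text, parts }, with nested parts stored as an
array reference, which is roughly what part(s?) would hand back as well.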

Also, I'm not able to have a { or } in the part_text, which I expect
would be a problem in the real world.... I don't know how to
incorporate Text::Balanced here though....
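
A rough sketch of one way Text::Balanced might be used on its own,
outside the grammar: locate the opening "{{ {ID}" with a regex, then let
extract_bracketed find the matching close. The findtag signature
(document first, then id) is an assumption; the original request only
gave findtag("A"). Balanced braces inside the text should survive this,
but unbalanced ones will still break it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Sketch only: returns the body of part $id, or nothing if it is not found.
sub findtag {
    my ($doc, $id) = @_;

    # Locate the opening "{{ {ID}" of the requested part.
    return unless $doc =~ /\{\{\s*\{\Q$id\E\}/;

    # Work on a copy that starts at the opening "{{".
    my $tail = substr($doc, $-[0]);
    my ($block) = extract_bracketed($tail, '{}');
    return unless defined $block;

    # Strip the leading "{{ {ID}" and the closing "}}".
    $block =~ s/\A\{\{\s*\{\Q$id\E\}\s*//;
    $block =~ s/\s*\}\}\z//;
    return $block;
}

my $document = "{{ {A} this is part A of the document\n"
             . "{{ {A.1} this is part A1 }}\n"
             . "}}\n";

print findtag($document, 'A'), "\n";
print "----\n";
print findtag($document, 'A.1'), "\n";
```

Note that the nested part comes back as raw text inside part A, which is
what the original poster asked for.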

I'll keep working on it.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use Parse::RecDescent;

my $grammar = <<'EO_GRAMMAR';


document : part(s)

part : '{{' part_id part_text part(s?) '}}'
part_id : '{' /[^}]+/ '}'
part_text : /[^{}]+/

EO_GRAMMAR

my $parser = Parse::RecDescent->new($grammar)
    or die "Could not parse grammar: $@";

# Slurp the whole __DATA__ section into one string.
my $document = do { local $/; <DATA> };

my $doc_ref = $parser->document($document)
    or die "Invalid document";

print Dumper $doc_ref;

__DATA__
{{ {A} this is part A of the document
{{ {A.1} this is part A1 }}
}}

-jp