Regular Expression for XML Parsing

on 27.12.2007 21:59:12 by tushar.saxena

Hi,

I have a set of XML files from which I need to extract some data. The
format of the file is as follows:

<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>

I need to extract the DATA part of the XML structure.

Note: tag3 can be contained either within tag1 or tag2, but I need to
extract data only from tag1, i.e. DATA1 should be extracted, but not
DATA2.

If I want to get both DATA1 and DATA2, I can use a simple regex like:

if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
{
    print $1;
}

But when I try to get only DATA1 (embedded within tag1) with something
like this, I am unable to get it to work:

if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
{
    print $1;
}

In this second case, the match itself fails.

Any help would be appreciated !

Re: Regular Expression for XML Parsing

on 27.12.2007 23:49:34 by jurgenex

tushar.saxena@gmail.com wrote:
>I have a set of XML files
>I need to extract the DATA part of the xml structure
>If I want to get both DATA1 and DATA2 I can use a simple regex like :

It's a bad idea in the first place. XML is not a regular language, so why
would you use regular expressions to parse it?

>Any help would be appreciated !

Use a tool that is designed to parse XML, e.g. any of the XML parser
modules on CPAN.
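
A minimal sketch of what that could look like, assuming XML::LibXML (one of those CPAN modules) is installed. The <root> wrapper and the sub name tag3_under_tag1 are made up for illustration; the tag names come from the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: the <root> wrapper is an assumption, since
# well-formed XML needs exactly one top-level element.
my $xml = <<'XML';
<root>
  <tag1><tag3>DATA1</tag3></tag1>
  <tag2><tag3>DATA2</tag3></tag2>
</root>
XML

# Return the text of every tag3 element whose parent is a tag1 element.
sub tag3_under_tag1 {
    my ($source) = @_;
    my $doc = XML::LibXML->load_xml( string => $source );
    return map { $_->textContent } $doc->findnodes('//tag1/tag3');
}

print "$_\n" for tag3_under_tag1($xml);
```

The XPath expression //tag1/tag3 only selects tag3 elements directly inside tag1, so DATA2 is never considered. If tag3 may sit deeper inside tag1, //tag1//tag3 would be the expression to try.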

jue

Re: Regular Expression for XML Parsing

on 28.12.2007 00:06:47 by patriknym

On 27 Dec, 20:59, tushar.sax...@gmail.com wrote:
> Hi,
>
> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows:
>
> <tag1>
> <tag3>DATA1</tag3>
> </tag1>
>
> <tag2>
> <tag3>DATA2</tag3>
> </tag2>
>
> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :
>
> if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
> {
> print $1
>
> }
>
> But if I try to get only DATA1 (embedded within tag1) I try using
> something like this, but am unable to get it to work
>
> if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
> {
> print $1
>
> }
>
> In this second case, the match itself fails.
>
> Any help would be appreciated !

$/ = "";    # paragraph mode: read blank-line-separated records

while (<>) {
    if ( m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs )
    {
        print "$1\n";
    }
}
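
For anyone who wants to try this without an input file, here is a self-contained sketch of the same idea. It relies on the same paragraph mode ($/ = ""), so it assumes the tag1 and tag2 blocks are separated by a blank line as in the question. The sample data, the sub name tag3_in_tag1 and the tag names in the pattern are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the blank line separates the two records.
my $sample = <<'XML';
<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>
XML

# Return the tag3 text found inside tag1 for each blank-line-separated record.
sub tag3_in_tag1 {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode: each read returns one record
    my @found;
    while ( my $record = <$fh> ) {
        # /s lets . match newlines, so the pattern can span several lines
        push @found, $1
            if $record =~ m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}s;
    }
    close $fh;
    return @found;
}

print "$_\n" for tag3_in_tag1($sample);
```

With the sample above this prints DATA1 only, because the second record contains no tag1 at all. It is a sketch for this particular layout, not a general XML solution.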

Re: Regular Expression for XML Parsing

on 28.12.2007 01:19:44 by Tad J McClellan

tushar.saxena@gmail.com wrote:

> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows:
>
> <tag1>
> <tag3>DATA1</tag3>
> </tag1>
>
> <tag2>
> <tag3>DATA2</tag3>
> </tag2>


I thought you said you had an XML file.

That is not a valid XML file...


> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :


Using a regular expression to "parse" a non-regular language is
fraught with peril, and nearly always a Bad Idea.

Use a module that understands XML for processing XML data.
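
To make the peril concrete, here is a small sketch. The input document, the sub name naive_extract and the pattern are all invented for illustration; the pattern is simply a plausible non-greedy attempt at "tag3 inside tag1".

```perl
use strict;
use warnings;

# Hypothetical input: well-formed XML where a tag2 sits between two tag1
# elements, plus a comment that happens to contain markup.
my $xml = <<'XML';
<root>
<tag1><other>x</other></tag1>
<tag2><tag3>DATA2</tag3></tag2>
<tag1><tag3>DATA1</tag3></tag1>
<!-- <tag1><tag3>OLD</tag3></tag1> -->
</root>
XML

# Naive extraction: return every capture of the pattern (list context, /g).
sub naive_extract {
    my ($text) = @_;
    return $text =~ m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs;
}

my @hits = naive_extract($xml);
print "@hits\n";
```

This prints "DATA2 OLD". The first match starts at the first tag1, runs on to the tag3 inside tag2 and ends at the closing tag of the second tag1, so it swallows DATA1. The second match comes from inside the comment. The regex has no notion of element nesting or comments, which is what a real parser provides.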


> Any help would be appreciated !


Assuming that you have actual valid XML in $xml, then:

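use strict;
use warnings;
use XML::Simple;

# ForceArray makes tag1 an array ref even when only one tag1 element exists
my $ref = XMLin($xml, ForceArray => ['tag1']);
foreach my $child ( @{ $ref->{tag1} } ) {
    print "$child->{tag3}\n";
}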


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Regular Expression for XML Parsing

on 28.12.2007 13:35:10 by Michele Dondi

On Thu, 27 Dec 2007 12:59:12 -0800 (PST), tushar.saxena@gmail.com
wrote:

>Subject: Regular Expression for XML Parsing

Nope. Perhaps a Regex for XML Parsing, in the Perl 6 sense of "Regex",
which is no longer assumed to be a "Regular Expression".
You will have to wait for quite a while, though...


Michele