Working with XML in Perl

on 11.11.2008 00:21:09 by Curtis Leach

I'm using a Perl 5.8.8 build on a Windows Server platform and I need to
work with XML files.

Is there a preferred module for use when working with XML in Perl? I'm
new to XML & I'm just looking for directions to XML modules to look at
so I don't waste too much time going down the wrong path while wading
through all the available XML modules on CPAN.

I have a few example programs using XML::Simple (v2.14) for parsing out a
couple of config values from some small XML files and returning them to
the caller, but I'm going to need to programmatically edit more complex
XML files that are considerably larger & save the results. The files
being edited could potentially be huge, but 10,000 to 15,000 data points
to update would be about average.

I'm going to see multiple entries such as:


.....


So I'll load the XML file into memory, validate that it's well-formed XML,
and then locate every occurrence of tag22 to update its optional "seg"
attribute, making sure I update each specific tag22 only once. Once
I've updated them all, I'll write the updated XML document back to disk.

So does this sound like something XML::Simple can handle? Or should I
be looking at another XML module? Are there tips to follow for making
sure the generated file isn't too hard to read by hand? (The by-hand
part isn't a requirement, but past experience suggests that eyeballing
a file in your favorite editor can help a lot during troubleshooting.)

Curtis


_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Working with XML in Perl

on 11.11.2008 00:51:39 by Wayne Simmons

Curtis Leach said:
> Is there a preferred module for use when working with XML in Perl?

I use:
XML::DOM;

If you have the full file, that's probably the best one I've found, although
you might consume massive amounts of memory depending on your file size (you
seemed to indicate they'd be big).

There are basically two ways to parse XML: with a tree structure, using
something like XML::DOM, or with an event-based parser (usually SAX is
mentioned here). I don't know much about the event-based one; I've always used
DOM. There are many proponents of both types of processing, but if you've got
the whole file it's pretty easy to implement what you want to do with DOM.

Curtis:
> So I'll load the XML file into memory, validate it's well formed XML, and
>then locate every occurrence of tag22 to update it's optional "seg"
>attribute. Making sure I only update a specific tag22 one time! Once I've
>updated them all, I'll write the updated XML document back to disk.


use XML::DOM;

# load and parse the file into a document object
my $parser  = XML::DOM::Parser->new;
my $maindoc = $parser->parsefile($file);
# get a Perl list of all elements named tag22 (1 = search recursively)
foreach my $node ($maindoc->getElementsByTagName('tag22', 1))
{ $node->setAttribute('seg', 'newval'); }

#save it back.
..... not sure how to do this... honestly haven't done it before.


GL, and hope that helps.

-Wayne Simmons


--
Software Engineer
InterSystems USA, Inc.
303-858-1000





Re: Working with XML in Perl

on 11.11.2008 01:18:15 by Michael Ellery

Wayne Simmons wrote:
> Curtis Leach said:
>> Is there a preferred module for use when working with XML in Perl?
>
> I use:
> XML::DOM;
>
>
>
> #save it back.
> .... not sure how to do this... honestly haven't done it before.
>

that's the easy part:

$maindoc->printToFile($file);
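
Putting Wayne's loop and the printToFile call together, the whole load/update/save cycle with XML::DOM could look roughly like the sketch below. The helper name update_tag22 and its arguments are made up for illustration; the element name tag22 and the "seg" attribute come from the original question. Because getElementsByTagName returns each matching element exactly once, each tag22 is updated only one time.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not from the thread): set the "seg" attribute on
# every <tag22> element in $in_file and write the result to $out_file.
sub update_tag22 {
    my ($in_file, $out_file, $new_value) = @_;

    my $parser = XML::DOM::Parser->new;
    # parsefile() dies if the document is not well-formed
    my $doc = $parser->parsefile($in_file);

    # In list context this returns a plain Perl list of element nodes
    my @nodes = $doc->getElementsByTagName('tag22');
    $_->setAttribute('seg', $new_value) for @nodes;

    $doc->printToFile($out_file);
    $doc->dispose;    # break circular references so the tree can be freed
    return scalar @nodes;
}
```

Writing to a separate output file and renaming it afterwards is a deliberate choice here: a failure partway through never damages the original.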

-Mike




Re: Working with XML in Perl

on 11.11.2008 02:38:40 by Chris Prather

On Mon, Nov 10, 2008 at 6:21 PM, Curtis Leach wrote:
> I'm using a Perl 5.8.8 build on a Windows Server platform and I need to
> work with XML files.
>
> Is there a preferred module for use when working with XML in Perl? I'm
> new to XML & I'm just looking for directions to XML modules to look at
> so I don't waste too much time going down the wrong path while wading
> through all the available XML modules on CPAN.
>
> I have a few example programs using XML::Simple (v2.14)for parsing out a
> couple of config values from some small XML files and returning them to
> the caller, but I'm going to need to programmatically edit more complex
> XML files that are considerably larger & save the results. With the
> potential of the files being edited being huge. But maybe 10,000 to
> 15,000 data points to update being average.
>
> I'm going to see multiple entries such as:
>
>
> ....
>
>
> So I'll load the XML file into memory, validate it's well formed XML,
> and then locate every occurrence of tag22 to update it's optional "seg"
> attribute. Making sure I only update a specific tag22 one time! Once
> I've updated them all, I'll write the updated XML document back to disk.
>
> So does this sound like something XML::Simple can handle? Or should I
> be looking at another XML module? Are there tips to follow for making
> sure the generated file isn't to hard to look at by hand? (The by hand
> part isn't a requirement, but past experience suggests taking an eyeball
> to something with your favorite editor can help a lot during
> troubleshooting.)

If the output XML is important, XML::Simple really isn't the best
choice. My experience has been that it is a pain to define your XML
output structure with XML::Simple. It was designed to be a very simple
XML -> Perl mapper; for anything more advanced than that (in my
opinion) you really should use a bigger API. Like Wayne says elsewhere,
there are two types: DOM parsers, which build a version of your document
in memory as an object tree, and SAX parsers, which treat your XML as a
"stream" that triggers callback events. The choice between them is
based on the size of your documents (will they fit within main memory?
I've heard of people having to parse >10G documents, where you really
*can't* use a DOM parser) and your preferred method of work.

For DOM parsing I haven't really used much beyond XML::LibXML, which is
a wrapper around the C libxml2 library. There are packages that provide
better sugar than the C API, but if you know one DOM you know them all.
For SAX parsing I've used the XML::SAX module, which is well
implemented and "stable" (in that, as far as I know, the developers
haven't found bugs or needed major changes to it in the last 6 years).
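
For comparison, here is a minimal DOM-style sketch of the same task with XML::LibXML. The sub name update_seg_libxml and its arguments are hypothetical; tag22 and "seg" are taken from the original question. Passing 1 as the second argument to toFile asks libxml2 to indent the output where it can, which may help with the "readable by hand" wish.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the thread): update "seg" on every
# <tag22> in $in_file and save the document to $out_file.
sub update_seg_libxml {
    my ($in_file, $out_file, $new_value) = @_;

    # parse_file() dies with a parser error if the XML is not well-formed
    my $doc = XML::LibXML->new->parse_file($in_file);

    my $count = 0;
    # The XPath //tag22 matches each tag22 element once, at any depth
    for my $node ($doc->findnodes('//tag22')) {
        $node->setAttribute('seg', $new_value);
        $count++;
    }

    # Format level 1 requests indented output
    $doc->toFile($out_file, 1);
    return $count;
}
```

Like any tree-based approach, this holds the whole document in memory, so it fits the "average" files better than the huge ones.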

You should really look into the Perl + XML columns on XML.com (
http://www.xml.com/pub/at/15 ), which are old now but still useful,
from what I can tell having just built an XML toolkit for work
based on some of their ideas and code. Beyond that, you might find the
Perl-XML FAQ ( http://perl-xml.sourceforge.net/faq/ ) useful too.

-Chris

Re: Working with XML in Perl

on 11.11.2008 07:42:53 by Foo JH

Curtis Leach wrote:
> I'm using a Perl 5.8.8 build on a Windows Server platform and I need to
> work with XML files.
If you're only reading (not writing), XML::Simple makes XML a breeze.

XML::Simple converts your XML file into a tree of native hashes and
arrays. No methods to learn.
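
As a small illustration of that hash-and-array mapping, the sketch below reads two config values. The XML snippet and its element names (config, server, debug) are invented for the example; XMLin and the KeyAttr option are part of XML::Simple.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up config document, only for illustration
my $xml = '<config><server host="db1" port="5432"/><debug>0</debug></config>';

# KeyAttr => [] turns off the automatic folding of name/key/id attributes,
# which keeps the resulting structure predictable
my $config = XMLin($xml, KeyAttr => []);

# Attributes and child elements both become plain hash keys
print "host:  $config->{server}{host}\n";
print "port:  $config->{server}{port}\n";
print "debug: $config->{debug}\n";
```

The root element is dropped by default, so the values sit directly under $config.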

Re: Working with XML in Perl

on 11.11.2008 14:07:52 by Jenda Krynicky

From: "Curtis Leach"
> I'm using a Perl 5.8.8 build on a Windows Server platform and I need to
> work with XML files.
>
> Is there a preferred module for use when working with XML in Perl? I'm
> new to XML & I'm just looking for directions to XML modules to look at
> so I don't waste too much time going down the wrong path while wading
> through all the available XML modules on CPAN.

Have a look at XML::Twig. Especially for huge documents it's much
better than XML::DOM or XML::LibXML (in tree mode), since it
doesn't require that you load the whole file into memory at one time.

> I have a few example programs using XML::Simple (v2.14)for parsing out a
> couple of config values from some small XML files and returning them to
> the caller, but I'm going to need to programmatically edit more complex
> XML files that are considerably larger & save the results. With the
> potential of the files being edited being huge. But maybe 10,000 to
> 15,000 data points to update being average.
>
> I'm going to see multiple entries such as:
>
>
> ....
>
>
> So I'll load the XML file into memory, validate it's well formed XML,
> and then locate every occurrence of tag22 to update it's optional "seg"
> attribute. Making sure I only update a specific tag22 one time! Once
> I've updated them all, I'll write the updated XML document back to disk.

Why do you want to load it all into memory and validate before you
start to update it?

You can process the file as you parse the tags and create the updated
file as you go. And if you find out it was not well-formed you can
delete the new file and report the error. That will be much much
quicker and use much less memory.

Have a look at XML::Twig and XML::Rules for things like that.
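
A rough sketch of that process-as-you-parse approach with XML::Twig follows. The sub name update_seg_stream and its arguments are hypothetical; tag22 and "seg" come from the original question. It assumes tag22 elements are not nested inside one another. Each handler call sees one tag22 exactly once, and flush writes out and frees everything parsed so far.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): stream $in_file to $out_file,
# setting "seg" on each <tag22> as it is parsed.
sub update_seg_stream {
    my ($in_file, $out_file, $new_value) = @_;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";

    my $count = 0;
    my $twig  = XML::Twig->new(
        twig_handlers => {
            # Called once per <tag22>, when its end tag has been parsed
            tag22 => sub {
                my ($t, $elt) = @_;
                $elt->set_att(seg => $new_value);
                $count++;
                $t->flush($out);    # write what we have and release memory
            },
        },
    );

    # parsefile() dies on malformed XML; trap that to clean up the output
    my $ok  = eval { $twig->parsefile($in_file); 1 };
    my $err = $@;
    if ($ok) {
        $twig->flush($out);         # write whatever follows the last tag22
        close $out or die "Cannot close $out_file: $!";
        return $count;
    }
    close $out;
    unlink $out_file;               # discard the partial result
    die "Input is not well-formed, removed $out_file: $err";
}
```

Adding pretty_print => 'indented' to the XML::Twig->new call is one way to get output that is easier to read in an editor.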

Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery


RE: Working with XML in Perl

on 12.11.2008 00:13:27 by Curtis Leach

Thank you to everyone who responded. This should help me focus my
research to just a few XML modules in getting to a good solution for my
needs.

Thanks Chris for the links so that I can review some basic XML concepts
& issues.

Curtis
