Working with a 1-GB XML file...
on 17.01.2008 18:40:30 by kj
Hi. I have a large XML file (about 1G) that I would like to be
able to interrogate in my code. Given its size, it's out of the
question to read it all into memory. I'd like to avoid having to
convert this thing to an RDB.
Does anyone know of a module that can treat such a file as
disk-resident data?
TIA!
kj
--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.
Re: Working with a 1-GB XML file...
on 17.01.2008 19:07:44 by Keith Keller
On 2008-01-17, kj wrote:
>
> Hi. I have a large XML file (about 1G) that I would like to be
> able to interrogate in my code. Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.
>
> Does anyone know of a module that can treat such a file as
> disk-resident data?
You should probably read
http://perl-xml.sourceforge.net/faq/#parser_selection
It sounds like you might want a SAX-based parser.
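To make the SAX idea concrete, here is a minimal sketch of event-driven parsing. It uses Python's standard xml.sax module purely as an illustration, not one of the Perl SAX modules the FAQ covers, and the element names "record" and "title" are made up for the example. The point is that the parser hands you one event at a time, so memory use does not grow with the size of the file.

```python
import io
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Counts <record> elements and gathers <title> text as events stream by.

    The element names are hypothetical; adapt them to the real file.
    """

    def __init__(self):
        super().__init__()
        self.records = 0
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "record":
            self.records += 1
        elif name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


def scan(source):
    """Parse a filename or file object and return the populated handler."""
    handler = TitleCounter()
    xml.sax.parse(source, handler)
    return handler


sample = io.BytesIO(
    b"<records><record><title>first</title></record>"
    b"<record><title>second</title></record></records>"
)
result = scan(sample)
print(result.records, result.titles)
```

For a file of around 1 GB you would likely act on each title as it completes rather than keep them all in a list; the list is here only to keep the sketch short.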
--keith
--
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information
Re: Working with a 1-GB XML file...
on 17.01.2008 19:10:41 by xhoster
kj wrote:
> Hi. I have a large XML file (about 1G) that I would like to be
> able to interrogate in my code.
In what ways do you want to interrogate it? Is all the data in the file
relevant to you, or could you abstract just the relevant parts of it into
a much smaller, memory resident set? (XML::Twig might be good for that.)
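As a rough sketch of that "abstract just the relevant parts" approach: the code below walks the file one record at a time, keeps only one field per record, and discards each record once it has been read. It is written with Python's standard xml.etree.ElementTree.iterparse as a stand-in for the idea, not with XML::Twig itself. The names "record", "id" and "name" are hypothetical, and it assumes the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def extract(source, tag="record", wanted="name"):
    """Return {id attribute: text of <wanted>} for every <tag> element.

    Assumes <tag> elements sit directly under the root (an assumption made
    for this sketch), so clearing the root drops each finished record.
    """
    found = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            found[elem.get("id")] = elem.findtext(wanted)
            root.clear()  # release processed records so memory stays flat
    return found


sample = (
    b'<db>'
    b'<record id="a"><name>Alpha</name><blob>lots of data</blob></record>'
    b'<record id="b"><name>Beta</name></record>'
    b'</db>'
)
subset = extract(io.BytesIO(sample))
print(subset)
```

Whether the resulting subset actually fits in memory depends on how much of each record you need to keep, which is why the access pattern matters so much here.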
> Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.
How about converting it to a DBM::Deep file?
> Does anyone know of a module that can treat such a file as
> disk-resident data?
Well, no module is needed to treat it as disk-resident data, as that is
exactly what it is already. You need to give us a functional definition of
how you want to access the data. That will most likely drive the storage,
not the other way around.
You might be able to use DBD::AnyData, but there is no particular reason to
think it will like the format your XML is already in, or that it will be
fast.
Xho
Re: Working with a 1-GB XML file...
on 17.01.2008 20:28:47 by John Bokma
kj wrote:
>
> Hi. I have a large XML file (about 1G) that I would like to be
> able to interrogate in my code. Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.
>
> Does anyone know of a module that can treat such a file as
> disk-resident data?
It all depends a lot on /what/ is in the XML file. If it consists of
records that you have to process one by one, XML::Twig might be the right
answer. If you have to process the file in a stream-based way, SAX or a
similar module might be the answer.
--
John
http://johnbokma.com/