Problem Parsing Huge XML file using XML::Twig

on 24.04.2007 04:24:22 by vikrant

Hi,
I am trying to parse a huge XML file using XML::Twig. Part of the XML
file is shown below as a sample:
------------------------------------------------------------ ------------------------------------------------------------ -


AEC


21CR10.2
HUGE
AEC
10.99

http://www.example.com
http://www.example2.com



21CR11.2
ARROW
AEC
10.49

http://www.example.com
http://www.example2.com




------------------------------------------------------------ ------------------------------------------------------------ ------------
Here, the Product tag repeats 2000 times in the original file.

I am able to get the values of ProductID, SupplierID and
PurchasePrice using the following code. But how do I get the values of
the "link" nodes, and the attribute values and node value of the
ProductInfo node? I know we can use XPath with XML::Twig, but
unfortunately I have not been able to get it to work. So please help
me with any documents, links or references related to it. I searched a
lot but failed to find any.
------------------------------------------------------------ ------------------------------------------------------------ -----
#!/bin/perl -w
use strict;
use XML::Twig;

# Call product() each time a complete Product element has been parsed
my $t= XML::Twig->new( twig_handlers => { Product => \&product});
$t->parsefile( 'sample.xml');
exit;

sub product
  { my ($t, $product)= @_;
    my %product;
    $product{id}= $product->field( 'ProductID');
    $product{SupplierID}= $product->field( 'SupplierID');
    $product{PurchasePrice}= $product->field( 'PurchasePrice');

    print "$product{id}: $product{SupplierID} :$product{PurchasePrice}\n";
    $product->delete;   # discard the element to keep memory use low
  }
------------------------------------------------------------ ------------------------------------------------------------ ------
One strange thing I found by accident: when I remove the "StoreInfo"
tag from the above XML, the following error appears on screen.
Error:
junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

Any comments?

Sorry for my bad English :)

Regards,
Vikrant

Re: Problem Parsing Huge XML file using XML::Twig

on 24.04.2007 06:54:20 by mirod

vikrant wrote:
> Hi,
> I am trying to parse a huge XML file using XML::Twig. Part of the XML
> file is shown below as a sample:
> ------------------------------------------------------------ ------------------------------------------------------------ -
>
>
> AEC
>
>
> 21CR10.2
> HUGE
> AEC
> 10.99
>
> http://www.example.com
> http://www.example2.com
>
>
>
> 21CR11.2
> ARROW
> AEC
> 10.49
>
> http://www.example.com
> http://www.example2.com
>
>
>
>
> ------------------------------------------------------------ ------------------------------------------------------------ ------------
> Here, the Product tag repeats 2000 times in the original file.
>
> I am able to get the values of ProductID, SupplierID and
> PurchasePrice using the following code. But how do I get the values of
> the "link" nodes, and the attribute values and node value of the
> ProductInfo node? I know we can use XPath with XML::Twig, but
> unfortunately I have not been able to get it to work. So please help
> me with any documents, links or references related to it. I searched a
> lot but failed to find any.
> ------------------------------------------------------------ ------------------------------------------------------------ -----
> #!/bin/perl -w
> use strict;
> use XML::Twig;
>
> # Call product() each time a complete Product element has been parsed
> my $t= XML::Twig->new( twig_handlers => { Product => \&product});
> $t->parsefile( 'sample.xml');
> exit;
>
> sub product
>   { my ($t, $product)= @_;
>     my %product;
>     $product{id}= $product->field( 'ProductID');
>     $product{SupplierID}= $product->field( 'SupplierID');
>     $product{PurchasePrice}= $product->field( 'PurchasePrice');
>
>     print "$product{id}: $product{SupplierID} :$product{PurchasePrice}\n";
>     $product->delete;   # discard the element to keep memory use low
>   }
> ------------------------------------------------------------ ------------------------------------------------------------ ------

'field' is not the only method for getting data out of an element.
In your case you would use:

my $name= $product->first_child( 'ProductInfo')->att( 'name');

my $links= $product->first_child( 'links'); # the element links
my @links= map { $_->text } $links->children( 'link');

The tutorial at http://www.xmltwig.com/xmltwig/tutorial/index.html
(referenced in the README and at the top of the doc of the module)
gives more info about those methods.
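
Putting those calls together with the original handler, a minimal
sketch might look like the following. The sample XML is hypothetical:
the element names come from this thread, but the exact nesting, the
"name" attribute value and the ProductInfo text are assumptions, since
the real file is not shown in full. It parses from a string rather
than from 'sample.xml' so that it is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: nesting, attribute and text values are assumptions.
my $xml = <<'XML';
<StoreInfo>
  <Product>
    <ProductID>21CR10.2</ProductID>
    <ProductInfo name="HUGE">some description</ProductInfo>
    <SupplierID>AEC</SupplierID>
    <PurchasePrice>10.99</PurchasePrice>
    <links>
      <link>http://www.example.com</link>
      <link>http://www.example2.com</link>
    </links>
  </Product>
</StoreInfo>
XML

my @products;    # one hash reference per Product element

my $twig = XML::Twig->new( twig_handlers => { Product => \&product } );
$twig->parse($xml);

sub product {
    my ( $t, $product ) = @_;

    my $info  = $product->first_child('ProductInfo');
    my $links = $product->first_child('links');

    # Collect the text of each link child, if a links element is present
    my @links;
    @links = map { $_->text } $links->children('link') if $links;

    push @products, {
        id    => $product->field('ProductID'),
        name  => $info ? $info->att('name') : undef,   # attribute value
        info  => $info ? $info->text        : undef,   # node value
        links => \@links,
    };

    $t->purge;    # release what has been parsed so far
}

for my $p (@products) {
    print "$p->{id}: $p->{name}: @{ $p->{links} }\n";
}
```

The checks on $info and $links are there only so that a Product
without those children does not make the handler die; drop them if
every Product is guaranteed to have both.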

> One strange thing I found by accident: when I remove the "StoreInfo"
> tag from the above XML, the following error appears on screen.
> Error:
> junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

If you remove the StoreInfo tag then the parser sees the first
element (the one containing AEC) as the entire document, then dies,
with an appropriate error message, when it finds the rest of your
original document, and has no way of dealing with it, as it has
already seen a complete tree.
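
This is easy to reproduce with a small sketch. The element name
"Supplier" below is made up for illustration (the real tag name is not
shown in this thread); the point is only that an XML document can have
a single root element, so a second top-level element is "junk".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Two top-level elements: not well-formed XML.
# "Supplier" is a hypothetical element name.
my $fragment = "<Supplier>AEC</Supplier>\n"
             . "<Product><ProductID>21CR10.2</ProductID></Product>\n";

# parse() dies on malformed input, so trap the error with eval
my $ok    = eval { XML::Twig->new->parse($fragment); 1 };
my $error = $ok ? '' : $@;
print "without a root element: $error\n" unless $ok;

# Wrapping everything in a single root element makes it parse again
my $wrapped    = "<StoreInfo>\n$fragment</StoreInfo>\n";
my $wrapped_ok = eval { XML::Twig->new->parse($wrapped); 1 };
print "with a root element: ", ( $wrapped_ok ? "parsed" : "failed" ), "\n";
```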

--
mirod

Re: Problem Parsing Huge XML file using XML::Twig

on 24.04.2007 18:23:46 by vikrant

Thanks for the answer.

Regards,
Vikrant