Help with validating XML (DTD or Schema) with PERL

on 26.06.2007 00:06:39 by 1234marlon

Can someone tell me if a reliable production PERL package was ever
created to handle either W3C DTD or Schema standards? I realize
libxml2 based packages are very good, but I am not willing to tackle
the maintenance required for compiled libxml2 program in our
production environment.

It is the year 2007, for heaven's sake; did anyone ever create something
other than a poorly tested, barely usable module that covers only part
of the W3C standards for DTDs or Schemas? I have been searching CPAN and
testing modules frantically for months without finding anything good
(other than the libxml2 stuff, which looks good).

I am using XML::SAX::ParserFactory (PurePerl) as my parser and
originally thought I would easily find a filter to handle DTD or Schema
validation; instead, after months of research, I find myself pleading
for help in the newsgroup. Perl is really treating me badly on this
XML project; I did a similar project with VB6 and it was a breeze.

Anyway, right now I am trying to test XML-DTD-0.06. However, the
documentation is gibberish; it keeps referring to $rt without
explaining what $rt is...omg!!! If someone has a working example of
using XML-DTD-0.06, I would love to see it.

Thank you in advance for any guidance you can give me!!!!!!

Re: Help with validating XML (DTD or Schema) with PERL

on 26.06.2007 01:02:05 by Gunnar Hjalmarsson

1234marlon@gmail.com wrote:
> did anyone ever create something
> other than a poorly tested, barely usable module that covers only part
> of the W3C standards for DTDs or Schemas? I have been searching CPAN and
> testing modules frantically for months without finding anything good
> (other than the libxml2 stuff, which looks good).

Well, if that's true, please feel free to contribute with bug fixes and
improvements.

> Anyway, right now I am trying to test XML-DTD-0.06. However the
> documentation is gibberish, it keeps referring to $rt without
> explaining what $rt is...omg!!!

I don't understand what you are talking about.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Help with validating XML (DTD or Schema) with PERL

on 26.06.2007 02:14:36 by 1234marlon

On Jun 25, 6:02 pm, Gunnar Hjalmarsson wrote:
> 1234mar...@gmail.com wrote:

> Well, if that's true, please feel free to contribute with bug fixes and
> improvements.

I do not have a problem contributing. However, I would rather not
spend countless hours reinventing or fixing the wheel, which is an
unplanned task for this project. I just find it hard to believe there
are no solid modules for validating XML other than the libxml2-based
ones. I am seriously hoping someone says something like "try
XML::FOO::FILTER::DTD; it implements nearly all of the W3C standards
and works great".

> > Anyway, right now I am trying to test XML-DTD-0.06. However the
> > documentation is gibberish, it keeps referring to $rt without
> > explaining what $rt is...omg!!!
>
> I don't understand what you are talking about.

I am talking about XML::DTD::Parser and the POD located at
http://search.cpan.org/~wohl/XML-DTD-0.06/lib/XML/DTD/Parser.pm. The
information does not make sense and I was wondering if anyone has a
working example. Currently, I do not know where the parsed output
resides, which I need for passing to SAX processing. Sooner or later I
will figure out what the author is trying to say; I am just hoping it
can be sooner and that it is good stuff, because I am way behind
schedule.

Thank you for your assistance.

Re: Help with validating XML (DTD or Schema) with PERL

on 26.06.2007 10:01:20 by mirod

1234marlon@gmail.com wrote:
> Can someone tell me if a reliable production PERL package was ever
> created to handle either W3C DTD or Schema standards? I realize
> libxml2 based packages are very good, but I am not willing to tackle
> the maintenance required for compiled libxml2 program in our
> production environment.

It looks like your choice is between maintaining the dependencies needed
by working Perl modules that use an external library, or writing the
module yourself.

If you don't want to maintain a libxml2 environment (why? I don't think
it's that hard to compile), you can have a look at XML::Xerces
(http://search.cpan.org/dist/XML-Xerces/), which depends on Xerces (I
have never used it though). I don't think there is any other sane choice.

It appears that no one in the Perl community has had the will, time and
energy to write a complete validating XML parser, preferring instead to
rely on external resources. Whether that's good or bad is debatable. My
take would be that PR-wise it's pretty bad, but practically it makes sense.

--
mirod

Re: Help with validating XML (DTD or Schema) with PERL

on 27.06.2007 02:09:05 by 1234marlon

On Jun 26, 3:01 am, mirod wrote:
> 1234mar...@gmail.com wrote:

> It looks like your choice is between maintaining the dependencies needed
> by working Perl modules that use an external library, or writing the
> module yourself.

Maybe. However, I will probably just code the validation/defaults
right into the program (the old-fashioned way).
For this project, I decided to use XML as the format for user
configuration data because XML is awesome.
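
As a rough illustration of what coding the validation and defaults "right into the program" could look like, here is a minimal pure-Perl sketch. It assumes the settings have already been pulled out of the XML (for example by a SAX handler) into a flat hash. The rule table, the setting names and the `validate_config` function are all hypothetical and not taken from any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical rule table: each setting has a pattern, plus either a
# default value or a "required" flag.
my %RULES = (
    host    => { required => 1,    pattern => qr/\A[\w.-]+\z/ },
    port    => { default  => 8080, pattern => qr/\A\d+\z/ },
    verbose => { default  => 'no', pattern => qr/\A(?:yes|no)\z/ },
);

# Returns (\%config, \@errors). %config holds the validated settings with
# defaults filled in; @errors lists every problem that was found.
sub validate_config {
    my ($input) = @_;
    my (%config, @errors);

    # Reject settings that the rule table does not know about.
    for my $key (sort keys %$input) {
        push @errors, "unknown setting '$key'" unless exists $RULES{$key};
    }

    for my $key (sort keys %RULES) {
        my $rule  = $RULES{$key};
        my $value = exists $input->{$key} ? $input->{$key} : $rule->{default};

        if (!defined $value) {
            push @errors, "missing required setting '$key'" if $rule->{required};
            next;
        }
        if ($value !~ $rule->{pattern}) {
            push @errors, "bad value '$value' for '$key'";
            next;
        }
        $config{$key} = $value;
    }
    return (\%config, \@errors);
}

my ($config, $errors) = validate_config({ host => 'example.org' });
print "port defaults to $config->{port}\n" unless @$errors;
```

This only covers flat key/value settings; it does not attempt anything like content models or attribute typing from a real DTD or Schema.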

> If you don't want to maintain a libxml2 environment (why? I don't think
> it's that hard to compile), you can have a look at XML::Xerces
> (http://search.cpan.org/dist/XML-Xerces/) which depends on Xerces (I
> have never used it though). I don't think there is any other sane choice.

It is hard to maintain in a production Unix environment where there
is no licensed compiler, there are different flavors of both Unix and
Perl, I do not have root access, there are politics, and the list goes
on.... It is easier to just say "a production environment". However, I
may still battle to use a libxml2-based package someday, but it is not
worth my effort for this project.

> It appears that no one in the Perl community has had the will, time and
> energy to write a complete validating XML parser, preferring instead to
> rely on external resources. Whether that's good or bad is debatable. My
> take would be that PR-wise it's pretty bad, but practically it makes sense.
>
> --
> mirod

I am still testing and hoping there is something good out there (other
than the mighty libxml2 stuff). Are you trying to scare me?...haha

-Long Live Perl!!!!-

Re: Help with validating XML (DTD or Schema) with PERL

on 28.06.2007 15:53:10 by keith

On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
> Can someone tell me if a reliable production PERL package was ever
> created to handle either W3C DTD or Schema standards? I realize
> libxml2 based packages are very good, but I am not willing to tackle
> the maintenance required for compiled libxml2 program in our
> production environment.
>
> It is the year 2007, for heaven's sake; did anyone ever create something
> other than a poorly tested, barely usable module that covers only part
> of the W3C standards for DTDs or Schemas? I have been searching CPAN and
> testing modules frantically for months without finding anything good
> (other than the libxml2 stuff, which looks good).
I've said as much before on this group but, .....

Amongst the many things I've happily & successfully done with Perl &
CPAN modules in the last 15 years or so, parsing XML is not one. Pick up
"Java & XML" by O'Reilly, and with little background in the language
you'll probably be able to solve your problem in hours.

Perl SHOULD be good at this sort of thing (and as others have suggested,
the Perl community would love some help on this, I'm sure) but issues w/
recursion and lack of an adequate threading model seem to put it at a
disadvantage vis-a-vis Java, in my opinion, for tasks like this, while
one of Perl's great strengths over Java (built-in regular expression
matching) is somewhat wasted on data as structured as XML.

If you don't want to re-invent Perl wheels here, and can't or don't want
to use Java, why not go back to VB (if you're running on Windows only)?

I've had to write my own lexers & parsers for even the lightest of XML
tasks in Perl. As good as Perl is for other things, XML seems to be its
Achilles heel.

Keith
------
"If all you have is a hammer, everything's a nail"





Re: Help with validating XML (DTD or Schema) with PERL

on 29.06.2007 01:52:00 by 1234marlon

On Jun 28, 8:53 am, Keith wrote:

> I've said as much before on this group but, .....
>
> Amongst the many things I've happily & successfully done with Perl &
> CPAN modules in the last 15 years or so, parsing XML is not one. Pick up
> "Java & XML" by O'Reilly, and with little background in the language
> you'll probably be able to solve your problem in hours.
>

I actually tried using Java for XML processing last year and found the
same situation that I am now having with Perl when looking for
XML/XSLT/Schema/DTD/XPath support (for a production environment). I
thought about buying a Java-XML book, but did not know which one to get
or if it would help; Java support for stable XML processing looked
horrible to me. However, thanks for the "Java & XML" by O'Reilly tip, I
will definitely look at it!!!!

> Perl SHOULD be good at this sort of thing (and as others have suggested,
> the Perl community would love some help on this, I'm sure) but issues w/
> recursion and lack of an adequate threading model seem to put it at a
> disadvantage vis-a-vis Java, in my opinion, for tasks like this, while
> one of Perl's great strengths over Java (built-in regular expression
> matching) is somewhat wasted on data as structured as XML.
>
Yes, yes, yes... libxml2-based modules are the Perl solution that
everyone is using. I am not sure about this, but it appears that all
other modules for core XML processing on CPAN were practically
abandoned, maybe due to the superior libxml2-based modules.

> If you don't want to re-invent Perl wheels here, and can't or don't want
> to use Java, why not go back to VB (if you're running on Windows only)?
>
I wanted to use Java for this project, but was worried about XML module
support and development time. I am unable to write Java code as fast
as I can write Perl code, which is funny because this validation thing
is taking me forever to resolve. I am doing this for Unix; otherwise,
yes, VB (6/.net).

> I've had to write my own lexers & parsers for even the lightest of XML
> tasks in Perl. As good as it is for other things, XML seems to be an
> Achilles heel.
>
> Keith

I need this done by next week, so I will probably write my own
(garbage) code to handle XML validation. As for now, I am still
looking around CPAN and dissecting weird modules like
XML::DTD::Parser.

Thanks for your feedback!

Re: Help with validating XML (DTD or Schema) with PERL

on 01.07.2007 12:36:20 by hjp-usenet2

On 2007-06-28 13:53, Keith wrote:
> On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
>> Can someone tell me if a reliable production PERL package was ever
>> created to handle either W3C DTD or Schema standards? I realize
>> libxml2 based packages are very good, but I am not willing to tackle
>> the maintenance required for compiled libxml2 program in our
>> production environment.
>>
>> It is the year 2007, for heaven's sake; did anyone ever create something
>> other than a poorly tested, barely usable module that covers only part
>> of the W3C standards for DTDs or Schemas? I have been searching CPAN and
>> testing modules frantically for months without finding anything good
>> (other than the libxml2 stuff, which looks good).
> I've said as much before on this group but, .....
>
> Amongst the many things I've happily & successfully done with Perl &
> CPAN modules in the last 15 years or so, parsing XML is not one.

I've used both XML::Parser and XML::LibXML successfully (often
indirectly through XML::Simple). Yes, both rely on a C library to do the
actual parsing, but I don't see that as a big problem (I need
other external libraries anyway, especially database stuff, so I
couldn't run a "pure perl" environment anyway. And from a programmer's
perspective there is no difference).


> Perl SHOULD be good at this sort of thing (and as others have
> suggested, the Perl community would love some help on this, I'm sure)
> but issues w/ recursion and lack of an adequate threading model seem
> to put it at a disadvantage vis a vis Java, in my opinion, for tasks
> like this,

I don't think perl has "issues w/ recursion". I've certainly written
some deeply recursive stuff (so I know about "no warnings 'recursion'").
Function calls may be slower in perl than in Java, and they are
certainly a lot slower than in C.
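
To illustrate the point about deep recursion, here is a small sketch. The `depth` function and the test tree are made up for this example. Perl emits a "Deep recursion" warning once a subroutine is nested 100 calls deep; the lexically scoped `no warnings 'recursion'` silences that warning for the recursive call without otherwise changing behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: nesting depth of a tree stored as nested array refs.
sub depth {
    my ($node) = @_;

    # Lexically scoped: only calls made from inside this sub are affected.
    no warnings 'recursion';

    my $max = 0;
    for my $child (@$node) {
        my $d = depth($child);
        $max = $d if $d > $max;
    }
    return 1 + $max;
}

# Build a chain of array refs 500 levels deep, far beyond the
# 100-call threshold at which the warning would normally appear.
my $tree = [];
$tree = [$tree] for 1 .. 499;

print depth($tree), "\n";   # prints 500
```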

Threading in Perl is broken IMHO. But I don't see what this has to do
with parsing. That task doesn't seem to be parallelizable to me.

> while one of Perl's great strengths over Java (built-in regular
> expression matching) are somewhat wasted on data as structured as XML.

True. XML is designed to be parsed a character at a time. You can do
that in perl of course, but you have all that overhead of interpreting a
bytecode which makes things slow. So you want to do that in a language
which is designed for dealing with data a byte at a time, like C. (Java
is somewhere in between).


> I've had to write my own lexers & parsers for even the lightest of XML
> tasks in Perl.

If you wrote your own lexers & parsers in Perl, Perl is obviously good
enough for you for parsing XML.

hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"

Re: Help with validating XML (DTD or Schema) with PERL

on 02.07.2007 06:19:52 by keith

On Jul 1, 5:36 am, "Peter J. Holzer" wrote:
> On 2007-06-28 13:53, Keith wrote:
>
>
>
> > On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
> >> Can someone tell me if a reliable production PERL package was ever
> >> created to handle either W3C DTD or Schema standards? I realize
> >> libxml2 based packages are very good, but I am not willing to tackle
> >> the maintenance required for compiled libxml2 program in our
> >> production environment.
>
> >> It is the year 2007, for heaven's sake; did anyone ever create something
> >> other than a poorly tested, barely usable module that covers only part
> >> of the W3C standards for DTDs or Schemas? I have been searching CPAN and
> >> testing modules frantically for months without finding anything good
> >> (other than the libxml2 stuff, which looks good).
> > I've said as much before on this group but, .....
>
> > Amongst the many things I've happily & successfully done with Perl &
> > CPAN modules in the last 15 years or so, parsing XML is not one.
>
> I've used both XML::Parser and XML::LibXML successfully (often
> indirectly through XML::Simple). Yes, both rely on a C library to do the
> actual parsing, but I don't see that as a big problem (I need
> other external libraries anyway, especially database stuff, so I
> couldn't run a "pure perl" environment anyway. And from a programmer's
> perspective there is no difference).
>
> > Perl SHOULD be good at this sort of thing (and as others have
> > suggested, the Perl community would love some help on this, I'm sure)
> > but issues w/ recursion and lack of an adequate threading model seem
> > to put it at a disadvantage vis a vis Java, in my opinion, for tasks
> > like this,
>
> I don't think perl has "issues w/ recursion". I've certainly written
> some deeply recursive stuff (so I know about "no warnings 'recursion'").
> Function calls may be slower in perl than in Java, and they are
> certainly a lot slower than in C.
>
> Threading in Perl is broken IMHO. But I don't see what this has to do
> with parsing. That task doesn't seem to be parallelizable to me.

I should have been clearer. One thing is a recursive subroutine, another
is a recursive Perl script. I've had issues w/ recursive Perl scripts,
but for parsing XML you're not likely to need these. As for
subroutines...

Because by default everything in Perl is global to the package, this can
create issues for recursive code. Someone who takes a very disciplined
approach to writing OO Perl might not have the same issues; but writing
recursive Perl subs usually requires me to go back and clean up after
myself where I've taken shortcuts to get work done in a hurry. Of
course, languages which don't allow such shortcuts to begin with (e.g.
Java) force you to pay this price up front, whether you need to pay it
or not. Your point is well-taken.

To the point below; again I was not clear. A lot of the work I need to
do in XML involves parsing a master XML document (SAX-style) with
pointers to other XML documents. The parallelization here should be
obvious: in Java fooParser instantiates one or more barParsers, each of
which can run in its own thread. It's likely that not everyone's XML
needs match that pattern, and those that don't might not benefit as much
from a robust threading model.

>
> > while one of Perl's great strengths over Java (built-in regular
> > expression matching) are somewhat wasted on data as structured as XML.
>
> True. XML is designed to be parsed a character at a time. You can do
> that in perl of course, but you have all that overhead of interpreting a
> bytecode which makes things slow. So you want to do that in a language
> which is designed for dealing with data a byte at a time, like C. (Java
> is somewhere in between).
>
> > I've had to write my own lexers & parsers for even the lightest of XML
> > tasks in Perl.
>
> If you wrote your own lexers & parsers in Perl, Perl is obviously good
> enough for you for parsing XML.
>
> hp
>
> --
> _ | Peter J. Holzer | I know I'd be respectful of a pirate
> |_|_) | Sysadmin WSR | with an emu on his shoulder.
> | | | h...@hjp.at |
> __/ |http://www.hjp.at/| -- Sam in "Freefall"

Re: Help with validating XML (DTD or Schema) with PERL

on 02.07.2007 08:11:19 by hjp-usenet2

On 2007-07-02 04:19, Keith wrote:
> On Jul 1, 5:36 am, "Peter J. Holzer" wrote:
>> On 2007-06-28 13:53, Keith wrote:
>> > Amongst the many things I've happily & successfully done with Perl &
>> > CPAN modules in the last 15 years or so, parsing XML is not one.
>>
>> I've used both XML::Parser and XML::LibXML successfully (often
>> indirectly through XML::Simple). Yes, both rely on a C library to do the
>> actual parsing, but I don't see that as a big problem (I need
>> other external libraries anyway, especially database stuff, so I
>> couldn't run a "pure perl" environment anyway. And from a programmer's
>> perspective there is no difference).
>>
>> > Perl SHOULD be good at this sort of thing (and as others have
>> > suggested, the Perl community would love some help on this, I'm sure)
>> > but issues w/ recursion and lack of an adequate threading model seem
>> > to put it at a disadvantage vis a vis Java, in my opinion, for tasks
>> > like this,
>>
>> I don't think perl has "issues w/ recursion". I've certainly written
>> some deeply recursive stuff (so I know about "no warnings 'recursion'").
>> Function calls may be slower in perl than in Java, and they are
>> certainly a lot slower than in C.
>>
> I should have been clearer. One thing is a recursive subroutine,
> another
> is a recursive Perl script. I've had issues w/ recursive Perl
> scripts,

[please fix the line width in your newsreader: Alternating long and
short lines isn't pretty. I've reformatted the rest of your posting]

What kind of issues? A perl script is just a program like any other.
Whether you call a perl script, compiled C program, java program or
anything else from a perl script, compiled C program, java program or
anything else doesn't matter (on Unix anyway, except that you can't
directly invoke java programs but usually need a wrapper shellscript).

> but for parsing XML you're not likely to need these. As for
> subroutines...
>
> Because by default everything in Perl is global to the package, this
> can create issues for recursive code.

It has long been best practice to "use strict" in perl code. Then there
is no default, as you have to declare every variable explicitly. Of
course you can still declare variables outside of any sub, but that's a
conscious decision of the programmer (just like a Java programmer needs
to decide whether a variable should be local to the method, the
instance, or the class).
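
A small sketch of the difference, using made-up function names and data: under "use strict", a package variable has to be declared with `our` and is shared by every level of the recursion, while a `my` variable is created fresh for each call.

```perl
use strict;
use warnings;

# Package global, declared on purpose: every call shares this one variable.
our $level;

sub render_global {
    my ($node) = @_;
    $level = $node->{level};
    my $out = '';
    $out .= render_global($_) for @{ $node->{children} || [] };
    # By now a child call may have overwritten $level.
    return $out . "$node->{name}=$level ";
}

sub render_lexical {
    my ($node) = @_;
    my $level = $node->{level};   # private to this particular call
    my $out = '';
    $out .= render_lexical($_) for @{ $node->{children} || [] };
    return $out . "$node->{name}=$level ";
}

my $tree = {
    name     => 'root',
    level    => 0,
    children => [ { name => 'child', level => 1, children => [] } ],
};

print render_global($tree),  "\n";   # child=1 root=1  (root's level was clobbered)
print render_lexical($tree), "\n";   # child=1 root=0
```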

>> Threading in Perl is broken IMHO. But I don't see what this has to do
>> with parsing. That task doesn't seem to be parallelizable to me.
>
> To the point below; again I was not clear. A lot of the work I need
> to do in XML involves parsing a master XML document (SAX - style) with
> pointers to other XML documents. The parallelization here should be
> obvious: in Java fooParser instantiates one or more barParsers, each
> of which can run in it's own thread.

Which assumes that parsing the main document can continue before the
referenced document is parsed. But yes, if this is true, then
parallelization may be a win. In (non-threaded) perl you could probably
do that with a select-loop (which uses only a single CPU but works well
if you read from network resources) or by forking off children which use
Storable to pass the result back to the parent.
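
The forking variant could be sketched roughly as follows, using only core modules. `parse_document` is a hypothetical stand-in for the real per-document parser, and the file names are invented. Each child serializes its result with Storable's `store_fd` into a pipe, and the parent reads it back with `fd_retrieve`.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(store_fd fd_retrieve);

# Hypothetical stand-in for parsing one referenced document.
sub parse_document {
    my ($name) = @_;
    return { name => $name, name_length => length $name };
}

# Forks one child per document and collects the results in input order.
sub parse_in_children {
    my (@names) = @_;
    my @kids;

    for my $name (@names) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                      # child process
            close $reader;
            binmode $writer;
            store_fd(parse_document($name), $writer)
                or die "store_fd failed for $name";
            close $writer;                    # flushes the serialized data
            POSIX::_exit(0);                  # skip END blocks inherited from the parent
        }

        close $writer;                        # parent keeps only the read end
        binmode $reader;
        push @kids, { pid => $pid, reader => $reader };
    }

    my @results;
    for my $kid (@kids) {
        push @results, fd_retrieve($kid->{reader});
        close $kid->{reader};
        waitpid $kid->{pid}, 0;
    }
    return @results;
}

my @parsed = parse_in_children('chapter1.xml', 'chapter2.xml');
print "$_->{name}: $_->{name_length}\n" for @parsed;
```

All children run concurrently, but the parent reads the pipes one after another, so a slow first child delays collection of the others; a select loop over the read ends would avoid that.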

hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"