Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 03:05:34 by joshbeall
Hi All,
Consider the following test code:
$xml = "";
$parser = xml_parser_create();
$result = xml_parse($parser,$xml);
$errorcode = xml_get_error_code($parser);
$errormsg = xml_error_string($errorcode);
$ln = xml_get_current_line_number($parser);
$cn = xml_get_current_column_number($parser);
var_dump($result);
echo "Error parsing XML document, '$errormsg' : Line $ln, Column $cn";
This will output:
int(0)
Error parsing XML document, 'Invalid character' : Line 1, Column 12
A return code of int(0) indicates failure. If you replace with
, it works with no error.
I don't understand why it's failing on -- isn't it perfectly
valid to include that numeric entity in the text content of an XML
node? Is this a bug? Or am I doing something wrong?
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 03:41:47 by joshbeall
Sorry I neglected to say -- I'm running PHP 5.2.4
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 08:27:43 by John Dunlop
Joshua Beall:
> $xml = "";
The character referred to by that character reference does not match
the production Char.
http://www.w3.org/TR/REC-xml/#charsets
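To make the production concrete, here is a minimal PHP sketch of the XML 1.0 Char test from the linked section. is_xml10_char() is a hypothetical helper name, not part of PHP. Vertical tab (0x0B) falls in none of the allowed ranges, which is why even a character reference to it is not well-formed.

```php
<?php
// XML 1.0 "Char" production (REC-xml, section 2.2):
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// is_xml10_char() is a hypothetical helper, not a PHP built-in.
function is_xml10_char($cp)
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

var_dump(is_xml10_char(0x0B)); // bool(false): vertical tab is not a Char
var_dump(is_xml10_char(0x0A)); // bool(true): line feed is allowed
```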
--
Jock
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 13:46:21 by joshbeall
On Nov 11, 2:27 am, John Dunlop wrote:
> Joshua Beall:
>
> > $xml = "";
>
> The character referred to by that character reference does not match
> the production Char.
>
> http://www.w3.org/TR/REC-xml/#charsets
>
> --
> Jock
I'm not sure I completely follow; doesn't the fact that character 0x0B
(decimal 11) is outside the range of acceptable characters simply mean
that you have to encode it as ? If not, then how should I encode
the vertical tab character (character code 0x0B) to put it in an XML
document?
-Josh
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 17:09:49 by John Dunlop
Joshua Beall:
> I'm not sure I completely follow; doesn't the fact that character 0x0B
> (decimal 11) is outside the range of acceptable characters simply mean
> that you have to encode it as ?
No, the character referred to by a character reference must match the
production Char.
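One way to see this with PHP's own parser is to feed it two hypothetical one-element documents that differ only in the reference: &#10; (line feed, a legal Char) and &#11; (vertical tab, not a Char in XML 1.0). parses_ok() is a made-up helper; the sketch uses only the standard xml extension.

```php
<?php
// Returns true if xml_parse() accepts $xml as a complete document.
// parses_ok() is a hypothetical helper for this demonstration.
function parses_ok($xml)
{
    $parser = xml_parser_create();
    // is_final = true: this string is the whole document.
    return xml_parse($parser, $xml, true) === 1;
}

var_dump(parses_ok('<a>&#10;</a>')); // bool(true): reference to a legal Char
var_dump(parses_ok('<a>&#11;</a>')); // bool(false): the reference does not make 0x0B legal
```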
> If not, then how should I encode the vertical tab character (character
> code 0x0B) to put it in an XML document?
XML 1.1, although the spec discourages using that character.
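For reference, a sketch of what XML 1.1 changes, assuming the productions in section 2.2 of the XML 1.1 Recommendation: 0x0B becomes a RestrictedChar, which may appear only as a character reference such as &#xB;, never as a raw character. xml11_must_use_reference() is a hypothetical helper, and this only illustrates the spec's rules; whether a given parser implements XML 1.1 is a separate question.

```php
<?php
// XML 1.1 (REC-xml11, section 2.2):
// Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
// A RestrictedChar is legal only when written as a character reference.
// xml11_must_use_reference() is a hypothetical helper, not a PHP built-in.
function xml11_must_use_reference($cp)
{
    return ($cp >= 0x1 && $cp <= 0x8)
        || ($cp >= 0xB && $cp <= 0xC)
        || ($cp >= 0xE && $cp <= 0x1F)
        || ($cp >= 0x7F && $cp <= 0x84)
        || ($cp >= 0x86 && $cp <= 0x9F);
}

$cp = 0x0B; // vertical tab
echo xml11_must_use_reference($cp) ? sprintf('&#x%X;', $cp) : chr($cp); // prints &#xB;
```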
--
Jock
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 19:54:26 by joshbeall
John Dunlop wrote:
> > If not, then how should I encode the vertical tab character (character
> > code 0x0B) to put it in an XML document?
>
> XML 1.1, although the spec discourages using that character.
So there's no way to do this in XML 1.0?
Let me give a more complete description of my circumstance, and if you
have any suggestions on what I should do, I'd be grateful.
I'm trying to parse an XML export of a FileMaker Pro 8 database. FMP
is putting raw vertical tabs in the output at various places.
Before passing the XML document to the PHP SAX parser, I'm checking
through the document for any characters with a character code below 32
and trying to encode them. I didn't realize this was the wrong way to
go about this.
Now I can't change the fact that for some reason FMP is sometimes
going to be spitting out these vertical tab characters; apparently it
internally uses vertical tabs for something.
So is there any way to parse this document? Or is the only thing I can
do to strip out any illegal characters that can't be encoded before I
parse it?
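If stripping is acceptable, a minimal sketch (assuming the export is UTF-8) is to delete or replace raw characters outside the XML 1.0 Char production before calling xml_parse(), instead of turning them into character references. strip_invalid_xml10_chars() is a hypothetical helper, and the replacement argument (a line feed in the example) is just one possible choice. It only sees raw characters: a reference such as &#11; is ordinary ASCII text and passes through unchanged.

```php
<?php
// Hypothetical helper: delete (or replace) every character outside the
// XML 1.0 Char production so xml_parse() will accept the document.
// Assumes UTF-8 input: with the /u modifier, preg_replace() returns null
// if $xml is not valid UTF-8.
function strip_invalid_xml10_chars($xml, $replacement = '')
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    return preg_replace($pattern, $replacement, $xml);
}

// Hypothetical fragment containing a raw vertical tab, as in the export.
$raw = "<field>line one\x0Bline two</field>";
$clean = strip_invalid_xml10_chars($raw, "\n");

$parser = xml_parser_create();
var_dump(xml_parse($parser, $clean, true)); // int(1): now well-formed
```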
-Josh
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 20:24:35 by John Dunlop
Joshua Beall:
> So there's no way to do this in XML 1.0?
No, short of such ad hockery as custom elements or PIs that pass the
character by reference to the application.
--
Jock
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 21:08:28 by joshbeall
On Nov 11, 2:24 pm, John Dunlop wrote:
> Joshua Beall:
>
> > So there's no way to do this in XML 1.0?
>
> No, short of such ad hockery as custom elements or PIs that pass the
> character by reference to the application.
What do you mean by "pass the character by reference"?
I could change the XML prologue before feeding it to the parser,
changing the version to 1.1 -- but could this cause other problems?
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 21:55:13 by John Dunlop
Joshua Beall:
> What do you mean by "pass the character by reference"?
E.g.,
--
Jock
Re: Can't parse  -- Is this a bug in the SAX xml parser? Or am I doing something wrong?
on 11.11.2007 22:26:19 by joshbeall
On Nov 11, 3:55 pm, John Dunlop wrote:
> Joshua Beall:
>
> > What do you mean by "pass the character by reference"?
>
> E.g.,
Ah, I see -- and then I'd have to parse that value out manually later
on.
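A minimal sketch of that scheme, under two assumptions: the PI target "char" is a made-up convention whose data is a decimal code point, and the raw vertical tabs occur only in text content (a PI is not allowed inside an attribute value). The processing-instruction handler does the "parse that value out later" step; CharRefCollector is a hypothetical class.

```php
<?php
// Collects text content and turns the made-up "char" PI back into the
// character it stands for. The PI target and class name are hypothetical.
class CharRefCollector
{
    public $text = '';

    public function onCharacterData($parser, $data)
    {
        // Text may arrive in several chunks, so append rather than assign.
        $this->text .= $data;
    }

    public function onProcessingInstruction($parser, $target, $data)
    {
        // Restore the character that could not be written in XML 1.0.
        // chr() is enough here because the code point is below 128.
        if ($target === 'char') {
            $this->text .= chr((int) trim($data));
        }
    }
}

// Before parsing: replace each raw vertical tab with the made-up PI.
$raw = "<field>line one\x0Bline two</field>";
$xml = str_replace("\x0B", '<?char 11?>', $raw);

$collector = new CharRefCollector();
$parser = xml_parser_create();
xml_set_character_data_handler($parser, array($collector, 'onCharacterData'));
xml_set_processing_instruction_handler($parser, array($collector, 'onProcessingInstruction'));
$ok = xml_parse($parser, $xml, true);

var_dump($ok);
var_dump($collector->text === "line one\x0Bline two");
```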
What do you think of simply changing the XML prologue to specify XML
1.1?
-Josh