non-utf characters and XML

on 08.11.2007 07:02:47 by Tgone

Hello,

My problem:
I'm using PHP to dynamically create an XML document. However, some of
my data (from MySQL) contains non-UTF characters such as the umlaut.
Naturally, browsers like IE 7 throw an error when attempting to parse
these characters. I understand that these characters are invalid for
XML.

My question:
What is the best way to handle these characters when creating XML
documents on the fly? It seems like searching and replacing these
characters would be complicated, and there must be an easier way.

Thanks!

Re: non-utf characters and XML

on 08.11.2007 08:10:11 by Martin Mandl - m2m tech support

Actually, umlauts can be encoded in UTF-8 without any problem. But you
should tell the browser which character set you are using.
You could do that in the XML declaration, e.g.
<?xml version="1.0" encoding="utf-8"?>

or set it in the HTTP header using PHP, e.g.
header('content-type: text/html; charset=utf-8');

which is basically the same as the HTML meta tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

or let .htaccess do the job, e.g.
AddCharset utf-8 .css .html .xhtml .xml .php

good luck
Martin
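A minimal sketch of the first option (naming the encoding in the XML declaration), written in Python rather than PHP and assuming the database values have already been decoded to Unicode strings. It uses the standard library's xml.etree.ElementTree; the function and element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_xml(names):
    """Serialize a list of (possibly non-ASCII) names as UTF-8 encoded XML.

    The element names "people" and "person" are hypothetical.
    """
    root = ET.Element("people")
    for name in names:
        ET.SubElement(root, "person").text = name
    # encoding="utf-8" makes tostring() return bytes; xml_declaration=True
    # (Python 3.8+) prepends a declaration naming that encoding, so the
    # parser on the other end knows how to decode the bytes.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_xml(["Müller", "Jürgen"]).decode("utf-8"))
```

The umlauts are written as ordinary UTF-8 bytes; what matters is that the declared encoding matches the bytes actually sent.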

Re: non-utf characters and XML

on 08.11.2007 08:29:54 by luiheidsgoeroe

On Thu, 08 Nov 2007 08:10:11 +0100, Martin Mandl - m2m tech support
wrote:

> Actually Umlauts are in UTF-8. But you should tell your browser which
> character set you are using.

Indeed. When using UTF-8, avoid a BOM (byte order mark), by the way.
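A small Python sketch (not from the thread) of detecting and removing a UTF-8 byte order mark from data that is about to become the start of an XML response. codecs.BOM_UTF8 is the three-byte sequence EF BB BF; the function name is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + '<?xml version="1.0" encoding="utf-8"?><a>ü</a>'.encode("utf-8")
    cleaned = strip_utf8_bom(raw)
    print(cleaned[:5])  # b'<?xml'
```

When decoding rather than passing bytes through, Python's "utf-8-sig" codec also skips a leading BOM.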

> You could do that in the XML declaration, e.g.
> <?xml version="1.0" encoding="utf-8"?>
>
> or set it in the header using php, e.g.
> header('content-type: text/html; charset=utf-8');

Do serve XML as XML, though (e.g. with a content type of text/xml or
application/xml); it isn't HTML.
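To illustrate the point outside PHP, here is a minimal Python WSGI sketch (an assumption; in PHP the equivalent is the header() call quoted above) that labels a UTF-8 XML body as application/xml with an explicit charset instead of text/html. The application name and document content are hypothetical.

```python
def xml_app(environ, start_response):
    """Minimal WSGI application returning a UTF-8 encoded XML document."""
    body = '<?xml version="1.0" encoding="utf-8"?>\n<name>Müller</name>'.encode("utf-8")
    # Label the response as XML (not HTML) and state the charset explicitly.
    start_response("200 OK", [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ])
    return [body]


if __name__ == "__main__":
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(xml_app({}, start_response))
    print(captured["headers"]["Content-Type"])  # application/xml; charset=utf-8
```

For a real server, the same function can be passed to wsgiref.simple_server.make_server.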
--
Rik Wasmus

Re: non-utf characters and XML

on 09.11.2007 08:29:16 by Jake Barnes

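If you're only trying to communicate plain text, you can wrap your
text in a CDATA block. That stops characters like < and & from being
parsed as markup, although the characters inside must still be valid
in the document's encoding. Or you can do a lot of str_replace() to
change them all to numeric character references such as &#252; (for
ü); named HTML entities like &uuml; are not predefined in XML.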
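A Python sketch of both approaches (not the poster's code; the function names are hypothetical). The first escapes markup characters and rewrites every non-ASCII character as a numeric character reference using the standard "xmlcharrefreplace" error handler, instead of many str_replace() calls. The second wraps text in a CDATA section, splitting any "]]>" that would otherwise end the section early.

```python
from xml.sax.saxutils import escape


def to_xml_text(text: str) -> str:
    """Escape markup characters and turn non-ASCII characters into &#NNN; refs."""
    # escape() handles &, < and >; "xmlcharrefreplace" then rewrites any
    # character outside ASCII as a decimal reference, e.g. "ü" -> "&#252;".
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any "]]>" it contains."""
    # Close the section between "]]" and ">" and open a new one, so the
    # parser never sees a premature "]]>".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    print(to_xml_text("Müller & Co"))  # M&#252;ller &amp; Co
    print(to_cdata("a]]>b"))           # <![CDATA[a]]]]><![CDATA[>b]]>
```

The escaped form is pure ASCII, so it survives even a wrong charset label; the CDATA form still relies on the declared encoding being correct.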

If the problem is that your XML is outputting things that your users
input, and your users are inputting a lot of junk, then all you can do
is filter out the non-UTF-8 stuff. WordPress's seems_utf8() function
can be a help, and is mentioned on this page:

http://wordpress.taragana.net/nav.html?_functions/index.html
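A rough Python analogue of that kind of check (not WordPress's code; the function names are hypothetical): test whether a byte string is valid UTF-8, and if not, decode it while dropping the invalid sequences.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def drop_invalid_utf8(data: bytes) -> str:
    """Decode data as UTF-8, silently discarding invalid byte sequences."""
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    good = "Müller".encode("utf-8")
    bad = b"M\xfcller"  # "Müller" in ISO-8859-1, which is not valid UTF-8
    print(is_valid_utf8(good), is_valid_utf8(bad))  # True False
    print(drop_invalid_utf8(bad))                   # Mller
```

Filtering loses the umlaut. If the bytes are known to be ISO-8859-1, decoding them with "latin-1" instead recovers "Müller".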