Re: Strange "Â" character output when using simplex

on 07.04.2008 15:19:28 by bizt

On 25 Feb, 10:56, Toby A Inkster wrote:
> Andy Hassall wrote:
> > bizt wrote:
>
> >> I'm converting an XML string using the simplexml_load_string function.
> >> It is giving me a Â character for some reason, dotted around the text.
>
> > simplexml always outputs in UTF-8. Is your page's encoding UTF-8?
>
> At a guess, ISO-8859-1 or perhaps ISO-8859-15.
>
> In UTF-8, a "prefix" byte of 0xC2 is used to reach the lower half (U+0080
> to U+00BF) of the "Latin-1 Supplement" block, which includes a lot of juicy
> characters such as currency symbols, fractions, superscript 2 and 3, the
> copyright and registered trademark symbols, and the non-breaking space.
>
> However, in ISO-8859-1 and -15, the byte 0xC2 represents an Â, so if UTF-8
> is misinterpreted as one of those, then you get Â followed by some other
> nonsense character.
>
> Probably the easiest solution would be to take the output from SimpleXML
> and pass it through iconv():
>
>         $xmlout = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $xmlout);
>
> Note that UTF-8 is capable of representing a far greater range of
> characters than ISO-8859-1/-15 are, so certain characters may not properly
> survive conversion. (Using the '//TRANSLIT' option tells iconv to do its
> best, and if, say, a particular accented character is not available in
> ISO-8859-1, then to substitute an unaccented one in its place.)
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 26 days, 15:55.]
>
> Bottled Water
> http://tobyinkster.co.uk/blog/2008/02/18/bottled-water/

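To make the byte-level explanation quoted above concrete, here is a minimal PHP sketch. It is not from the original posts, the variable names are made up for illustration, and it assumes the iconv and mbstring extensions are loaded. It takes a non-breaking space as its two UTF-8 bytes and then deliberately re-reads those bytes as ISO-8859-1, which is roughly what a browser does when the page declares the wrong charset.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes 0xC2 0xA0.
$nbspUtf8 = "\xC2\xA0";
echo bin2hex($nbspUtf8), "\n"; // c2a0

// Treat the same bytes as ISO-8859-1 and convert them to UTF-8 for display.
// 0xC2 turns into "Â" (U+00C2) and 0xA0 into a non-breaking space (U+00A0).
$misread = iconv('ISO-8859-1', 'UTF-8', $nbspUtf8);
echo bin2hex($misread), "\n"; // c382c2a0

// Read with the right charset there is one character; misread, there are two.
echo mb_strlen($nbspUtf8, 'UTF-8'), "\n"; // 1
echo mb_strlen($misread, 'UTF-8'), "\n";  // 2
```

The stray "Â" is therefore the first byte of a two-byte UTF-8 sequence shown on its own, and the character after it is whatever the second byte happens to mean in the wrong charset.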

Hi, I've tried what you said, which worked for one of my pages, but when
I tried it on another I got the following:

Notice: iconv() [function.iconv]: Detected an illegal character in
input string in /home/public_html/search_apartments.php on line 67

I'm using the following to convert my XML string, which is fetched via
cURL:

$result = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $result);

Could it be that I'm not giving iconv() the correct input encoding for
my $result string? If so, is there a way for me to detect the input
encoding?

Cheers

Martyn
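
One way to approach the question above is to check whether the fetched string is well-formed UTF-8 before handing it to iconv(). The sketch below assumes the mbstring extension is available. The helper name toLatin9() and the decision to pass non-UTF-8 input through unchanged are illustrative assumptions, not something from the thread. Note that mb_check_encoding() can only say the bytes are valid UTF-8; it cannot prove what the sender intended.

```php
<?php
/**
 * Hypothetical helper: return $data as ISO-8859-15.
 * Valid UTF-8 is converted. Anything else is ASSUMED to be ISO-8859-15
 * already and is returned unchanged.
 */
function toLatin9(string $data): string
{
    if (mb_check_encoding($data, 'UTF-8')) {
        // //TRANSLIT asks iconv to approximate characters that have no
        // ISO-8859-15 equivalent; the result depends on the iconv library.
        $converted = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $data);
        // On failure iconv() returns false; fall back to the input.
        return $converted === false ? $data : $converted;
    }
    return $data;
}

// "é" as UTF-8 (0xC3 0xA9) becomes the single byte 0xE9.
echo bin2hex(toLatin9("caf\xC3\xA9")), "\n"; // 636166e9

// A lone 0xE9 is not valid UTF-8, so the string is passed through as-is
// instead of triggering the "illegal character" notice.
echo bin2hex(toLatin9("caf\xE9")), "\n"; // 636166e9
```

If the second case is what happens on the failing page, the data fetched there is probably not UTF-8 in the first place, and the fix belongs at the source rather than in the conversion call.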

Re: Strange "Â" character output when using simplex

on 08.04.2008 17:00:00 by AnrDaemon

Greetings, bizt.
In reply to Your message dated Monday, April 7, 2008, 17:19:28,

>> >> I'm converting an XML string using the simplexml_load_string function.
>> >> It is giving me a Â character for some reason, dotted around the text.
>>
>> > simplexml always outputs in UTF-8. Is your page's encoding UTF-8?
>>
>> At a guess, ISO-8859-1 or perhaps ISO-8859-15.
>>
>> In UTF-8, a "prefix" byte of 0xC2 is used to reach the lower half (U+0080
>> to U+00BF) of the "Latin-1 Supplement" block, which includes a lot of
>> juicy characters such
>> as currency symbols, fractions, superscript 2 and 3, the copyright and
>> registered trademark symbols, and the non-breaking space.
>>
>> However, in ISO-8859-1 and -15, the byte 0xC2 represents an Â, so if UTF-8
>> is misinterpreted as one of those, then you get Â followed by some other
>> nonsense character.
>>
>> Probably the easiest solution would be to take the output from SimpleXML
>> and pass it through iconv():
>>
>>         $xmlout = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $xmlout);
>>
>> Note that UTF-8 is capable of representing a far greater range of
>> characters than ISO-8859-1/-15 are, so certain characters may not properly
>> survive conversion. (Using the '//TRANSLIT' option tells iconv to do its
>> best, and if, say, a particular accented character is not available in
>> ISO-8859-1, then to substitute an unaccented one in its place.)

> Hi, I've tried what you said, which worked for one of my pages, but when
> I tried it on another I got the following:

> Notice: iconv() [function.iconv]: Detected an illegal character in
> input string in /home/public_html/search_apartments.php on line 67

> I'm using the following to convert my XML string, which is fetched via
> cURL:

> $result = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $result);

> Could it be that I'm not giving iconv() the correct input encoding for
> my $result string? If so, is there a way for me to detect the input
> encoding?

At a guess, your "Â" is probably followed by a space, and the pair represents
a non-breaking space.

As for your trouble with iconv() on $result, I think you should look at the
SOURCE BEFORE using simplexml_load_string and check which encoding it uses.
If your source is in, say, ISO-8859-15, then the UTF-8 output can't contain
any characters that can't be converted back to ISO-8859-15.


--
Sincerely Yours, AnrDaemon
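
The advice above amounts to finding out what encoding the fetched document claims to use before it reaches simplexml_load_string(). Here is a minimal sketch of that check, assuming the XML states an encoding in its declaration. declaredXmlEncoding() is a hypothetical helper name, the sample document is made up, and the UTF-8 default is a simplification that ignores UTF-16 and byte-order marks.

```php
<?php
// Hypothetical helper: read the encoding named in the XML declaration.
function declaredXmlEncoding(string $xml): string
{
    $pattern = '/^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/';
    if (preg_match($pattern, $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8'; // simplification: no declaration found
}

// Made-up source document in ISO-8859-1; 0xE9 is "é" in that charset.
$source = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        . "<flat><name>caf\xE9</name></flat>";
$encoding = declaredXmlEncoding($source);
echo $encoding, "\n"; // ISO-8859-1

// SimpleXML hands the text back as UTF-8 regardless of the source encoding.
$xml  = simplexml_load_string($source);
$utf8 = (string) $xml->name;
echo bin2hex($utf8), "\n"; // 636166c3a9

// Converting back to the source's own charset cannot hit an
// untranslatable character, because every character came from it.
$back = iconv('UTF-8', $encoding, $utf8);
echo bin2hex($back), "\n"; // 636166e9
```

When the data arrives over cURL, the Content-Type response header (available through curl_getinfo() with CURLINFO_CONTENT_TYPE) is another place where the sender may state a charset, and it is worth comparing with the declaration.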