Multibyte character?

on 16.04.2008 06:20:11 by vijay

When I do a readfile or file_get_contents on a web page, the string I
get back is corrupted for non-ASCII characters. For instance, when I do
a readfile("http://abc/def"), "São Paulo" becomes "SÃ£o Paulo" on the
calling page, although http://abc/def shows "São Paulo" correctly. Any
idea how to fix this problem?

Let me try to explain it more. I have two pages, http://abc/def and
http://abc/ghi.php, and I am trying to read the contents of http://abc/def
from http://abc/ghi.php.

Re: Multibyte character?

on 16.04.2008 09:31:57 by Willem Bogaerts

vijay wrote:
> When I do a readfile or file_get_contents on a web page, the string I
> get back is corrupted for non-ASCII characters. For instance, when I do
> a readfile("http://abc/def"), "São Paulo" becomes "SÃ£o Paulo" on the
> calling page, although http://abc/def shows "São Paulo" correctly. Any
> idea how to fix this problem?
>
> Let me try to explain it more. I have two pages, http://abc/def and
> http://abc/ghi.php, and I am trying to read the contents of http://abc/def
> from http://abc/ghi.php.
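
What you get is exactly right: readfile passes the bytes through
unchanged. From your example, it appears that the text you read is utf-8
encoded and that the second (calling) page is (probably) latin-1 encoded.
In utf-8, "ã" is the two bytes 0xC3 0xA3, and a browser that reads those
bytes as latin-1 shows "Ã£". A "readfile" that does not take the
encodings into account is not enough to display "human" text.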
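
To make the effect visible, here is a minimal sketch that reproduces the corruption from the example. It assumes the mbstring extension is loaded; the variable names are only for illustration, and the string is written with byte escapes so that the encoding of the script file itself does not matter.

```php
<?php
// "São Paulo" as utf-8 bytes: "ã" is 0xC3 0xA3.
$utf8 = "S\xC3\xA3o Paulo";

// Misreading: treat every utf-8 byte as one latin-1 character.
// Re-encoded as utf-8 so it can be printed, this is "SÃ£o Paulo".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Proper conversion: decode as utf-8, encode as latin-1.
// "ã" becomes the single byte 0xE3.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";    // raw bytes of the fetched text
echo bin2hex($latin1), "\n";  // bytes a latin-1 page should send
```

The second conversion is the one a latin-1 page would need before echoing the fetched text.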

If you use curl, you can catch the Content-Type response header, which
names the encoding used, and use mbstring (mb_convert_encoding) to
convert the text to the encoding of your own page. Or, if it is always
the same page you read, you know the encoding beforehand.
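
A rough sketch of that approach follows. It assumes the curl and mbstring extensions are available and that the calling page is latin-1; the function names (charset_from_content_type, to_page_encoding, fetch_for_page) are made up for this example, and charset labels that mbstring does not recognise are not handled.

```php
<?php
// Hypothetical helper: extract the charset label from a Content-Type
// value such as "text/html; charset=utf-8". Falls back to $default
// when the header is missing or names no charset.
function charset_from_content_type(?string $contentType, string $default = 'UTF-8'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert fetched text to the calling page's encoding.
function to_page_encoding(string $body, string $from, string $to = 'ISO-8859-1'): string
{
    return mb_convert_encoding($body, $to, $from);
}

// Fetch a URL with curl and return its body in the page's encoding.
function fetch_for_page(string $url, string $pageEncoding = 'ISO-8859-1'): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // Content-Type of the response as reported by curl, if any.
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    $from = charset_from_content_type(is_string($contentType) ? $contentType : null);
    return to_page_encoding($body, $from, $pageEncoding);
}
```

On the calling page this would replace the readfile call with something like echo fetch_for_page("http://abc/def");. If the remote page declares its encoding only in a meta tag and not in the header, the default passed to charset_from_content_type is what gets used.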

Best regards.
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/