Fun with MySQL collation, HTML charset and PHP utf8_encode

on 07.04.2008 19:42:39 by Dee Ayy

How do I avoid having so much fun using utf8_encode throughout my document?

I was thinking of using output buffering and then making 1 call to
utf8_encode, but I think a better question is, how do I stop using
utf8_encode completely?

The docs say that utf8_encode "Encodes an ISO-8859-1 string to UTF-8".
So if I start with a UTF-8 string, why should I need to use
utf8_encode? Am I really starting with a UTF-8 string if I set MySQL
to utf8_unicode_ci for that field, set the content type with
header('Content-type: text/html; charset=utf-8'); and set the HTML
charset with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> ?

In MySQL 5.0.22, I had a field of Type text, Collation latin1_swedish_ci
(default settings, I believe), into which I pasted the character "é" from
the French Keyboard Viewer on a Mac Leopard machine, using phpMyAdmin
2.11.1.2. This is an e with an accent on top (in case it is not
rendered properly in your email client).
Hmm, pulling the phpMyAdmin version reveals:
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci

I retrieve the field using mysql_fetch_assoc and display it in an HTML
page rendered by PHP with and without
header('Content-type: text/html; charset=utf-8');
and
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The document was originally saved in Dreamweaver 8 as Unicode
Normalization Form: C (Canonical Decomposition, followed by Canonical
Composition) without "Include Unicode Signature (BOM)" -- great, more
encoding to worry about in my editor.

The rendered view I see in Firefox 2.0.0.12 is a question mark "?"
where the French character should have appeared. If I use
utf8_encode, the character appears as it should.

I had changed the MySQL Collation to utf8_general_ci and
utf8_unicode_ci and I still have to use utf8_encode to see the
character appear properly.

Luckily I'm on PHP 4.3.10, so I can't see what mb_check_encoding would
report -- if that would even help normally.

Don't you just love Monday fun?

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Fun with MySQL collation, HTML charset and PHP utf8_encode

on 07.04.2008 22:47:09 by Bruno Lustosa

On Mon, Apr 7, 2008 at 2:42 PM, Dee Ayy wrote:
> I was thinking of using output buffering and then making 1 call to
> utf8_encode, but I think a better question is, how do I stop using
> utf8_encode completely?

If all components are using utf-8, you should have no problems with
charsets at all. By all components, I mean:
- Script files in utf-8;
- Database in utf-8;
- Database connection using utf-8;
- Content-type header set to utf-8.
With all these, you're free of charset hell, and can enjoy the beauty
of utf-8 completely without problems.
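
For the MySQL side of that list, a minimal sketch might look like the
following. It assumes the legacy mysql_* functions used in the original
post; mysql_set_charset() only arrived in PHP 5.2.3, so on PHP 4 a plain
SET NAMES query is the usual route. The function name
open_utf8_connection and its parameters are hypothetical, not something
from this thread.

```php
<?php
// Tell the browser which encoding the page uses. This must run before
// any output is sent (it is a no-op on the command line).
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: opens a connection with the legacy mysql
// extension and switches that connection to UTF-8.
function open_utf8_connection($host, $user, $password, $database)
{
    $link = mysql_connect($host, $user, $password);
    if (!$link) {
        return false;
    }
    if (!mysql_select_db($database, $link)) {
        return false;
    }
    // The column charset and collation only govern storage and
    // comparison. SET NAMES controls the encoding used on the wire
    // between PHP and MySQL, which otherwise stays at the client
    // default (often latin1).
    if (!mysql_query("SET NAMES 'utf8'", $link)) {
        return false;
    }
    return $link;
}
```

Once the connection is switched this way, the utf8_encode() calls have
to go at the same time, otherwise text that is already utf-8 gets
encoded a second time.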

> The rendered view I see in Firefox 2.0.0.12 is a question mark "?"
> where the French character should have appeared. If I use
> utf8_encode, the character appears as it should.

A question mark means the bytes being sent are not valid utf-8. Check
where they come from. It might be the database or the way you are
connecting to it. I don't know much about MySQL; I use PostgreSQL. With
it, you just have to call pg_set_client_encoding() to make the
connection in utf-8 mode, and "create database with encoding='unicode'"
to set up a database using utf-8.
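
One way to check where it comes from is to look at the raw bytes, which
works on PHP 4 too. The sketch below uses hard-coded stand-ins for a
fetched field (the variable names are made up for illustration); in a
real page you would pass the value from mysql_fetch_assoc to bin2hex()
instead.

```php
<?php
// Stand-ins for the same "é" as it might come back from the database.
$from_latin1_connection = "\xE9";      // ISO-8859-1: one byte
$from_utf8_connection   = "\xC3\xA9";  // UTF-8: two bytes

// A lone 0xE9 byte is not valid UTF-8, so a browser that was told
// "charset=utf-8" shows a question mark instead of the letter.
echo bin2hex($from_latin1_connection), "\n";               // e9

// utf8_encode() maps each ISO-8859-1 byte to its UTF-8 sequence,
// which is why it repairs the output in this case.
echo bin2hex(utf8_encode($from_latin1_connection)), "\n";  // c3a9

// The same call on text that is already UTF-8 encodes it twice, and
// the result would display as "Ã©".
echo bin2hex(utf8_encode($from_utf8_connection)), "\n";    // c383c2a9
```

If the field dumps as e9, the data is reaching PHP as latin1 and the
connection charset is the thing to fix. If it dumps as c3a9, the data is
already utf-8 and utf8_encode should be dropped.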

> Luckily I'm on PHP 4.3.10, so I can't see what mb_check_encoding would
> report -- if that would even help normally.

You should upgrade to PHP 5. PHP 4 is way out of date, is not getting
updates anymore, and will not even get security bugfixes after August
8th. It's been almost 4 years since PHP 5 was released.

http://www.php.net/archive/2007.php

Check the PHP 4 end of life announcement.

--
Bruno Lustosa
ZCE - Zend Certified Engineer - PHP!
http://www.lustosa.net/
