Why UTF8 need 24bit in MySQL?
on 07.06.2010 17:57:44 by Ryan Chan
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
Since MySQL only supports the BMP, aren't 16 bits actually enough?
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: Why UTF8 need 24bit in MySQL?
on 07.06.2010 18:44:09 by Warren Young
On 6/7/2010 9:57 AM, Ryan Chan wrote:
> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>
> Since MySQL only supports the BMP, aren't 16 bits actually enough?
I imagine they were thinking they'd extend the support to full Unicode
in the future and didn't want you to have to dump and reload your
databases when that happened. The Unicode Consortium has stated that
Unicode will never require more than 21 bits per character[*], and 24
bits is the next multiple of 8 up from that.
[*] Why 21? Because that's the maximum number of bits you can express
in 4 bytes with UTF-8 encoding. If Unicode were allowed to use all 2^31
code points as originally envisioned, it would require up to 6 bytes per
character in UTF-8 encoding. This promise makes UTF-8 code easier to
write and easier to future-proof without bad performance penalties.
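
The bit counts in the footnote above can be checked with a little arithmetic. The following is only a sketch: the helper name `utf8_payload_bits` is made up for illustration, and the 5- and 6-byte forms it covers belong to the original UTF-8 design, not to the current 4-byte limit.

```python
def utf8_payload_bits(num_bytes: int) -> int:
    """Code point bits carried by a UTF-8 sequence of num_bytes bytes.

    Hypothetical helper for illustration; covers the original 1..6 byte forms.
    """
    if num_bytes == 1:
        return 7  # 0xxxxxxx: plain ASCII
    if not 2 <= num_bytes <= 6:
        raise ValueError("UTF-8 sequences are 1 to 6 bytes in the original design")
    # Lead byte: num_bytes one-bits, one zero-bit, then (7 - num_bytes) payload bits.
    # Every continuation byte has the form 10xxxxxx and carries 6 payload bits.
    return (7 - num_bytes) + 6 * (num_bytes - 1)


for n in range(1, 7):
    print(f"{n} byte(s): {utf8_payload_bits(n)} payload bits")
```

A 3-byte sequence carries 16 bits, which is exactly the BMP, and a 4-byte sequence carries 21 bits.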
Re: Why UTF8 need 24bit in MySQL?
on 07.06.2010 19:08:18 by Paul DuBois
On Jun 7, 2010, at 11:44 AM, Warren Young wrote:
> On 6/7/2010 9:57 AM, Ryan Chan wrote:
>> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>>
>> Since MySQL only supports the BMP, aren't 16 bits actually enough?
>
> I imagine they were thinking they'd extend the support to full Unicode
> in the future and didn't want you to have to dump and reload your
> databases when that happened. The Unicode Consortium has stated that
> Unicode will never require more than 21 bits per character[*], and 24
> bits is the next multiple of 8 up from that.
>
> [*] Why 21? Because that's the maximum number of bits you can express
> in 4 bytes with UTF-8 encoding. If Unicode were allowed to use all 2^31
> code points as originally envisioned, it would require up to 6 bytes per
> character in UTF-8 encoding. This promise makes UTF-8 code easier to
> write and easier to future-proof without bad performance penalties.
Supplementary Unicode characters (4-byte) are supported as of MySQL
5.5.3, through the utf8mb4 character set:
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html
--
Paul DuBois
Oracle Corporation / MySQL Documentation Team
Madison, Wisconsin, USA
www.mysql.com
Re: Why UTF8 need 24bit in MySQL?
on 08.06.2010 14:54:21 by Ryan Chan
Hi,
On Tue, Jun 8, 2010 at 12:44 AM, Warren Young wrote:
> The Unicode Consortium has stated that Unicode will
> never require more than 21 bits per character[*], and 24 bits is the next
> multiple of 8 up from that.
Maybe off topic, but just curious... If 3 bytes are enough for every
Unicode code point, then what is the use of 4-byte UTF-8?
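
One way to look at this question is to compare the number of significant bits in a code point with the number of bytes its UTF-8 encoding takes, since UTF-8 spends some bits of every byte on framing. The snippet below is a rough sketch using only Python's built-in `str.encode`; the helper name `utf8_length` and the sample code points are chosen here for illustration and do not come from the thread.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes Python's UTF-8 codec emits for a single character."""
    return len(ch.encode("utf-8"))


# Sample code points: ASCII, Latin-1, a BMP symbol, the last BMP code point,
# the first supplementary code point, and the last Unicode code point.
for cp in (0x41, 0xE9, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF):
    bits = cp.bit_length()
    size = utf8_length(chr(cp))
    print(f"U+{cp:04X}: {bits:2d} significant bits, {size} UTF-8 byte(s)")
```

Every BMP code point fits in at most 3 UTF-8 bytes, while anything above U+FFFF takes 4, even though the code point value itself never needs more than 21 bits.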