Length of character columns with multibyte encodings.

on 23.07.2003 09:53:14 by Bertrand Lanneau

Hello,

I have a problem with the way MySQL seems to manage the
length of character columns with multibyte encodings.
I'm using the 4.1.0 version of MySQL.

For example, if I define a column of type varchar(10) with
the utf8 character set, I can put 10 one-byte characters in the column,
but only 5 two-byte characters. MySQL seems to count the number
of bytes in the representation, not the number of characters.

The problem can be reproduced with the following statements:
create table table1(col1 varchar(10) character set latin1,
  col2 varchar(10) character set utf8,
  col3 varchar(10) character set ucs2);
insert into table1 values("abcdefghij", "abcdefghij", "abcdefghij");
insert into table1 values("éééééééééé", "éééééééééé", "éééééééééé");
select * from table1;
+------------+------------+-------+
| col1       | col2       | col3  |
+------------+------------+-------+
| abcdefghij | abcdefghij | abcde |
| éééééééééé | ééééé      | ééééé |
+------------+------------+-------+
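The output is consistent with each column holding at most 10 bytes and keeping only whole characters. "é" takes 1 byte in latin1 but 2 bytes in utf8, and ucs2 uses 2 bytes even for "a". The minimal Python sketch below emulates such a byte-counted limit and reproduces the table above. It is not MySQL code. The helper name truncate_to_bytes is hypothetical, and so is the mapping of MySQL charsets to Python codecs: ucs2 is approximated by "utf-16-be", which is valid only for BMP characters.

```python
# Hypothetical emulation of a byte-limited VARCHAR(10): keep as many whole
# characters as fit in max_bytes once encoded in the column's charset.
# Assumed mapping of MySQL charsets to Python codecs (ucs2 ~ utf-16-be, BMP only).
CHARSETS = {"latin1": "latin-1", "utf8": "utf-8", "ucs2": "utf-16-be"}


def truncate_to_bytes(value: str, charset: str, max_bytes: int = 10) -> str:
    """Return the longest prefix of value whose encoding fits in max_bytes."""
    codec = CHARSETS[charset]
    used = 0
    kept = []
    for ch in value:
        size = len(ch.encode(codec))
        if used + size > max_bytes:
            break  # the next character would not fit entirely
        used += size
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    rows = ["abcdefghij", "\u00e9" * 10]  # ten ASCII letters, ten e-acute
    for value in rows:
        cells = [truncate_to_bytes(value, cs) for cs in ("latin1", "utf8", "ucs2")]
        print(" | ".join(cells))
```

Run as a script, it prints the two rows of the table: ten characters in every latin1 cell, and five characters wherever a character needs two bytes.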


I searched the MySQL manual and the web for information about this,
but found nothing. I also found nothing in the MySQL bug reports.
Does anyone have information about this?
This clearly seems like a bug to me, but is it considered a bug by
MySQL, and will it be fixed soon? Or will we have to live with this
for a while?

Thank you in advance for a response,

Bertrand Lanneau.

Re: Length of character columns with multibyte encodings.

on 23.07.2003 10:31:32 by Sergei Golubchik

Hi!

On Jul 23, Bertrand Lanneau wrote:
>
> Hello,
>
> I have a problem with the way MySQL seems to manage the
> length of character columns with multibyte encodings.
> I'm using the 4.1.0 version of MySQL.
>
> For example, if I define a column of type varchar(10) with
> the utf8 character set, I can put 10 one-byte characters in the column,
> but only 5 two-byte characters. MySQL seems to count the number
> of bytes in the representation, not the number of characters.
>
> I searched the MySQL manual and the web for information about this,
> but found nothing. I also found nothing in the MySQL bug reports.
> Does anyone have information about this?
> This clearly seems like a bug to me, but is it considered a bug by
> MySQL, and will it be fixed soon? Or will we have to live with this
> for a while?

It's not really a bug, that is, not a programming error.
It's a feature that is not implemented yet.

For years MySQL has used the number of bytes as the limit in varchar(10).
Full Unicode support is being added, but this assumption is spread
throughout the code, and it is not easy to change.
Note that this only explains why it is not done yet; I'm not saying it
won't be done. Of course, this WILL be done, probably in 4.1.1 or
4.1.2.
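Counting characters instead of bytes means comparing the character count of a value against N, while the server reserves enough bytes for the widest possible characters. The Python sketch below illustrates that difference under stated assumptions. The helper names are hypothetical. The only charset facts it relies on are the maximum bytes per character: 1 for latin1, 3 for MySQL's utf8, and 2 for ucs2. It does not describe how MySQL implements the change.

```python
# Hypothetical character-counted VARCHAR(N) check, as opposed to a byte limit.
# Maximum bytes per character for each MySQL charset (MySQL's utf8 stores at
# most 3 bytes per character; ucs2 is a fixed 2 bytes).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "ucs2": 2}


def fits_char_limit(value: str, n_chars: int) -> bool:
    """A character-based limit: count code points, not encoded bytes."""
    return len(value) <= n_chars


def worst_case_bytes(charset: str, n_chars: int) -> int:
    """Storage to reserve so that any n_chars characters always fit."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    value = "\u00e9" * 10
    print(fits_char_limit(value, 10))    # True: ten characters
    print(len(value.encode("utf-8")))    # 20 bytes, over a 10-byte limit
    print(worst_case_bytes("utf8", 10))  # 30
```

With a character-counted limit, all ten "é" characters fit into varchar(10) in every charset. The price is that the column may need up to three times as many bytes in utf8.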

Regards,
Sergei

--
Sergei Golubchik
MySQL AB, Senior Software Developer
Osnabrueck, Germany
www.mysql.com

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs