Can't convert sjis & ujis half-width katakana correctly
on 25.03.2004 07:33:58 from Yoshinori.Matsunobu
Hello.
I am currently evaluating the Japanese character set conversion
functionality of MySQL 5.0.0 alpha.
When I inserted Japanese half-width katakana into tables and then
selected them back, the following problems occurred (probably bugs).
1. Sjis half-width katakana characters were converted to the wrong
characters.
2. Ujis half-width katakana characters were converted to the wrong
characters.
Environment:
Version: MySQL 5.0.0 alpha (source code install)
Platform: RedHat Linux 9
Install options: ./configure --prefix=/usr/local/mysql50 \
--with-unix-socket-path=/usr/local/mysql50/tmp/mysql.sock \
--with-extra-charsets=all --with-charset=sjis
How-To-Repeat:
1. sjis problem
create database db1 character set utf8;
CREATE TABLE table1(
col1 varchar(100)
) default character set utf8;
set names sjis;
source /path/to/sjis-half-width-katakana-format-file;
select * from table1;
2. ujis problem
truncate table table1;
set names ujis;
source /path/to/ujis-half-width-katakana-format-file;
select * from table1;
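The two source files are not attached to the report. As a hypothetical stand-in, the C sketch below builds one INSERT statement for table1 whose string holds the half-width katakana A, I, U (U+FF71-U+FF73). It assumes the standard encodings: in sjis each of these characters is a single byte (0xB1-0xB3), while in ujis (EUC-JP) each is the two-byte sequence 0x8E followed by that same byte. The function name make_test_insert is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MySQL): writes an INSERT statement for
   table1 whose string literal holds HALFWIDTH KATAKANA LETTER A, I, U
   (JIS X 0201 bytes 0xB1, 0xB2, 0xB3).
     ujis == 0 -> Shift_JIS: one byte per character
     ujis != 0 -> EUC-JP:    0x8E (SS2) followed by the JIS X 0201 byte
   Returns the number of bytes written (no NUL terminator),
   or 0 if cap is too small. */
size_t make_test_insert(unsigned char *out, size_t cap, int ujis)
{
    static const char head[] = "INSERT INTO table1 VALUES ('";
    static const char tail[] = "');\n";
    static const unsigned char kana[] = { 0xB1, 0xB2, 0xB3 };
    size_t need = (sizeof head - 1) + sizeof kana * (ujis ? 2 : 1)
                  + (sizeof tail - 1);
    size_t n = 0;

    if (cap < need)
        return 0;
    memcpy(out, head, sizeof head - 1);
    n += sizeof head - 1;
    for (size_t i = 0; i < sizeof kana; i++) {
        if (ujis)
            out[n++] = 0x8E;   /* SS2: the next byte is JIS X 0201 katakana */
        out[n++] = kana[i];
    }
    memcpy(out + n, tail, sizeof tail - 1);
    n += sizeof tail - 1;
    assert(n == need);
    return n;
}
```

To use it, one could fwrite() the returned bytes to a file and source that file after the matching set names statement; on a correct server, the select should give back exactly those katakana bytes.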
Fix:
I read the MySQL source code and found several bugs.
I made the following fixes, tested them, and confirmed that all of the
above problems were solved (in my environment).
1. ctype-sjis.c
1-1. function my_mb_wc_sjis
current:
.....
if (hi<0x80)
{
  pwc[0]=hi;
  return 1;
}
if (s+2>e)
  return MY_CS_TOOFEW(0);
.....
I fixed:
.....
if (hi<0x80)
{
  pwc[0]=hi;
  return 1;
}
//I added: 0xA1-0xDF is a complete one-byte character (half-width katakana)
if((hi>=0xA1)&&(hi<=0xDF))
{
  pwc[0]=func_sjis_uni_onechar(hi);
  return 1;
}
if (s+2>e)
  return MY_CS_TOOFEW(0);
.....
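The fix works because in Shift_JIS a byte in 0xA1-0xDF is a complete one-byte character (JIS X 0201 half-width katakana), not the lead byte of a two-byte sequence; the unpatched code went on to the two-byte path for it. Placing the check before the s+2>e test also means a katakana byte at the very end of the input is no longer reported as too short. The sketch below is a standalone illustration under that assumption, not MySQL's code: the name sjis_decode_single and its return values are hypothetical, and it computes the mapping arithmetically (byte b maps to U+FF61 + (b - 0xA1)) rather than calling func_sjis_uni_onechar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone sketch (not MySQL's my_mb_wc_sjis).
   Decodes the first character of s[0..e) when it is ASCII or
   JIS X 0201 half-width katakana encoded in Shift_JIS.
   Returns 1 (bytes consumed), 0 if the byte starts a two-byte
   sequence (out of scope here), or -1 if no input remains. */
int sjis_decode_single(const unsigned char *s, const unsigned char *e,
                       unsigned long *pwc)
{
    unsigned int hi;

    assert(pwc != NULL);
    if (s >= e)
        return -1;
    hi = s[0];
    if (hi < 0x80) {                   /* ASCII range, as in the patch */
        *pwc = hi;
        return 1;
    }
    if (hi >= 0xA1 && hi <= 0xDF) {    /* half-width katakana: one byte */
        *pwc = 0xFF61 + (hi - 0xA1);   /* lands in U+FF61..U+FF9F */
        return 1;
    }
    return 0;                          /* lead byte of a two-byte character */
}
```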
1-2. function my_wc_mb_sjis
current:
.....
if (!(code=func_uni_sjis_onechar(wc)))
  return MY_CS_ILUNI;
if (s+2>e)
  return MY_CS_TOOSMALL;

s[0]=code>>8;
s[1]=code&0xFF;
return 2;
.....
I fixed:
.....
if (!(code=func_uni_sjis_onechar(wc)))
  return MY_CS_ILUNI;
//I added: half-width katakana is written as the single byte 0xA1-0xDF
if((code>=0xA1)&&(code<=0xDF))
{
  if (s>=e)              //room for one byte is still required
    return MY_CS_TOOSMALL;
  s[0]=code;
  return 1;
}
if (s+2>e)
  return MY_CS_TOOSMALL;

s[0]=code>>8;
s[1]=code&0xFF;
return 2;
.....
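The reverse mapping is equally simple: U+FF61-U+FF9F goes back to the single bytes 0xA1-0xDF, and only one byte of output space is needed. A hypothetical standalone sketch of that direction (sjis_encode_single is an invented name, not MySQL's API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone sketch (not MySQL's my_wc_mb_sjis).
   Encodes one code point that is ASCII or half-width katakana
   into Shift_JIS. Returns 1 (bytes written), -1 if the output
   buffer is full, or 0 if wc is outside the handled ranges. */
int sjis_encode_single(unsigned long wc, unsigned char *s, unsigned char *e)
{
    assert(s != NULL && e != NULL);
    if (wc < 0x80 || (wc >= 0xFF61 && wc <= 0xFF9F)) {
        if (s >= e)                    /* need room for one byte */
            return -1;
        /* ASCII passes through; U+FF61..U+FF9F maps back to 0xA1..0xDF */
        s[0] = (unsigned char) (wc < 0x80 ? wc : 0xA1 + (wc - 0xFF61));
        return 1;
    }
    return 0;
}
```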
2. ctype-ujis.c
function my_wc_mb_euc_jp
current:
.....
ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
if (ret==1)
{
  if (s+1>e)
    return MY_CS_TOOSMALL;

  s[0]=0x8E;
  s[1]=buf[0];
  return 1;
}
.....
I fixed:
.....
ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
if (ret==1)
{
  if (s+2>e)        //two bytes (0x8E and the katakana byte) are written below
    return MY_CS_TOOSMALL;

  s[0]=0x8E;
  s[1]=buf[0];
  return 2; //Because ujis half-width katakana is 2 bytes.
}
.....
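In EUC-JP, half-width katakana is written as the single-shift byte SS2 (0x8E) followed by the JIS X 0201 byte, so the converter must both check for two bytes of space and report two bytes written; returning 1, as the unpatched code does, presumably makes the caller advance by only one byte and overwrite the second. A hypothetical standalone sketch (eucjp_encode_hw_kana is an invented name, not MySQL's API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone sketch (not MySQL's my_wc_mb_euc_jp).
   Encodes a half-width katakana code point (U+FF61..U+FF9F) into
   EUC-JP. Returns 2 (bytes written), -1 if fewer than two bytes of
   output remain, or 0 if wc is not half-width katakana. */
int eucjp_encode_hw_kana(unsigned long wc, unsigned char *s, unsigned char *e)
{
    assert(s != NULL && e != NULL);
    if (wc < 0xFF61 || wc > 0xFF9F)
        return 0;
    if (e - s < 2)                     /* SS2 + katakana byte = 2 bytes */
        return -1;
    s[0] = 0x8E;                       /* SS2 selects JIS X 0201 katakana */
    s[1] = (unsigned char) (0xA1 + (wc - 0xFF61));
    return 2;                          /* report both bytes to the caller */
}
```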
Best Regards,
Matsunobu Yoshinori.
--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs