Can't convert sjis&ujis half-width katakana correctly

on 25.03.2004 07:33:58 by Yoshinori.Matsunobu

Hello.

I'm currently evaluating Japanese character set conversion in
MySQL 5.0.0 alpha.

When I inserted Japanese half-width katakana into tables and selected
them back, the following problems occurred (probably bugs).

1. Sjis half-width katakana characters were converted to the wrong
characters.
2. Ujis half-width katakana characters were converted to the wrong
characters.

Environment:
Version: MySQL 5.0.0 alpha (source code install)
Platform: RedHat Linux 9
Install option: ./configure --prefix=/usr/local/mysql50
--with-unix-socket-path=/usr/local/mysql50/tmp/mysql.sock
--with-extra-charsets=all --with-charset=sjis

How-To-Repeat:
1.sjis problem
create database db1 character set utf8;
CREATE TABLE table1(
col1 varchar(100)
) default character set utf8;

set names sjis;
source /path/to/sjis-half-width-katakana-format-file;
select * from table1;

2.ujis problem
truncate table table1;
set names ujis;
source /path/to/ujis-half-width-katakana-format-file;
select * from table1;


Fix:
I read the MySQL source code and found several bugs.
I fixed the following, tested the changes, and confirmed that all of
the above problems were solved (in my environment).

1.ctype-sjis.c
1-1.function my_mb_wc_sjis

current:
.....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }
  if (s+2>e)
    return MY_CS_TOOFEW(0);
.....

I fixed:
.....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }

  // I added
  if ((hi>=0xA1)&&(hi<=0xDF))
  {
    pwc[0]=func_sjis_uni_onechar(hi);
    return 1;
  }

  if (s+2>e)
    return MY_CS_TOOFEW(0);
.....
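
The added branch relies on the fact that Shift_JIS encodes half-width katakana as single bytes 0xA1..0xDF, which correspond to U+FF61..U+FF9F. A minimal, self-contained sketch of that decoding step follows. The names sjis_halfkana_to_unicode and decode_sjis_single are hypothetical stand-ins invented for illustration, not MySQL functions, and the real func_sjis_uni_onechar uses lookup tables rather than an offset.

```c
#include <stddef.h>

/* Hypothetical stand-in for the table lookup: JIS X 0201 katakana bytes
   0xA1..0xDF map linearly onto U+FF61..U+FF9F (offset 0xFEC0). */
static unsigned long sjis_halfkana_to_unicode(unsigned char b)
{
  return (unsigned long) b + 0xFEC0UL;
}

/* Decodes one single-byte Shift_JIS character from [s, e).
   Returns the number of bytes consumed (1), or 0 when the input is empty
   or starts a two-byte character, which this sketch does not cover. */
static int decode_sjis_single(const unsigned char *s, const unsigned char *e,
                              unsigned long *pwc)
{
  if (s >= e)
    return 0;                      /* nothing to read */

  if (s[0] < 0x80)                 /* ASCII range, as in the original code */
  {
    *pwc = s[0];
    return 1;
  }

  if (s[0] >= 0xA1 && s[0] <= 0xDF) /* half-width katakana: one byte */
  {
    *pwc = sjis_halfkana_to_unicode(s[0]);
    return 1;
  }

  return 0;                        /* lead byte of a two-byte character */
}
```

Without the middle branch, a byte such as 0xB1 falls through to the two-byte path and is paired with whatever byte follows it, which is consistent with the wrong characters described in the report.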


1-2.function my_wc_mb_sjis

current:
.....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;
  if (s+2>e)
    return MY_CS_TOOSMALL;

  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
.....

I fixed:
.....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;

  // I added
  if ((code>=0xA1)&&(code<=0xDF))
  {
    s[0]=code;
    return 1;
  }

  if (s+2>e)
    return MY_CS_TOOSMALL;

  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
.....
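
The reverse direction can be sketched the same way: a Unicode half-width katakana code point must be written as one Shift_JIS byte, not as a two-byte pair with a zero high byte. The function encode_sjis_halfkana below is a hypothetical illustration, not part of MySQL, and it replaces the table lookup with the fixed offset 0xFEC0. It also checks for output space before writing, a check the surrounding code may already perform in the elided part.

```c
#include <stddef.h>

/* Encodes one half-width katakana code point (U+FF61..U+FF9F) into [s, e)
   as a single Shift_JIS byte (0xA1..0xDF).
   Returns 1 on success, 0 if wc is outside the range handled by this
   sketch, and -1 if the output buffer has no room. */
static int encode_sjis_halfkana(unsigned long wc,
                                unsigned char *s, unsigned char *e)
{
  if (wc < 0xFF61UL || wc > 0xFF9FUL)
    return 0;                      /* not half-width katakana */

  if (s >= e)
    return -1;                     /* no room for even one byte */

  s[0] = (unsigned char) (wc - 0xFEC0UL);
  return 1;                        /* exactly one byte was written */
}
```

For example, U+FF71 becomes the single byte 0xB1. The unpatched code would have emitted 0x00 0xB1 for the same input.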



2.ctype-ujis.c
function my_wc_mb_euc_jp

current:
.....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+1>e)
      return MY_CS_TOOSMALL;

    s[0]=0x8E;
    s[1]=buf[0];
    return 1;
  }
.....

I fixed:
.....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+1>e)
      return MY_CS_TOOSMALL;

    s[0]=0x8E;
    s[1]=buf[0];
    return 2; // because ujis half-width katakana is 2 bytes
  }
.....
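
To see why the return value matters, here is a small sketch of a conversion loop that advances its output pointer by whatever the encoder reports. Both function names are hypothetical and the loop is only an assumption about how a caller might use the return value, not MySQL's actual conversion code. In EUC-JP (ujis), half-width katakana is the byte 0x8E followed by a byte in 0xA1..0xDF.

```c
#include <stddef.h>

/* Encodes one half-width katakana code point (U+FF61..U+FF9F) as the
   two-byte EUC-JP sequence 0x8E, 0xA1..0xDF.
   Returns 2 on success, 0 if wc is outside that range, -1 if fewer than
   two bytes of output space remain. */
static int encode_eucjp_halfkana(unsigned long wc,
                                 unsigned char *s, unsigned char *e)
{
  if (wc < 0xFF61UL || wc > 0xFF9FUL)
    return 0;

  if (e - s < 2)
    return -1;                     /* both bytes must fit */

  s[0] = 0x8E;
  s[1] = (unsigned char) (wc - 0xFEC0UL);
  return 2;                        /* report both bytes to the caller */
}

/* Converts n code points, advancing the output by each reported length.
   Stops at the first character that cannot be encoded or does not fit.
   Returns the number of bytes written. */
static size_t convert_halfkana_to_eucjp(const unsigned long *in, size_t n,
                                        unsigned char *out, size_t cap)
{
  unsigned char *s = out;
  unsigned char *e = out + cap;
  size_t i;

  for (i = 0; i < n; i++)
  {
    int len = encode_eucjp_halfkana(in[i], s, e);
    if (len <= 0)
      break;
    s += len;
  }
  return (size_t) (s - out);
}
```

If the encoder reported 1 while writing 2 bytes, a loop like this would advance by one byte only, and the next character's 0x8E would overwrite the previous character's second byte.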


Best Regards,

Matsunobu Yoshinori.

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Can't convert sjis&ujis half-width katakana correctly

on 25.03.2004 11:26:33 by Alexander Barkov

Hello, Yoshinori!

You are right, this is a bug, and your fix looks fine,
at least for SJIS (I haven't studied UJIS yet).

I posted it to our bug system and am now working on it.

http://bugs.mysql.com/bug.php?id=3290

Thank you very much for such a detailed report.



--
For technical support contracts, visit https://order.mysql.com/
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Alexander Barkov
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Full-Time Developer
/_/  /_/\_, /___/\___\_\___/   Izhevsk, Russia
       <___/   www.mysql.com   +7-912-856-80-21

