SJIS handling bug in Connector/J 3.1.0 alpha

on 04.05.2003 03:28:00 by Naoto Sato


Hi,

I just subscribed to this bug list, so please forgive me if I am doing
something wrong.

I found a problem in the Connector/J 3.1.0 JDBC driver when dealing with
Shift JIS encoding. The method in question is
com.mysql.jdbc.StringUtils.escapeSJISByteStream().

Here is my evaluation of the problem. The method tries to escape 0x5c
when it appears as the high (trailing) byte of a double-byte character.
The problem is that it sometimes escapes 0x5c by mistake. For example,
take the byte sequence "0x82, 0xf0, 0x5c". This consists of two
characters: the first and second bytes form a Japanese double-byte
character, and the third byte is a plain backslash. After recognizing
"0x82, 0xf0" as a double-byte character, the current logic does not skip
past its trailing byte, so on the next iteration it examines 0xf0 as if
it were a lead byte. It therefore treats "0xf0, 0x5c" as one double-byte
character with 0x5c as the trailing byte and inserts a spurious 0x5c.
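The behavior can be reproduced with a simplified, hypothetical sketch of the escaping loop. It is not the actual Connector/J code: the class and method names are illustrative, and it assumes, as the old code comment says, that each byte is written at the top of the loop. The `fixed` flag switches between the old logic and the logic in the attached patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified model of StringUtils.escapeSJISByteStream();
// names are illustrative only.
public class SjisEscapeSketch {

    // Shift JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC.
    static boolean isLeadByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    // Escapes a 0x5C that is the trailing byte of a double-byte character.
    // fixed == false: the trailing byte is not consumed, so it is re-examined
    // as a possible lead byte on the next iteration (the reported bug).
    static byte[] escape(byte[] in, boolean fixed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            int lo = in[i] & 0xFF;          // unsigned value of the current byte
            out.write(lo);                  // every byte is written at the top
            if (isLeadByte(lo) && i < in.length - 1) {
                int hi = in[i + 1] & 0xFF;  // trailing byte
                if (fixed) {
                    out.write(hi);          // emit the trailing byte now...
                    i++;                    // ...and skip it in the loop
                }
                if (hi == 0x5C) {
                    out.write(hi);          // extra 0x5C escapes the trailing byte
                }
            }
        }
        return out.toByteArray();
    }

    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x82 0xF0 is one double-byte character; 0x5C is a plain backslash.
        byte[] input = {(byte) 0x82, (byte) 0xF0, (byte) 0x5C};
        System.out.println("old:   " + hex(escape(input, false))); // 82 F0 5C 5C
        System.out.println("fixed: " + hex(escape(input, true)));  // 82 F0 5C
    }
}
```

In both modes a genuine trailing 0x5C (for example the sequence 0x95 0x5C) is still doubled; only the fixed mode leaves the standalone backslash in the example untouched.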

I have attached a proposed fix for this problem and would appreciate it
if someone could take a look at it.

Also, I corrected the low-byte (lead-byte) range for double-byte
characters in the source code: the range starts at 0x81, not 0x80.
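As a minimal illustration, the corrected range can be expressed as a lead-byte check. The class and helper names are hypothetical; masking the Java `byte` with 0xFF has the same effect as the `+= 256` signedness adjustment visible in the patch.

```java
// Hypothetical helper; not part of Connector/J.
public class SjisLeadByte {

    // Corrected Shift JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC.
    // Masking with 0xFF maps a signed Java byte to 0-255, the same effect
    // as adding 256 to a negative value.
    static boolean isLeadByte(byte b) {
        int v = b & 0xFF;
        return (v >= 0x81 && v <= 0x9F) || (v >= 0xE0 && v <= 0xFC);
    }

    public static void main(String[] args) {
        System.out.println(isLeadByte((byte) 0x80)); // false: not a lead byte
        System.out.println(isLeadByte((byte) 0x81)); // true: first lead byte
        System.out.println(isLeadByte((byte) 0xFC)); // true: last lead byte

        // Count all lead-byte values: 31 (0x81-0x9F) + 29 (0xE0-0xFC).
        int count = 0;
        for (int v = 0; v < 256; v++) {
            if (isLeadByte((byte) v)) count++;
        }
        System.out.println(count); // 60
    }
}
```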

Thanks,

--
Naoto Sato

Attachment: StringUtils.java.patch

*** mysql-connector-java-3.1.0-alpha/com/mysql/jdbc/StringUtils.java.orig Sat May 3 17:23:54 2003
--- mysql-connector-java-3.1.0-alpha/com/mysql/jdbc/StringUtils.java Sat May 3 18:00:16 2003
***************
*** 163,175 ****

//
// The codepage characters in question exist between
! // 0x80-0x9F and 0xE0-0xFC...
//
// See:
//
// http://www.microsoft.com/GLOBALDEV/Reference/dbcs/932.htm
//
! if (((loByte >= 0x80) && (loByte <= 0x9F))
|| ((loByte >= 0xE0) && (loByte <= 0xFC))) {
if (bufIndex < (stringLen - 1)) {
int hiByte = (int) origBytes[bufIndex + 1];
--- 163,175 ----

//
// The codepage characters in question exist between
! // 0x81-0x9F and 0xE0-0xFC...
//
// See:
//
// http://www.microsoft.com/GLOBALDEV/Reference/dbcs/932.htm
//
! if (((loByte >= 0x81) && (loByte <= 0x9F))
|| ((loByte >= 0xE0) && (loByte <= 0xFC))) {
if (bufIndex < (stringLen - 1)) {
int hiByte = (int) origBytes[bufIndex + 1];
***************
*** 178,189 ****
hiByte += 256; // adjust for signedness/wrap-around
}

! //
! // Here's the problematic critter...
! //
! // we write it out, and it gets written
! // again at the top of the loop, thus
! // escaping it.
if (hiByte == 0x5C) {
bytesOut.write(hiByte);
}
--- 178,189 ----
hiByte += 256; // adjust for signedness/wrap-around
}

! // write the high byte here, and increment the index
! // for the high byte
! bytesOut.write(hiByte);
! bufIndex++;
!
! // escape 0x5c if necessary
if (hiByte == 0x5C) {
bytesOut.write(hiByte);
}

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs