bug in unicode UCA collations with LIKE comparisons and INDEX

bug in unicode UCA collations with LIKE comparisons and INDEX

am 30.08.2004 23:22:30 von Jeremy March

I've found two bugs which appear to be related when performing a LIKE
comparison using the unicode uca collations. For the bug to occur there
must be more than one row in the table being selected and there must be
an index on the column. With utf8_unicode_ci columns this causes mysqld
to hang and with ucs2_unicode_ci columns mysql doesn't find the rows
when it should. Here are 3 tests. I performed them on MySQL 4.1.4 with
the most recent bitkeeper pulls. I used Red Hat linux 9.0 on a pentium
III processor.

1) with utf8_general_ci which works as expected.

2) with ucs2_unicode_ci which doesn't find any rows when it should find
2.

3) with utf8_unicode_ci which causes mysqld to hang.

When these tests are done without an index they work fine.

best regards,
Jeremy March

#test 1
DROP TABLE IF EXISTS t;

CREATE TABLE t (c varchar(255) NOT NULL COLLATE utf8_general_ci, INDEX
(c));

INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B403B11F770308 USING
utf8));

SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
COLLATE utf8_general_ci;
#ok with just one row

INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B4 USING utf8));

SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
COLLATE utf8_general_ci;
#works fine with utf8_general_ci

DROP TABLE t;

#test2
DROP TABLE IF EXISTS t;

CREATE TABLE t (c varchar(255) NOT NULL COLLATE ucs2_unicode_ci, INDEX
(c));

INSERT INTO t VALUES (_ucs2 0x039C03C903B403B11F770308);

SELECT * FROM t WHERE c LIKE _ucs2 0x039C0025 COLLATE ucs2_unicode_ci;
#ok with just one row

INSERT INTO t VALUES (_ucs2 0x039C03C903B4);

SELECT * FROM t WHERE c LIKE _ucs2 0x039C0025 COLLATE ucs2_unicode_ci;
#doesn't find anything?

DROP TABLE t;

#test 3
DROP TABLE IF EXISTS t;

CREATE TABLE t (c varchar(255) NOT NULL COLLATE utf8_unicode_ci, INDEX
(c));

INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B403B11F770308 USING
utf8));

SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
COLLATE utf8_unicode_ci;
#ok with just one row

INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B4 USING utf8));

SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
COLLATE utf8_unicode_ci;
#mysqld hangs here

DROP TABLE t;


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: bug in unicode UCA collations with LIKE comparisons and INDEX

am 31.08.2004 20:21:40 von Alexander Barkov

Hi Jeremy,

Looks like a bug. Thank you very much for reporting.
I added your report into our bug system, and will
fix it asap:

http://bugs.mysql.com/bug.php?id=5324

Regards!

Jeremy March wrote:

> I've found two bugs which appear to be related when performing a LIKE
> comparison using the unicode uca collations. For the bug to occur there
> must be more than one row in the table being selected and there must be
> an index on the column. With utf8_unicode_ci columns this causes mysqld
> to hang and with ucs2_unicode_ci columns mysql doesn't find the rows
> when it should. Here are 3 tests. I performed them on MySQL 4.1.4 with
> the most recent bitkeeper pulls. I used Red Hat linux 9.0 on a pentium
> III processor.
>
> 1) with utf8_general_ci which works as expected.
>
> 2) with ucs2_unicode_ci which doesn't find any rows when it should find
> 2.
>
> 3) with utf8_unicode_ci which causes mysqld to hang.
>
> When these tests are done without an index they work fine.
>
> best regards,
> Jeremy March
>
> #test 1
> DROP TABLE IF EXISTS t;
>
> CREATE TABLE t (c varchar(255) NOT NULL COLLATE utf8_general_ci, INDEX
> (c));
>
> INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B403B11F770308 USING
> utf8));
>
> SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
> COLLATE utf8_general_ci;
> #ok with just one row
>
> INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B4 USING utf8));
>
> SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
> COLLATE utf8_general_ci;
> #works fine with utf8_general_ci
>
> DROP TABLE t;
>
> #test2
> DROP TABLE IF EXISTS t;
>
> CREATE TABLE t (c varchar(255) NOT NULL COLLATE ucs2_unicode_ci, INDEX
> (c));
>
> INSERT INTO t VALUES (_ucs2 0x039C03C903B403B11F770308);
>
> SELECT * FROM t WHERE c LIKE _ucs2 0x039C0025 COLLATE ucs2_unicode_ci;
> #ok with just one row
>
> INSERT INTO t VALUES (_ucs2 0x039C03C903B4);
>
> SELECT * FROM t WHERE c LIKE _ucs2 0x039C0025 COLLATE ucs2_unicode_ci;
> #doesn't find anything?
>
> DROP TABLE t;
>
> #test 3
> DROP TABLE IF EXISTS t;
>
> CREATE TABLE t (c varchar(255) NOT NULL COLLATE utf8_unicode_ci, INDEX
> (c));
>
> INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B403B11F770308 USING
> utf8));
>
> SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
> COLLATE utf8_unicode_ci;
> #ok with just one row
>
> INSERT INTO t VALUES (CONVERT(_ucs2 0x039C03C903B4 USING utf8));
>
> SELECT * FROM t WHERE c LIKE CONVERT(_ucs2 0x039C0025 USING utf8)
> COLLATE utf8_unicode_ci;
> #mysqld hangs here
>
> DROP TABLE t;


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org