ucs2 LIKE comparison

ucs2 LIKE comparison

am 29.01.2004 04:07:26 von Jeremy March

Hi!
Problem: ucs2 LIKE comparison fails in the following cases:
1. wildcard at beginning of string and strings are equal length (not
counting wildcard)
2. there are more than one wildcard and at least one is in the middle of
the string and they are not adjacent

I tested this with mysql 4.1.1 on red hat linux 9.0 with Athlon
processor.
See the test cases below.

(I'd also like to add a slightly related feature request. The utf8 LIKE
comparison is only case/accent insensitive when comparing the Latin
characters. From the source code it looks like the ucs2 LIKE comparison
"my_wildcmp_ucs2()" could easily be adapted to work for utf8 by changing
"my_ucs2_uni()" to "my_utf8_uni()". I tried but couldn't get it to
compile, though, so I'll leave that to you:) Thanks! )

Test cases for ucs2 LIKE comparison:

Reference:
0x0041 = A
0x0025 = %
0x005F = _

Works fine with one wildcard in middle of string:

mysql> select _ucs2 0x00410041 like _ucs2 0x004100250041;
+--------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x004100250041 |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)

Works fine with two adjacent wildcards in middle of string:

mysql> select _ucs2 0x00410041 like _ucs2 0x0041002500250041;
+------------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x0041002500250041 |
+------------------------------------------------+
| 1 |
+------------------------------------------------+
1 row in set (0.00 sec)

Works fine with one wildcard at end of string:

mysql> select _ucs2 0x00410041 like _ucs2 0x004100410025;
+--------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x004100410025 |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)

Works fine with two wildcards at each end of string:

mysql> select _ucs2 0x00410041 like _ucs2 0x0025004100410025;
+------------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x0025004100410025 |
+------------------------------------------------+
| 1 |
+------------------------------------------------+
1 row in set (0.01 sec)

Works fine with one wildcard at beginning of string and second string is
shorter:

mysql> select _ucs2 0x00410041 like _ucs2 0x00250041;
+----------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x00250041 |
+----------------------------------------+
| 1 |
+----------------------------------------+
1 row in set (0.01 sec)

Fails when one wildcard at beginning of string, but strings are equal
length (not counting wildcard):

mysql> select _ucs2 0x00410041 like _ucs2 0x002500410041;
+--------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x002500410041 |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
1 row in set (0.00 sec)

It works with Latin1:

mysql> select _latin1 'AA' like _latin1 '%AA';
+---------------------------------+
| _latin1 'AA' like _latin1 '%AA' |
+---------------------------------+
| 1 |
+---------------------------------+
1 row in set (0.00 sec)

But it works if wildcard at end:

mysql> select _ucs2 0x00410041 like _ucs2 0x004100410025;
+--------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x004100410025 |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)

Fails with one wildcard in middle and second wildcard at beginning or
end of string:

mysql> select _ucs2 0x00410041 like _ucs2 0x0041002500410025;
+------------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x0041002500410025 |
+------------------------------------------------+
| 0 |
+------------------------------------------------+
1 row in set (0.00 sec)

mysql> select _latin1 'AA' like _latin1 'A%A%';
+----------------------------------+
| _latin1 'AA' like _latin1 'A%A%' |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)

But this works with '_' in place of characters:

mysql> select _ucs2 0x00410041 like _ucs2 0x005f0025005f0025;
+------------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x005f0025005f0025 |
+------------------------------------------------+
| 1 |
+------------------------------------------------+
1 row in set (0.00 sec)

mysql> select _ucs2 0x00410041 like _ucs2 0x0025004100250041;
+------------------------------------------------+
| _ucs2 0x00410041 like _ucs2 0x0025004100250041 |
+------------------------------------------------+
| 0 |
+------------------------------------------------+
1 row in set (0.00 sec)

mysql> select _latin1 'AA' like _latin1 '%A%A';
+----------------------------------+
| _latin1 'AA' like _latin1 '%A%A' |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)

Fails with two wildcards not adjacent in middle of string:

mysql> select _ucs2 0x004100410041 like _ucs2 0x00410025004100250041;
+--------------------------------------------------------+
| _ucs2 0x004100410041 like _ucs2 0x00410025004100250041 |
+--------------------------------------------------------+
| 0 |
+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select _latin1 'AAA' like _latin1 'A%A%A';
+------------------------------------+
| _latin1 'AAA' like _latin1 'A%A%A' |
+------------------------------------+
| 1 |
+------------------------------------+
1 row in set (0.00 sec)

mysql> select _ucs2 0x00410041004100410041 like _ucs2
0x00410025004100250041;
+----------------------------------------------------------- -----+
| _ucs2 0x00410041004100410041 like _ucs2 0x00410025004100250041 |
+----------------------------------------------------------- -----+
| 0 |
+----------------------------------------------------------- -----+
1 row in set (0.00 sec)

Best regards,
Jeremy March



--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: ucs2 LIKE comparison

am 03.02.2004 11:58:25 von Sergei Golubchik

Hi!

On Jan 28, Jeremy March wrote:
> Hi!
> Problem: ucs2 LIKE comparison fails in the following cases:
> 1. wildcard at beginning of string and strings are equal length (not
> counting wildcard)
> 2. there are more than one wildcard and at least one is in the middle of
> the string and they are not adjacent
>
> I tested this with mysql 4.1.1 on red hat linux 9.0 with Athlon
> processor.
> See the test cases below.

Entered as http://bugs.mysql.com/?id=2619.
Thank you!

Regards,
Sergei

--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer
/_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany
<___/ www.mysql.com

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org