Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 14:06:13 von Philip Stoev

Hello,

For the following table,

CREATE TABLE table1 (
primary1 char(50) NOT NULL default '',
primary2 char(100) NOT NULL default '',
md5 char(32) NOT NULL default '',
date_col datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (primary1, primary2),
KEY date_col (date_col)
) TYPE=MyISAM;

with 3000 records in it, all with date_col = NOW(), freshly inserted and
OPTIMIZE'd, the 200 ivocations of the following query:

SELECT primary1, primary2 FROM table1 WHERE date_col > DATE_SUB(NOW(),
interval 20 minute) ORDER BY md5 limit 1

execute on a 2 Ghz machine for 3.4 seconds. The EXPLAIN is as follows:

+-------------+-------+---------------+--------------+------ ---+------+-----
-+----------------------------+
| table | type | possible_keys | key | key_len | ref | rows
| Extra |
+-------------+-------+---------------+--------------+------ ---+------+-----
-+----------------------------+
| tape_images | range | date_col | date_col | 8 | NULL | 3000
| where used; Using filesort |
+-------------+-------+---------------+--------------+------ ---+------+-----
-+----------------------------+

200 runs of:

SELECT primary1, primary2 FROM table1 IGNORE INDEX (date_col) WHERE date_col
> DATE_SUB(NOW(), interval 20 minute) ORDER BY md5 limit 1

Execute in 1 second or so. The EXPLAIN is:

+-------------+------+---------------+------+---------+----- -+------+-------
---------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------------+------+---------------+------+---------+----- -+------+-------
---------------------+
| tape_images | ALL | date_col | NULL | NULL | NULL | 3000 | where
used; Using filesort |
+-------------+------+---------------+------+---------+----- -+------+-------
---------------------+

If ORDER BY is something else, like RAND() or RAND(md5), the query with
IGNORE INDEX still executes faster. If there is no ORDER BY or no LIMIT 1, t
hen both queries run equally well. I think that there is a LIMIT sort
optimization that fails to kick in if index is used in the WHERE.

The server is 3.23.56-log. Any ideas?

Philip Stoev


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 14:45:02 von Alexander Keremidarski

Hello,

Philip Stoev wrote:
> Hello,
>
> For the following table,



> SELECT primary1, primary2 FROM table1 WHERE date_col > DATE_SUB(NOW(),
> interval 20 minute) ORDER BY md5 limit 1
>
> execute on a 2 Ghz machine for 3.4 seconds. The EXPLAIN is as follows:


> SELECT primary1, primary2 FROM table1 IGNORE INDEX (date_col) WHERE date_col
>
>>DATE_SUB(NOW(), interval 20 minute) ORDER BY md5 limit 1



This list is dedicated to Bugs which can be repeated. Please prepare repeatable
test case which can expose bug on anothet machine. Optimzer problems are often
related to specific data content of you rtables. In case like yours most probably
it is just about wrong index statistics. Refer to
http://www.mysql.com/doc/en/ANALYZE_TABLE.html
http://www.mysql.com/doc/en/OPTIMIZE_TABLE.html

>
> Philip Stoev

Ïîçäðàâè

--
For technical support contracts, visit https://order.mysql.com/?ref=msal
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Alexander Keremidarski
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Sofia, Bulgaria
<___/ www.mysql.com




--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 15:41:26 von Philip Stoev

------=_NextPart_000_001D_01C308EE.0494DFD0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Demonstration code attached.

Philip Stoev


----- Original Message -----
From: "Philip Stoev"
To:
Sent: Tuesday, April 22, 2003 3:06 PM
Subject: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT
1


------=_NextPart_000_001D_01C308EE.0494DFD0
Content-Type: text/plain; charset=us-ascii

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org
------=_NextPart_000_001D_01C308EE.0494DFD0--

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 16:00:29 von Sinisa Milivojevic

Philip Stoev writes:
> Demonstration code attached.
>
> Philip Stoev
>

No attachment arrived.

If you have a fully repeatable test case, upload it to:

ftp://support.mysql.com:/pub/mysql/secret

and let us know a filename.

What we need are the tables and a query that does not work .

--

Regards,

--
For technical support contracts, go to https://order.mysql.com/?ref=msmi
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Sinisa Milivojevic
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer
/_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus
<___/ www.mysql.com


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 16:04:49 von Philip Stoev

Done. Filename is limit.pl.

Philip Stoev

----- Original Message -----
From: "Sinisa Milivojevic"
To:
Cc: ;
Sent: Tuesday, April 22, 2003 5:00 PM
Subject: Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY +
LIMIT 1


> Philip Stoev writes:
> > Demonstration code attached.
> >
> > Philip Stoev
> >
>
> No attachment arrived.
>
> If you have a fully repeatable test case, upload it to:
>
> ftp://support.mysql.com:/pub/mysql/secret
>
> and let us know a filename.
>
> What we need are the tables and a query that does not work .
>
> --
>
> Regards,
>
> --
> For technical support contracts, go to https://order.mysql.com/?ref=msmi
> __ ___ ___ ____ __
> / |/ /_ __/ __/ __ \/ / Mr. Sinisa Milivojevic
> / /|_/ / // /\ \/ /_/ / /__ MySQL AB, FullTime Developer
> /_/ /_/\_, /___/\___\_\___/ Larnaca, Cyprus
> <___/ www.mysql.com
>
>
>


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 20:43:52 von Alexander Keremidarski

Hi,

Philip Stoev wrote:
> Demonstration code attached.
>
> Philip Stoev

Here is result of executing perl script:


/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 17 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 17 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 3 seconds.


I see no difference with and without IGNORE INDEX

some tweaks.

1. SQL_NO_CACHE added to avoid possible Query_cache hits


/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 17 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 17 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 3 seconds.


100% same result.

2. Iterations increased 10 time
....
foreach my $row (1..30000) {
....

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 165 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 84 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 165 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 33 seconds.

This time IGNORE INDEX is faster.

Why? Because your script creates rows in a way where All of them match WHERE
clause. In this case full-scan is always faster because it avoids Index porcessing
overhead.

3. Modify script so it WHERE matches some of rows again with 30000 iterations

.... WHERE date_col > DATE_SUB(NOW(), interval 2 second)

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 2 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 2 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 11 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 10 seconds.

Bingo! Optimizer works.

As I said in my first email optimizer's job highly depend on real data and value
distribution of Indexed columns.

Sometimes it will choose optimal solution, sometimes it will fail.


4. Create more random distribution:

Instead of NOW() insert NOW() - INTERVAL 1000*RAND() MINUTE)
30000 rows

And the result is:

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 4 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 2 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 13 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 11 seconds.


Obviously because now your WHERE clause has better selectivity!

Conclusion is: Optimizer can choose best plan in most of cases, but it is always
possible to prepare test case which will catch exceptions when it fails.

No Nugs in this case.

Best regards

--
For technical support contracts, visit https://order.mysql.com/?ref=msal
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Alexander Keremidarski
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Sofia, Bulgaria
<___/ www.mysql.com




--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 22.04.2003 21:19:52 von Philip Stoev

> I see no difference with and without IGNORE INDEX

I respectfully disagree. As per your results:

> /* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 17 seconds.
> /* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
> IGNORE INDEX (date_col) + /* LIMIT 1 */ = 17 seconds.
> IGNORE INDEX (date_col) + LIMIT 1 = 3 seconds.

The fourth query (index disabled) is three times faster than the second one
(same with index enabled). Both use the same non-indexed ORDER BY and both
operate on the entire table.

> /* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 17 seconds.
> /* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
> IGNORE INDEX (date_col) + /* LIMIT 1 */ = 17 seconds.
> IGNORE INDEX (date_col) + LIMIT 1 = 3 seconds.

> /* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 165 seconds.
> /* IGNORE INDEX (date_col) */ + LIMIT 1 = 84 seconds.
> IGNORE INDEX (date_col) + /* LIMIT 1 */ = 165 seconds.
> IGNORE INDEX (date_col) + LIMIT 1 = 33 seconds.

Same situation. Fourth query in the batch is faster than the second one and
I do not understand why this should be the case.

> This time IGNORE INDEX is faster.
>
> Why? Because your script creates rows in a way where All of them match
WHERE
> clause. In this case full-scan is always faster because it avoids Index
porcessing
> overhead.

No, the extra time it takes to run the query with index is not accumulated
because an index is used or a full table scan is used to obtain a result set
that is identical to the entire table. It appears to me that something
happens within the LIMIT-ed sort, after the result set has been determined.
if you remove the ORDER BY thing, you get:

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 21 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 0 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 20 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 2 seconds.

Now, both the second and the fourth queries are fast because it is easy to
get the first row and return it, with our without an index, when there is no
ORDER BY.

If you remove the WHERE thing and leave the ORDER BY, you get

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 33 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 35 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 9 seconds.

Queries with IGNORE INDEX run identically to the ones without IGNORE INDEX.

If you remove both the WHERE and the ORDER BY, you get:

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 22 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 2 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 24 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 2 seconds.

Again, queries with IGNORE INDEX run identically to the ones without IGNORE
INDEX.

Now, to go back to our original scenario, both WHERE and ORDER BY enabled:

/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 35 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 18 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 34 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 9 seconds.

Now, the second query is slower than the fourth one, while the first and the
third queries take identical times to run. So, it is not an issue of
preparing the result set -- with our without an index this happens equally
well. It is also not an issue of simply ordering the result set and
retrieving it entirely -- this also happens with identical speed. It is also
not an issue of LIMIT-ing an unordered result set, this is also OK. The
issue is when you have the WHERE, the ORDER BY and the LIMIT 1. If an index
is used, the query repeated 500 times runs for 18 seconds. If the index is
disabled, the same 500 queries run for 9 seconds. If you remove the LIMIT 1,
this difference disappears, even though the indexed column is not the one
that is used for ORDER BY.

While I agree that

> Optimizer can choose best plan in most of cases, but it is always
> possible to prepare test case which will catch exceptions when it fails.

, actually, that is what USE INDEX and IGNORE INDEX are there for, I fail to
see why using an IGNORE INDEX will have such a great impact if and only if
an ORDER BY is being used with a LIMIT 1. I really need to get the faster
performance while still using the index, however ordering and limiting with
an index take longer than ordering and limiting without an index, even if
using the index may speed up things at the WHERE clause.

Philip Stoev




--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Sub-optimal behavoir with keyed WHERE + non-keyed ORDER BY + LIMIT 1

am 23.04.2003 00:19:12 von Alexander Keremidarski

Hi,

Philip Stoev wrote:
>>I see no difference with and without IGNORE INDEX
>
>
> I respectfully disagree. As per your results:
>
>
>>/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 17 seconds.
>>/* IGNORE INDEX (date_col) */ + LIMIT 1 = 9 seconds.
>> IGNORE INDEX (date_col) + /* LIMIT 1 */ = 17 seconds.
>> IGNORE INDEX (date_col) + LIMIT 1 = 3 seconds.
>
>
> The fourth query (index disabled) is three times faster than the second one
> (same with index enabled). Both use the same non-indexed ORDER BY and both
> operate on the entire table.

Yes it is faster. And as I said it is unoptimal decision by optimizer.

Note that in your test case you have WHERE clause which:

1. does range-scan
2. Matches all rows


In second query for where clause which matches all rows MySQL decides to use index
which is always slower in this circumstances and which is proved by your test.

Looking at Index statistics MySQL uses heuristics to evaluate cardinality as "big
enough" so Index scan "probably makes sence".

If you change values to be more random and give chance WHERE clause to match less
rows MySQL decision to use Index become better with decreasing number of matched rows.

Or just reverse > sign to < and you will see completely different picture - IGNORE
INDEX queries will become slower.

Here it is with 30 000 rows:
/* IGNORE INDEX (date_col) */ + /* LIMIT 1 */ = 3 seconds.
/* IGNORE INDEX (date_col) */ + LIMIT 1 = 3 seconds.
IGNORE INDEX (date_col) + /* LIMIT 1 */ = 14 seconds.
IGNORE INDEX (date_col) + LIMIT 1 = 14 seconds.

What all of the above proves is that MySQL optimizer may choose bad plan under
some circumstances depending on actual table content. This is well known fact.


However most important thing in this case is Subject: of your initial email.

Yes this is Unoptimal Optimizer Behaviour. Not Bug!

Best regards.

--
For technical support contracts, visit https://order.mysql.com/?ref=msal
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Alexander Keremidarski
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Sofia, Bulgaria
<___/ www.mysql.com




--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org