a lesson in query writing and (maybe) a bug report

am 27.08.2011 00:00:37 von Dave Dyer

This is a cautionary tale - adding indexes is not always helpful or harmless. I recently added an index to the "players" table to optimize a common query, and as a consequence this other query flipped from innocuous to something that takes infinite time.


select p1.player_name,g.score1,g.time1,g.color1,p2.player_name,g.score2,g.time2,g.color2,g.gamename,gmtdate
from players as p1, players as p2, gamerecord g
where (p1.uid = g.player1 and p2.uid = g.player2)
and (p1.is_robot is null and p2.is_robot is null)
order by gmtdate desc limit 50


"players" is a table with 20,000 records, "gamerecord" is a table with 3.5 million records, with "gmtdate" available as an index. According to "explain", the query used gmtdate as an index, an excellent choice. When I added an index to "is_robot" on the players table, the query flipped to using it, and switched from a brisk report to an infinite slog.

I realize that selecting an index is an imprecise science, and that I can specify what index to use as a hint, but this particular flip was disastrous. It seems odd that the query optimizer would choose to scan a 3.5 million entry table instead of a 20,000 entry table.
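For the record, the hint I have in mind looks roughly like this (a sketch only; the index name "gmtdate" and the explicit-JOIN rewrite are assumptions, the real index name may differ):

```sql
-- Sketch: pin the access path back to the gmtdate index with USE INDEX
SELECT p1.player_name, g.score1, g.time1, g.color1,
       p2.player_name, g.score2, g.time2, g.color2, g.gamename, gmtdate
FROM gamerecord g USE INDEX (gmtdate)
JOIN players AS p1 ON p1.uid = g.player1
JOIN players AS p2 ON p2.uid = g.player2
WHERE p1.is_robot IS NULL AND p2.is_robot IS NULL
ORDER BY gmtdate DESC LIMIT 50;
```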


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: a lesson in query writing and (maybe) a bug report

am 27.08.2011 00:38:47 von Dan Nelson

In the last episode (Aug 26), Dave Dyer said:
> This is a cautionary tale - adding indexes is not always helpful or
> harmless. I recently added an index to the "players" table to optimize a
> common query, and as a consequence this other query flipped from innocuous
> to something that takes infinite time.
>
> select p1.player_name,g.score1,g.time1,g.color1,p2.player_name,g.score2,g.time2,g.color2,g.gamename,gmtdate
> from players as p1, players as p2, gamerecord g
> where (p1.uid = g.player1 and p2.uid = g.player2)
> and (p1.is_robot is null and p2.is_robot is null)
> order by gmtdate desc limit 50
>
> "players" is a table with 20,000 records, "gamerecord" is a table with 3.5
> million records, with "gmtdate" available as an index. According to
> "explain", the query used gmtdate as an index, an excellent choice. When
> I added an index to "is_robot" on the players table, the query flipped to
> using it, and switched from a brisk report to an infinite slog.
>
> I realize that selecting an index is an imprecise science, and that I can
> specify what index to use as a hint, but this particular flip was
> particularly disastrous. It seems odd that the query optimizer would
> choose to scan a 3.5 million entry table instead of a 20,000 entry table.

Can you post the EXPLAIN EXTENDED output for your before and after queries?
Also, have you recently run ANALYZE TABLE on the tables?

--
Dan Nelson
dnelson@allantgroup.com


Re: a lesson in query writing and (maybe) a bug report

am 27.08.2011 01:28:28 von Dave Dyer

>
>
>Can you post the EXPLAIN EXTENDED output for your before and after queries?
>also, have you recently run an ANALYZE TABLE on the tables?

// before

mysql> explain extended select p1.player_name,g.score1,g.time1,g.color1,p2.player_name,g.score2,g.time2,g.color2,g.gamename,gmtdate
-> from players as p1, players as p2, gamerecord g
-> where (p1.uid = g.player1 and p2.uid = g.player2)
-> and (p1.is_robot is null and p2.is_robot is null)
-> order by gmtdate desc limit 50;
+----+-------------+-------+--------+-----------------+---------+---------+----------------+-------+----------+----------------------------------------------+
| id | select_type | table | type   | possible_keys   | key     | key_len | ref            | rows  | filtered | Extra                                        |
+----+-------------+-------+--------+-----------------+---------+---------+----------------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | p2    | ALL    | uid,uidindex    | NULL    | NULL    | NULL           | 28653 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | g     | ref    | player2,player1 | player2 | 4       | tan2.p2.uid    |    41 |   100.00 |                                              |
|  1 | SIMPLE      | p1    | eq_ref | uid,uidindex    | uid     | 4       | tan2.g.player1 |     1 |   100.00 | Using where                                  |
+----+-------------+-------+--------+-----------------+---------+---------+----------------+-------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)


// after


mysql> use tantrix_tantrix;
Database changed
mysql> explain extended select p1.player_name,g.score1,g.time1,g.color1,p2.player_name,g.score2,g.time2,g.color2,g.gamename,gmtdate
-> from players as p1, players as p2, gamerecord g
-> where (p1.uid = g.player1 and p2.uid = g.player2)
-> and (p1.is_robot is null and p2.is_robot is null)
-> order by gmtdate desc limit 50;
+----+-------------+-------+--------+--------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type   | possible_keys            | key         | key_len | ref                       | rows  | filtered | Extra                                        |
+----+-------------+-------+--------+--------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | p1    | ref    | uid,uidindex,robot_index | robot_index | 2       | const                     | 15292 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | g     | ref    | player2,player1          | player1     | 4       | tantrix_tantrix.p1.uid    |    67 |   100.00 |                                              |
|  1 | SIMPLE      | p2    | eq_ref | uid,uidindex,robot_index | uid         | 4       | tantrix_tantrix.g.player2 |     1 |   100.00 | Using where                                  |
+----+-------------+-------+--------+--------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.11 sec)

mysql>



Re: a lesson in query writing and (maybe) a bug report

am 27.08.2011 10:04:16 von Jigal van Hemert

Hi,

On 27-8-2011 1:28, Dave Dyer wrote:
>>
>> Can you post the EXPLAIN EXTENDED output for your before and after queries?
>> also, have you recently run an ANALYZE TABLE on the tables?

What was the result of ANALYZE TABLE?

What is the engine of the tables involved?

> // before
>
Used keys:

p2.NULL, g.player2, p1.uid

In your original post you wrote: "According to "explain", the query
used gmtdate as an index, an excellent choice."
The explain output you posted later indicates that this is not the case
(anymore).
gmtdate isn't listed as a possible index, so what has changed?

> It seems odd that the query optimizer would choose to scan a 3.5
> million entry table instead of a 20,000 entry table.

Let's see.
Before: 28653 * 41 * 1 rows to consider = 1.1 M rows
After: 15292 * 67 * 1 rows to consider = 1.0 M rows

Conclusion: the query optimizer didn't choose to scan an entire table.
In fact it found a plan that looks at about 10% fewer rows.

For the final order by and limit it would be great to have a (partial)
index to work with.
It's true that planning indexes isn't always an exact science. Generally
speaking the goal is to construct both the query and the indexes in a
way that you rule out as many rows as possible early on in the process.

Your query shows that you want the latest fifty matches between two
players who both have is_robot NULL.
Try to create indexes which cover as many as possible of the columns
involved in the join, WHERE and ORDER BY parts, and look at the
cardinality of those indexes. That determines how many records can be
discarded in each join and keeps the number of records MySQL has to
scan as low as possible.

Another way is a bit tricky, but can speed up queries a lot: you want
the 50 most recent records, so analyse the data and see if you can
predict how big your result set will be in a period of time. Let's
assume that there are always between 10 and 50 of such records per day.
If you want the top 50 it would be safe to limit the search to the last
10 to 20 days.
Of course this requires an index which includes gmtdate, but it can make
the result set before the limit a lot smaller.
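As a sketch (the 20-day window, the explicit-JOIN form and the column list are assumptions for illustration):

```sql
-- Sketch: restrict the scan to a recent window before ordering/limiting,
-- so the ORDER BY ... LIMIT works on far fewer rows
SELECT p1.player_name, p2.player_name, g.gamename, g.gmtdate
FROM gamerecord g
JOIN players AS p1 ON p1.uid = g.player1
JOIN players AS p2 ON p2.uid = g.player2
WHERE p1.is_robot IS NULL AND p2.is_robot IS NULL
  AND g.gmtdate > NOW() - INTERVAL 20 DAY
ORDER BY g.gmtdate DESC
LIMIT 50;
```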

--
Kind regards / met vriendelijke groet,

Jigal van Hemert.


Re: a lesson in query writing and (maybe) a bug report

am 27.08.2011 23:21:00 von Jigal van Hemert

Hi,

On 27-8-2011 22:52, Dave Dyer wrote:
> The "innocuous change" was to add an index for "is_robot" which is true
> for 6 out of 20,000 records and null for the rest.

It's not useful to add an index for that. I also wonder why the value is
null (meaning: unknown, not certain) for almost all records.

If you want to use such a column in an index it's best to use an index
based on multiple columns. That makes it more useful in queries.
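For example, something like this (a sketch; the index name is made up, and the column choice assumes the query from this thread):

```sql
-- Sketch: a composite index so the is_robot test and the join column
-- can both be satisfied from the same index entry
ALTER TABLE players ADD INDEX robot_uid (is_robot, uid);
```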

> My complaint/question/observation is not how to optimize the query
> that went awry, but to be alarmed that a venerable and perfectly
> serviceable query, written years ago and ignored ever since, suddenly
> brought the system crashing down after making a seemingly innocuous
> change intended to make a marginal improvement on an unrelated query.

Adding an index will most likely trigger some maintenance actions to
make sure the table is healthy before the index is added. Afterwards
the query optimizer has an extra index to take into account.

> I had previously believed that tinkering with the schema by adding
> indexes was a safe activity.

A database shouldn't be left alone for a long period. It needs
monitoring and maintenance. Changes in the schema and even changes in
the data can lead to changes in the behaviour.
You can make suggestions for the indexes to be used and you can even
force the use of an index if the query optimizer makes the wrong
decisions in a case.
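Roughly like this (the index name is assumed; note that USE INDEX is only a suggestion, while FORCE INDEX makes a table scan so expensive that the named index is used whenever it can be):

```sql
-- Sketch: FORCE INDEX overrules the optimizer's choice
SELECT g.gamename, g.gmtdate
FROM gamerecord g FORCE INDEX (gmtdate)
ORDER BY g.gmtdate DESC
LIMIT 50;
```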

--
Kind regards / met vriendelijke groet,

Jigal van Hemert.


Re: a lesson in query writing and (maybe) a bug report

am 27.08.2011 23:33:50 von Arthur Fuller


I agree 110%. It is completely pointless to index a column with that many
NULLs. In fact I would go further: what is the point of a NULLable
column? I try to design my tables such that every column is NOT NULL. In
practice this is not always realistic, but I try to adhere to the
principle whenever I can. For example, it's possible to add a new Hire
while not yet having determined which department s/he will work in, and
hence which manager s/he will report to, but typically I deal with such
scenarios by creating an "Undetermined" value in the corresponding
lookup table.
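A sketch of that pattern (all table and column names here are made up for illustration):

```sql
-- Sketch: a sentinel lookup row instead of a NULLable foreign key
CREATE TABLE department (
  dept_id   INT NOT NULL PRIMARY KEY,
  dept_name VARCHAR(50) NOT NULL
);
INSERT INTO department VALUES (0, 'Undetermined');

CREATE TABLE hire (
  hire_id   INT NOT NULL PRIMARY KEY,
  hire_name VARCHAR(50) NOT NULL,
  -- NOT NULL with a default pointing at the sentinel row
  dept_id   INT NOT NULL DEFAULT 0,
  FOREIGN KEY (dept_id) REFERENCES department (dept_id)
);
```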

Arthur


Re: a lesson in query writing and (maybe) a bug report

am 28.08.2011 00:26:42 von Michael Dykman


It is a general rule that indexes on columns with low cardinality are not
worth it, often making queries more expensive than they would be without
said index. Binary columns all suffer from this.

- michael dykman


On Sat, Aug 27, 2011 at 4:52 PM, Dave Dyer wrote:

>
> The "innocuous change" was to add an index for "is_robot" which is true
> for 6 out of 20,000 records and null for the rest.
>
> My complaint/question/observation is not how to optimize the query
> that went awry, but to be alarmed that a venerable and perfectly
> serviceable query, written years ago and ignored ever since, suddenly
> brought the system crashing down after making a seemingly innocuous
> change intended to make a marginal improvement on an unrelated query.
>
> I had previously believed that tinkering the schema by adding indexes was a
> safe activity. It's as though I add a shortcut to my regular commute
> and caused a massive traffic jam when the entire traffic flow tried to
> follow me.
>
> (Both tables are "ok" according to analyze table)


--
- michael dykman
- mdykman@gmail.com

May the Source be with you.


Re: a lesson in query writing and (maybe) a bug report

am 28.08.2011 04:08:42 von Shawn Wilson

On Sat, Aug 27, 2011 at 17:33, Arthur Fuller wrote:
> I agree 110%. It is completely pointless to index a column with that amount
> of NULLs. In practical fact I would go further: what is the point of a
> NULLable column? I try to design my tables such that every column is NOT
> NULL. In practice this is not realistic, but I try to adhere to this
> principle whenever I can. For example, it's possible to add a new Hire while
> not yet having determined which department s/he will work in, and hence
> which manager s/he will report to, but typically I deal with such scenarios
> by creating an "Undetermined" value in the corresponding lookup table.
>

maybe this should be a new thread, but...

what's the difference between null and a defined null-equivalent value
(i.e., "Undetermined" in your example means the same to you as null)? it
would seem that the sentinel would take up more space and take longer to
process, since null is a built-in (non-)value.

how does null affect an index? i had always assumed that, since there
is nothing there, the record wouldn't go into the index and hence
wouldn't be processed when utilizing the index.


Re: a lesson in query writing and (maybe) a bug report

am 28.08.2011 09:54:41 von Jigal van Hemert

Hi,

On 28-8-2011 4:08, shawn wilson wrote:
> On Sat, Aug 27, 2011 at 17:33, Arthur Fuller wrote:
>> I agree 110%. It is completely pointless to index a column with that amount
>> of NULLs. In practical fact I would go further: what is the point of a
>> NULLable column?

A NULL 'value' is special in most operations. It indicates that the
value is undefined, unknown, uncertain. In this regard it's actually not
a value.
SELECT 'Uncertain' = TRUE;
Result: 0
SELECT 'Uncertain' = FALSE;
Result: 1
SELECT 'Uncertain' = NULL;
Result: NULL

SELECT NULL = TRUE;
Result: NULL
SELECT NULL = FALSE;
Result: NULL
SELECT NULL = NULL;
Result: NULL

(Unfortunately someone decided to add the <=> operator:
SELECT NULL <=> NULL;
Result: 1
Even stranger is that it is documented as "NULL safe" !?!?)

The advantage to me of having NULL 'values' is that NULL is usually
handled as a truly undefined value. (When you compare an undefined value
with, for example, 2, the result cannot be TRUE or FALSE. The undefined
value might be equal to 2, or might not be. The result can only be
undefined.)
To deal with NULL results inside expressions COALESCE() is a very useful
function.
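For instance (the second statement is hypothetical and reuses the players table from this thread):

```sql
-- COALESCE returns its first non-NULL argument
SELECT COALESCE(NULL, NULL, 2);  -- returns 2

-- Hypothetical: treat an undefined is_robot as "not a robot"
SELECT COALESCE(p.is_robot, 0) FROM players AS p;
```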

> how does null effect an index? i had always assumed that, since there
> is nothing there, that record wouldn't go into the index hence
> wouldn't be processed when utilizing the index.

MySQL can use NULL in indexes when executing a query. If there are not
enough different values in a column (low cardinality) it might be faster
to do a full table scan instead of first reading the index and then
having to go through the table anyway.

--
Kind regards / met vriendelijke groet,

Jigal van Hemert.
