mysql query optimization

mysql query optimization

am 15.12.2005 15:21:35 von anzenews

Hi all!

I stumbled across this query that I just can't optimize - even if I create
all possible indexes it doesn't use them. Could someone please clue me in?
I have no idea why MySQL doesn't use the indexes. :(

The query in question is:

mysql> explain select count(a0.count) from srchIds ids, srchWords a0 where
a0.word LIKE '%abc%' and a0.id=ids.id group by ids.category;
+----+-------------+-------+------+---------------+------+-- -------+----------------+------+---------------------------- -----+
| id | select_type | table | type | possible_keys | key | key_len | ref
| rows | Extra |
+----+-------------+-------+------+---------------+------+-- -------+----------------+------+---------------------------- -----+
| 1 | SIMPLE | ids | ALL | PRIMARY | NULL | NULL | NULL
| 330 | Using temporary; Using filesort |
| 1 | SIMPLE | a0 | ref | id | id | 5 |
monitor.ids.id | 144 | Using where |
+----+-------------+-------+------+---------------+------+-- -------+----------------+------+---------------------------- -----+
2 rows in set (0.00 sec)

What is interesting is that it doesn't consider the key on field 'category':

mysql> show keys from srchIds;
+---------+------------+-------------+--------------+------- ------+-----------+-------------+----------+--------+------+ ------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-------------+--------------+------- ------+-----------+-------------+----------+--------+------+ ------------+---------+
| srchIds | 0 | PRIMARY | 1 | id | A
| 330 | NULL | NULL | | BTREE | |
| srchIds | 1 | category | 1 | category | A
| 7 | NULL | NULL | YES | BTREE | |
+---------+------------+-------------+--------------+------- ------+-----------+-------------+----------+--------+------+ ------------+---------+
7 rows in set (0.00 sec)

Any help would be appreciated - I'm lost here.

Thank you!

Anze

Re: mysql query optimization

am 15.12.2005 16:26:59 von Stefan Rybacki

Anze wrote:
> Hi all!
>
> I stumbled across this query that I just can't optimize - even if I create
> all possible indexes it doesn't use them. Could someone please clue me in?
> I have no idea why MySQL doesn't use the indexes. :(
>
> The query in question is:
>
> mysql> explain select count(a0.count) from srchIds ids, srchWords a0 where
> a0.word LIKE '%abc%' and a0.id=ids.id group by ids.category;

The problem is your like clause, because you use leading % mysql isn't able to
use an index for it. To speed this up you could try to use the fulltext
functions and the fulltext index of mysql but that isn't sure to help you.

>
> Thank you!

Regards
Stefan

>
> Anze

Re: mysql query optimization

am 15.12.2005 17:17:25 von Hilarion

> I stumbled across this query that I just can't optimize - even if I create
> all possible indexes it doesn't use them. Could someone please clue me in?
> I have no idea why MySQL doesn't use the indexes. :(
>
> The query in question is:
>
> mysql> explain select count(a0.count) from srchIds ids, srchWords a0 where
> a0.word LIKE '%abc%' and a0.id=ids.id group by ids.category;

> [...]

I'd rephrase the query like this (changed it to use INNER JOIN and also
included "ids.category" in the result):

SELECT
ids.category,
COUNT( a0.count ) AS srchWords_count
FROM
srchIds AS ids INNER JOIN
srchWords AS a0 ON a0.id = ids.id
WHERE
a0.word LIKE '%abc%'
GROUP BY
ids.category


Why do you use "COUNT( a0.count )"? If there's no NULL values in "a0.count"
column, then it's results are same as "COUNT( * )", which may be faster.
If there are NULL values, then you could add "a0.count IS NOT NULL" to
WHERE clause and use "COUNT( * )" - it may still be a bit faster.
Or maybe you wanted to use "SUM( a0.count )" or "COUNT( DISTINCT a0.count )"?


Only indexes that I see useful for this query are on "a0.id", "ids.id"
and "ids.category" (see below).


> What is interesting is that it doesn't consider the key on field 'category'

It's probably because it still has to view all the table (or rather table
join product). It does not need to view it category by category - it can go
record by record and add the viewed values of "a0.count" to the result for
the corresponding category (which may allready be in the result).


Hilarion

Re: mysql query optimization

am 15.12.2005 18:58:34 von anzenews

> Why do you use "COUNT( a0.count )"? If there's no NULL values in
> "a0.count" column, then it's results are same as "COUNT( * )", which may
> be faster. If there are NULL values, then you could add "a0.count IS NOT
> NULL" to WHERE clause and use "COUNT( * )" - it may still be a bit faster.
> Or maybe you wanted to use "SUM( a0.count )" or "COUNT( DISTINCT a0.count
> )"?

Sorry, a typo... :)
I simplified the original query so it would be easier to optimize. This is
the original query:

select a0.count
'count',ids.category,ids.textId,ids.created,ids.descr,ids.de scr_title from
srchIds ids, srchWords a0 where 1 and a0.word LIKE '%abc%' and a0.id=ids.id
group by ids.category, ids.textId order by ids.created desc, 'count' desc,
ids.textId desc

Would LEFT JOIN speed up the queries? I have a feeling that it shouldn't
matter...

>> What is interesting is that it doesn't consider the key on field
>> 'category'
>
> It's probably because it still has to view all the table (or rather table
> join product). It does not need to view it category by category - it can
> go record by record and add the viewed values of "a0.count" to the result
> for the corresponding category (which may allready be in the result).

Actually, it should only take the records from srchIds that have a
corresponding entry in srchWords, where word is the one we are searching
for... So it should use category index.

I think I have it all figured out. I was a combination of "LIKE
'%abc%" (which can't be indexed) and... a small table. MySQL desided it was
more efficient not to use indexes because the table was so small. When I
ran the query on a bigger table it used the indexes. Weird. :(

Thank you for your help!

Anze

Re: mysql query optimization

am 15.12.2005 19:02:30 von anzenews

> The problem is your like clause, because you use leading % mysql isn't
> able to use an index for it. To speed this up you could try to use the
> fulltext functions and the fulltext index of mysql but that isn't sure to
> help you.

Right to the point. :)

Hmmm, that might help - FULLTEXT search on a search table... ;)
I'll try this just to see if it is any faster than "LIKE '%abc%'".

Thank you!

Anze

Re: mysql query optimization

am 15.12.2005 21:17:00 von Hilarion

>> Why do you use "COUNT( a0.count )"? If there's no NULL values in
>> "a0.count" column, then it's results are same as "COUNT( * )", which may
>> be faster. If there are NULL values, then you could add "a0.count IS NOT
>> NULL" to WHERE clause and use "COUNT( * )" - it may still be a bit faster.
>> Or maybe you wanted to use "SUM( a0.count )" or "COUNT( DISTINCT a0.count
>> )"?
>
> Sorry, a typo... :)
> I simplified the original query so it would be easier to optimize. This is
> the original query:
>
> select a0.count
> 'count',ids.category,ids.textId,ids.created,ids.descr,ids.de scr_title from
> srchIds ids, srchWords a0 where 1 and a0.word LIKE '%abc%' and a0.id=ids.id
> group by ids.category, ids.textId order by ids.created desc, 'count' desc,
> ids.textId desc

You should use "`" instead of "'" for "count" column. "'" is for string
values, and "`" is for identifiers. Also you do not really need to specify
"count" as alias for "a0.count" because it's allready named "count" so
you are not changing anything here.

You also do not perform any aggregation, so GROUP BY here does nothing
(beside maybe slowing down the query execution). So it should be something
like this:

SELECT
a0.count,
ids.category,
ids.textId,
ids.created,
ids.descr,
ids.descr_title
FROM
srchIds AS ids INNER JOIN
srchWords AS a0 ON a0.id = ids.id
WHERE
a0.word LIKE '%abc%'
ORDER BY
ids.created DESC,
a0.count DESC,
ids.textId DESC


> Would LEFT JOIN speed up the queries? I have a feeling that it shouldn't
> matter...

Try INNER JOIN (looking at your query left outer join is probably not what
you need). In general you should use JOIN syntax instead of table list
and WHERE clause (for join purposes, not for filtering).


>>> What is interesting is that it doesn't consider the key on field
>>> 'category'
>>
>> It's probably because it still has to view all the table (or rather table
>> join product). It does not need to view it category by category - it can
>> go record by record and add the viewed values of "a0.count" to the result
>> for the corresponding category (which may allready be in the result).
>
> Actually, it should only take the records from srchIds that have a
> corresponding entry in srchWords, where word is the one we are searching
> for...

It does take only the records from srchIds that have corresponding
entry in srchWords (your query and mine with INNER JOIN).


> So it should use category index.

But category index is not related to this - it's not involved in joining
the tables nor in filtering the results. Only thing it could be used
for is grouping (but it will not help in any way even if you use some
aggregation) or sorting (but you do not sort by category).


> I think I have it all figured out. I was a combination of "LIKE
> '%abc%" (which can't be indexed)

This only influences use of index on a0.word. Indexes on text fields
can be used only for comparison ("=", "<", "<=", ">", ">=", "BETWEEN",
"IS NULL", "IS NOT NULL") and sorting. They do not help if you check
text parts (which includes "LIKE" and matching substrings etc.).


> and... a small table. MySQL desided it was
> more efficient not to use indexes because the table was so small. When I
> ran the query on a bigger table it used the indexes. Weird. :(

Which indexes?


> Thank you for your help!

No problem.



Hilarion

Re: mysql query optimization

am 16.12.2005 22:52:39 von anzenews

Hi!

> You should use "`" instead of "'" for "count" column. "'" is for string
> values, and "`" is for identifiers. Also you do not really need to specify
> "count" as alias for "a0.count" because it's allready named "count" so
> you are not changing anything here.

Is that cross-platform? I always thought that backticks ("`") are MySQL
specific, but I might be wrong. But I know that ticks ("'") work on Oracle,
MS SQL and MySQL, and probably also on postgreSQL and SQLite.

> You also do not perform any aggregation, so GROUP BY here does nothing
> (beside maybe slowing down the query execution).

You are right - thank you! :)

> So it should be something like this:

Thanks, I'll try and benchmark both of the solutions!
But I need to go to sleep first. :)

>> Would LEFT JOIN speed up the queries? I have a feeling that it shouldn't
>> matter...
>
> Try INNER JOIN (looking at your query left outer join is probably not what
> you need). In general you should use JOIN syntax instead of table list
> and WHERE clause (for join purposes, not for filtering).

What you learn is what you use... I never learned JOINs and never bothered
to use them properly. It might be time to do so. :)

Just for info - are they cross-platform? I don't want to be tied to any
specific platform too closely, even if it is open source.


>> Actually, it should only take the records from srchIds that have a
>> corresponding entry in srchWords, where word is the one we are searching
>> for...
>
> It does take only the records from srchIds that have corresponding
> entry in srchWords (your query and mine with INNER JOIN).
>
>
>> So it should use category index.
>
> But category index is not related to this - it's not involved in joining
> the tables nor in filtering the results. Only thing it could be used
> for is grouping (but it will not help in any way even if you use some
> aggregation) or sorting (but you do not sort by category).

I meant "word" index, but I found out later that it doesn't work on "LIKE
'%abc%'"... :)

>> and... a small table. MySQL desided it was
>> more efficient not to use indexes because the table was so small. When I
>> ran the query on a bigger table it used the indexes. Weird. :(
>
> Which indexes?

I must learn to write these answers when I'm thinking clearly... %-)

I tried many different combinations of the query, one of them used
" = 'abc'" instead of "LIKE '%abc%'" - and this one used an index on word
field. But only on larger table - on small table mySQL optimizer decided
that it is better not to use it. Which probably makes sense, but made it
more difficult for me to figure out what was going on.

So the next step (if I want to make subword search useable) is to:
- benchmark INNER JOIN / my syntax
- get rid of GROUP BY and benchmark the change
- try FULLTEXT search - it might be useful for faster searching of the
records that have subwords in them, though I wouldn't hope too much

Thank you again for your help!

Anze

Re: mysql query optimization

am 19.12.2005 18:14:33 von Hilarion

>> You should use "`" instead of "'" for "count" column. "'" is for string
>> values, and "`" is for identifiers. Also you do not really need to specify
>> "count" as alias for "a0.count" because it's allready named "count" so
>> you are not changing anything here.
>
> Is that cross-platform? I always thought that backticks ("`") are MySQL
> specific, but I might be wrong. But I know that ticks ("'") work on Oracle,
> MS SQL and MySQL, and probably also on postgreSQL and SQLite.

Not quite. Ticks (apostrophes) are for string values (it's in SQL standards).
They are NOT for quoting identifiers (even if they do work in some engines).
Oracle uses double quotes to quote identifiers, MS SQL uses square brackets
and MySQL uses backticks. In general avoid situations when you have to use
any way of quoting identifiers, which means NOT using reserved words as
identifiers, not using spaces in identifiers etc.
In this case you do not have to use any quoting, because you used
table prefix "a0.", but I'd rename the "count" column to something like
"cnt".


>> You also do not perform any aggregation, so GROUP BY here does nothing
>> (beside maybe slowing down the query execution).
>
> You are right - thank you! :)
>
>> So it should be something like this:
>
> Thanks, I'll try and benchmark both of the solutions!
> But I need to go to sleep first. :)
>
>>> Would LEFT JOIN speed up the queries? I have a feeling that it shouldn't
>>> matter...
>>
>> Try INNER JOIN (looking at your query left outer join is probably not what
>> you need). In general you should use JOIN syntax instead of table list
>> and WHERE clause (for join purposes, not for filtering).
>
> What you learn is what you use... I never learned JOINs and never bothered
> to use them properly. It might be time to do so. :)

JOINs are - in my personal opinion - one of the most important things
in SQL.


> Just for info - are they cross-platform? I don't want to be tied to any
> specific platform too closely, even if it is open source.

They are in SQL standard, so they are very cross-platform (they work
in MySQL, MS SQL and in Oracle 9 or 10). There are exceptions. The most
significant exception is Oracle, which does not support JOIN syntax in 8i
and previous versions (version 9 does support JOINs). It provides
a substitute (for some LEFT OUTER JOIN and RIGHT OUTER JOIN situations),
but it's not as powerful as standard JOIN.
Yes, you can use WHERE clause in all SQL engines to have same result
as INNER JOIN, but depending on DBMS engine, you can make your query
perform better when using JOIN syntax. In most cases there's no way
to use WHERE to emulate OUTER JOIN behavior.


>>> and... a small table. MySQL desided it was
>>> more efficient not to use indexes because the table was so small. When I
>>> ran the query on a bigger table it used the indexes. Weird. :(
>>
>> Which indexes?
>
> I must learn to write these answers when I'm thinking clearly... %-)
>
> I tried many different combinations of the query, one of them used
> " = 'abc'" instead of "LIKE '%abc%'" - and this one used an index on word
> field. But only on larger table - on small table mySQL optimizer decided
> that it is better not to use it. Which probably makes sense, but made it
> more difficult for me to figure out what was going on.

Ah, I see now. Some DBMS engines provide specific SQL syntax which
forces the engine to use (or not use) some specific index. I know
Oracle does. I do not know if MySQL also has something like this.
The syntax may be platform-dependant (in case of Oracle it's quite
safe, because it uses comments, so it'll not break the query for
other engines, which will just ignore the comment).


> So the next step (if I want to make subword search useable) is to:
> - benchmark INNER JOIN / my syntax
> - get rid of GROUP BY and benchmark the change

Yes, and yes.


> - try FULLTEXT search - it might be useful for faster searching of the
> records that have subwords in them, though I wouldn't hope too much

If it's about subwords, then fulltext indexes can make a big difference.


> Thank you again for your help!

No problem.



Hilarion

Re: mysql query optimization

am 11.01.2006 12:02:07 von Jim Michaels

I would also get rid of the 1 AND out of the WHERE clause. the SQL compiler
would prefer a boolean expression like 1=1 rather than a numeric result.
But really you can just hand-optimize this out since it really does nothing
at all according to the truth tables for AND.
Essentially, if you hard-wire one side of an AND to TRUE, then whatever
comes in to AND on the other side just comes out as a result: a
do-nothing-special - a plain wire from input to output.

"Anze" wrote in message
news:79iof.20712$h6.742430@news.siol.net...
>> Why do you use "COUNT( a0.count )"? If there's no NULL values in
>> "a0.count" column, then it's results are same as "COUNT( * )", which may
>> be faster. If there are NULL values, then you could add "a0.count IS NOT
>> NULL" to WHERE clause and use "COUNT( * )" - it may still be a bit
>> faster.
>> Or maybe you wanted to use "SUM( a0.count )" or "COUNT( DISTINCT a0.count
>> )"?
>
> Sorry, a typo... :)
> I simplified the original query so it would be easier to optimize. This is
> the original query:
>
> select a0.count
> 'count',ids.category,ids.textId,ids.created,ids.descr,ids.de scr_title from
> srchIds ids, srchWords a0 where 1 and a0.word LIKE '%abc%' and
> a0.id=ids.id
> group by ids.category, ids.textId order by ids.created desc, 'count' desc,
> ids.textId desc
>
> Would LEFT JOIN speed up the queries? I have a feeling that it shouldn't
> matter...
>
>>> What is interesting is that it doesn't consider the key on field
>>> 'category'
>>
>> It's probably because it still has to view all the table (or rather table
>> join product). It does not need to view it category by category - it can
>> go record by record and add the viewed values of "a0.count" to the result
>> for the corresponding category (which may allready be in the result).
>
> Actually, it should only take the records from srchIds that have a
> corresponding entry in srchWords, where word is the one we are searching
> for... So it should use category index.
>
> I think I have it all figured out. I was a combination of "LIKE
> '%abc%" (which can't be indexed) and... a small table. MySQL desided it
> was
> more efficient not to use indexes because the table was so small. When I
> ran the query on a bigger table it used the indexes. Weird. :(
>
> Thank you for your help!
>
> Anze