Slow query using string functions

am 26.05.2010 23:13:51 von Jerry Schwartz

I have a pretty simple query that seems to take a lot longer than it ought to
(over 2 minutes).

Table `feed_new` has a single VARCHAR(255) column, `new_title`, that is an
index.

Table `prod` has many fields: `prod_title` and `pub_id` are both indexes
(VARCHAR).

`feed_new` has 895 records, `prod` has 110432.

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: NULL
key: new_title
key_len: 768
ref: NULL
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 9816
Extra: Using where

The query is doing a scan of the 9816 records that have pub_id = @PUBID, but
even so this seems like a long time. Are the built-in string functions really
that slow?

I suspect it would be faster if I built separate tables that had just the
shortened versions of the titles, but I wouldn't think that would be
necessary.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Slow query using string functions

am 27.05.2010 01:38:59 von Gavin Towey

Jerry,

Are you sure this is really your explain plan for this query? That's not a=
t all what I would expect to see.

Regards,
Gavin Towey

-----Original Message-----
From: Jerry Schwartz [mailto:jerry@gii.co.jp]
Sent: Wednesday, May 26, 2010 2:14 PM
To: mysql@lists.mysql.com
Subject: Slow query using string functions

I have a pretty simple query that seems to take a lot longer than it ought =
to
(over 2 minutes).

Table `feed_new` has a single VARCHAR(255) column, `new_title`, that is an
index.

Table `prod` has many fields: `prod_title` and `pub_id` are both indexes
(VARCHAR).

`feed_new` has 895 records, `prod` has 110432.

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =3D
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id =3D @PUBID AND prod.prod_discont =3D 0
ORDER BY feed_new.new_title;

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: NULL
key: new_title
key_len: 768
ref: NULL
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 9816
Extra: Using where

The query is doing a scan of the 9816 records that have pub_id =3D @PUBID, =
but
even so this seems like a long time. Are the built-in string functions real=
ly
that slow?

I suspect it would be faster if I built separate tables that had just the
shortened versions of the titles, but I wouldn't think that would be
necessary.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgtowey@ffn.com

This message contains confidential information and is intended only for the=
individual named. If you are not the named addressee, you are notified th=
at reviewing, disseminating, disclosing, copying or distributing this e-mai=
l is strictly prohibited. Please notify the sender immediately by e-mail i=
f you have received this e-mail by mistake and delete this e-mail from your=
system. E-mail transmission cannot be guaranteed to be secure or error-fre=
e as information could be intercepted, corrupted, lost, destroyed, arrive l=
ate or incomplete, or contain viruses. The sender therefore does not accept=
liability for any loss or damage caused by viruses or errors or omissions =
in the contents of this message, which arise as a result of e-mail transmis=
sion. [FriendFinder Networks, Inc., 220 Humbolt court, Sunnyvale, CA 94089,=
USA, FriendFinder.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Slow query using string functions

am 27.05.2010 10:30:53 von Joerg Bruehe

Hi!

Jerry Schwartz wrote:
> I have a pretty simple query that seems to take a lot longer than i=
t ought to=20
> (over 2 minutes).
>=20
> [[...]]
>=20
> SELECT
> =09feed_new.new_title AS `New Title FROM Feed`,
> =09prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> =09prod.prod_title AS `Title FROM DB`,
> =09prod.prod_num AS `Prod Num`,
> =09prod.prod_published AS `Published FROM DB`
> FROM feed_new JOIN prod
> ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =3D=
=20
> LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
> WHERE prod.pub_id =3D @PUBID AND prod.prod_discont =3D 0
> ORDER BY feed_new.new_title;
>=20
> [[...]]
>=20
> The query is doing a scan of the 9816 records that have pub_id =
=3D @PUBID, but=20
> even so this seems like a long time. Are the built-in string functi=
ons really=20
> that slow?

The general rule is:
If you are not taking the value of a column directly but are applying
any function to it (like "LEFT" in your statement), an index cannot b=
e
used. So the table needs to be accessed (scanned) and the function
computed on each row, to evaluate the predicate (the comparison).

I am no expert in checking "explain" output, so I may well be wrong i=
n
my guess:
I think the execution will scan the whole "feed_new" table (895 recor=
ds)
for each of those 9816 matches of pub_id, so it is doing 8785320 call=
s
of "LEFT()" followed by a string comparison.

>=20
> I suspect it would be faster if I built separate tables that had ju=
st the=20
> shortened versions of the titles, but I wouldn't think that would b=
e=20
> necessary.

So IMO you have two choices:
- Either you accept the performance implications of a table scan
(which will of course get worse when your data grows),
- or you introduce another column in your table in which you store th=
e
function result (maintained on INSERT and UPDATE) and create an ind=
ex
on this column.

An additional table with the shortened columns is no good idea IMO,
because you would need to maintain it in sync with your "real" data.
In your example, it should be sufficient to add the new column to tab=
le
"feed", because your execution strategy should start by evaluating
prod.pub_id =3D @PUBID

HTH,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstrasse 18a, D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz
Amtsgericht Muenchen: HRB161028

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Slow query using string functions

am 27.05.2010 15:08:45 von Baron Schwartz

Jerry,

On Wed, May 26, 2010 at 5:13 PM, Jerry Schwartz wrote:
> I have a pretty simple query that seems to take a lot longer than it ought to
> (over 2 minutes).
>

I suspect that if you watch Handler_ stats, you'll find that the
EXPLAIN estimate is wrong for some reason and it's accessing many more
rows than you think in the second table, or something similar. In any
case, I'd start by measuring what the query is actually doing, not
what EXPLAIN thinks. What does that show?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Slow query using string functions

am 27.05.2010 17:02:00 von Jerry Schwartz

>-----Original Message-----
>From: Gavin Towey [mailto:gtowey@ffn.com]
>Sent: Wednesday, May 26, 2010 7:39 PM
>To: jerry@gii.co.jp; mysql@lists.mysql.com
>Subject: RE: Slow query using string functions
>
>Jerry,
>
>Are you sure this is really your explain plan for this query? That's not at
>all what I would expect to see.
>
>Regards,
>Gavin Towey
>
[JS] I'm as sure as I can be. Here's a lot more information:

==========

SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
new_title VARCHAR(255), INDEX (new_title)
);

INSERT INTO feed_new
VALUES

('Automotive Aftermarket in France - Channel Analysis to 2014'),
('Automotive Aftermarket in Germany - Channel Analysis to 2014'),
('Automotive Aftermarket in Italy - Channel Analysis to 2014'),
.... about 900 more rows

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;

==========

>explain
-> SELECT
-> feed_new.new_title AS `New Title FROM Feed`,
-> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
-> prod.prod_title AS `Title FROM DB`,
-> prod.prod_num AS `Prod Num`,
-> prod.prod_published AS `Published FROM DB`
-> FROM feed_new JOIN prod
-> ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
-> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
-> ORDER BY feed_new.new_title
-> \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: NULL
key: new_title
key_len: 768
ref: NULL
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 9817
Extra: Using where
2 rows in set (0.00 sec)

==========

I assume that pub_id is the best key to use to limit the rows pulled from
prod, since it cuts it down to 9817 rows from 110000. I'm guessing that even
though prod_title is a key (MUL), using it in the expression

LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)

keeps it from being used as a key.

If I replace the expressions in the ON clause with simple matches, I get a
very different result:

>explain
-> SELECT
-> feed_new.new_title AS `New Title FROM Feed`,
-> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
-> prod.prod_title AS `Title FROM DB`,
-> prod.prod_num AS `Prod Num`,
-> prod.prod_published AS `Published FROM DB`
-> FROM feed_new JOIN prod
-> ON feed_new.new_title = prod.prod_title
-> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
-> ORDER BY feed_new.new_title
-> \G
*************************** 1. row **************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: new_title
key: new_title
key_len: 768
ref: NULL
rows: 882
Extra: Using index
*************************** 2. row **************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
key: prod_title
key_len: 768
ref: giiexpr_db.feed_new.new_title
rows: 1
Extra: Using where
2 rows in set (0.00 sec)
=====

Of course, in this case the number of qualifying rows is vastly different; but
I would think the optimization has to happen before the results of the SELECT
are known. No optimizer is that prescient. :-)

I'm not really surprised by the different optimizations, I'm surprised by the
difference in execution time. Using the function-based expressions in the ON
clause takes 2:03.38 minutes to return 267 rows (the first time, the second
time is a little faster); using a simple comparison with no functions takes
..03 seconds to return 1 row.

I benchmarked the expression I'm using

select benchmark(10000000,left("asbcdkdfsaklfjdakl;fjasdl;fjasdl;fj asd",
length("asbcdkdfsaklfjdakl;fjasdl;fjasdl;fjasd") - 5));

and the result was 2.87 seconds. That's ten million evaluations.

So where is the time going?

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Slow query using string functions

am 27.05.2010 21:09:59 von Jerry Schwartz

I haven't had a chance to try that, and this might be totally useless
information, but to create a clone of the bigger table (CREATE LIKE and INSERT
INTO) takes about 12 seconds.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com

>-----Original Message-----
>From: baron.schwartz@gmail.com [mailto:baron.schwartz@gmail.com] On Behalf Of
>Baron Schwartz
>Sent: Thursday, May 27, 2010 9:09 AM
>To: MySql
>Subject: Re: Slow query using string functions
>
>Jerry,
>
>On Wed, May 26, 2010 at 5:13 PM, Jerry Schwartz wrote:
>> I have a pretty simple query that seems to take a lot longer than it ought
>> to
>> (over 2 minutes).
>>
>
>I suspect that if you watch Handler_ stats, you'll find that the
>EXPLAIN estimate is wrong for some reason and it's accessing many more
>rows than you think in the second table, or something similar. In any
>case, I'd start by measuring what the query is actually doing, not
>what EXPLAIN thinks. What does that show?
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jerry@gii.co.jp

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org