Slow query using string operator

Slow query using string operator

am 10.08.2010 23:39:00 von Jerry Schwartz

I'm running a set of queries that look like this:

===============
SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
new_title VARCHAR(255), INDEX (new_title)
);

INSERT INTO feed_new
VALUES

('UK Investment Bonds 2010'),
('UK Protection 2010'),
('UK Personal Insurance Distribution 2010'),
('UK Private Medical Insurance 2010'),
....
('UK Private Motor Insurance 2010'),
('Wealth Management for Non-Resident Indians 2010'),
('Middle Eastern Cards Database 2010')
;

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;

============

With a relatively small number of rows in `feed_new`, this can take many
seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows in
prod, it took about 28 seconds. Here's what an EXPLAIN looks like:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: NULL
key: PRIMARY
key_len: 767
ref: NULL
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 11040
Extra: Using where

=========

prod.pub_id is an indexed VARCHAR(15).

If I remove the string functions, I don't get what I want -- but the remaining
query runs in .05 seconds. Here's an EXPLAIN of that one:

===============
us-gii >EXPLAIN
-> SELECT
-> feed_new.new_title AS `New Title FROM Feed`,
-> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
-> prod.prod_title AS `Title FROM DB`,
-> prod.prod_num AS `Prod Num`,
-> prod.prod_published AS `Published FROM DB`
-> FROM feed_new JOIN prod
-> ON feed_new.new_title = prod.prod_title
-> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
-> ORDER BY feed_new.new_title\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 767
ref: NULL
rows: 163
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
key: prod_title
key_len: 768
ref: giiexpr_db.feed_new.new_title
rows: 1
Extra: Using where
================

Obviously the string manipulation is keeping MySQL from using `prod_title` as
a key, but I wouldn't have thought that using `pub_id` instead would be that
horrific.

Does anyone have any suggestions as to how to speed this business up? I can't
get away without some string manipulation, because I'm looking for "near
matches" by ignoring the year at the end of the title.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Slow query using string operator

am 11.08.2010 00:52:50 von Travis Ard

Can you create a second, indexed column in your feed_new temp table that
includes the title without the year appended? That might allow you to get
by with a single pass through the larger prod table and avoid reading rows
from the feed_new table.

-Travis

-----Original Message-----
From: Jerry Schwartz [mailto:jerry@gii.co.jp]
Sent: Tuesday, August 10, 2010 3:39 PM
To: mysql@lists.mysql.com
Subject: Slow query using string operator

I'm running a set of queries that look like this:

===============
SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
new_title VARCHAR(255), INDEX (new_title)
);

INSERT INTO feed_new
VALUES

('UK Investment Bonds 2010'),
('UK Protection 2010'),
('UK Personal Insurance Distribution 2010'),
('UK Private Medical Insurance 2010'),
....
('UK Private Motor Insurance 2010'),
('Wealth Management for Non-Resident Indians 2010'),
('Middle Eastern Cards Database 2010')
;

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;

============

With a relatively small number of rows in `feed_new`, this can take many
seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows
in
prod, it took about 28 seconds. Here's what an EXPLAIN looks like:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: NULL
key: PRIMARY
key_len: 767
ref: NULL
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 11040
Extra: Using where

=========

prod.pub_id is an indexed VARCHAR(15).

If I remove the string functions, I don't get what I want -- but the
remaining
query runs in .05 seconds. Here's an EXPLAIN of that one:

===============
us-gii >EXPLAIN
-> SELECT
-> feed_new.new_title AS `New Title FROM Feed`,
-> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
-> prod.prod_title AS `Title FROM DB`,
-> prod.prod_num AS `Prod Num`,
-> prod.prod_published AS `Published FROM DB`
-> FROM feed_new JOIN prod
-> ON feed_new.new_title = prod.prod_title
-> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
-> ORDER BY feed_new.new_title\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 767
ref: NULL
rows: 163
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
key: prod_title
key_len: 768
ref: giiexpr_db.feed_new.new_title
rows: 1
Extra: Using where
================

Obviously the string manipulation is keeping MySQL from using `prod_title`
as
a key, but I wouldn't have thought that using `pub_id` instead would be that

horrific.

Does anyone have any suggestions as to how to speed this business up? I
can't
get away without some string manipulation, because I'm looking for "near
matches" by ignoring the year at the end of the title.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=travis_ard@hotmail.com



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Slow query using string operator

am 11.08.2010 14:38:00 von Joerg Bruehe

Hi Jerry, all!


I second Travis' advice:

Travis Ard schrieb:
> Can you create a second, indexed column in your feed_new temp table tha=
t
> includes the title without the year appended? That might allow you to =
get
> by with a single pass through the larger prod table and avoid reading r=
ows
> from the feed_new table.=20

The original query does a string operation on the values from both sides
before checking the result for equality:

> -----Original Message-----
> From: Jerry Schwartz [mailto:jerry@gii.co.jp]=20
> Sent: Tuesday, August 10, 2010 3:39 PM
> To: mysql@lists.mysql.com
> Subject: Slow query using string operator
>=20
> I'm running a set of queries that look like this:
>=20
> [[...]]
>=20
> SELECT
> feed_new.new_title AS `New Title FROM Feed`,
> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> prod.prod_title AS `Title FROM DB`,
> prod.prod_num AS `Prod Num`,
> prod.prod_published AS `Published FROM DB`
> FROM feed_new JOIN prod
> ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
> LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
> WHERE prod.pub_id =3D @PUBID AND prod.prod_discont =3D 0
> ORDER BY feed_new.new_title;

So neither value is taken directly, which means the values in the
indexes (if defined) cannot be used anyway.

If you need these calculations, you should compute and maintain these
values when inserting/updating data (define triggers doing this, or run
periodic maintenance/check jobs), and store them in suitable indexes.

AFAIK, this applies to all comparisons which use function results rather
than column values directly.


HTH,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, joerg.bruehe@oracle.com
ORACLE Deutschland B.V. & Co. KG, Komturstrasse 18a, D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven
Amtsgericht Muenchen: HRA 95603


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: Slow query using string operator

am 11.08.2010 15:07:30 von Jerry Schwartz

>-----Original Message-----
>From: Travis Ard [mailto:travis_ard@hotmail.com]
>Sent: Tuesday, August 10, 2010 6:53 PM
>To: 'Jerry Schwartz'; mysql@lists.mysql.com
>Subject: RE: Slow query using string operator
>
>Can you create a second, indexed column in your feed_new temp table that
>includes the title without the year appended? That might allow you to get
>by with a single pass through the larger prod table and avoid reading rows
>from the feed_new table.
>
[JS] I have thought about that, but haven't tried it yet.

I'll let you know.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com



>-Travis
>
>-----Original Message-----
>From: Jerry Schwartz [mailto:jerry@gii.co.jp]
>Sent: Tuesday, August 10, 2010 3:39 PM
>To: mysql@lists.mysql.com
>Subject: Slow query using string operator
>
>I'm running a set of queries that look like this:
>
>===============
>SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');
>
>DROP TEMPORARY TABLE IF EXISTS feed_new;
>
>CREATE TEMPORARY TABLE feed_new (
> new_title VARCHAR(255), INDEX (new_title)
> );
>
>INSERT INTO feed_new
>VALUES
>
>('UK Investment Bonds 2010'),
>('UK Protection 2010'),
>('UK Personal Insurance Distribution 2010'),
>('UK Private Medical Insurance 2010'),
>...
>('UK Private Motor Insurance 2010'),
>('Wealth Management for Non-Resident Indians 2010'),
>('Middle Eastern Cards Database 2010')
>;
>
>SELECT
> feed_new.new_title AS `New Title FROM Feed`,
> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> prod.prod_title AS `Title FROM DB`,
> prod.prod_num AS `Prod Num`,
> prod.prod_published AS `Published FROM DB`
>FROM feed_new JOIN prod
>ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
>LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
>WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
>ORDER BY feed_new.new_title;
>
>============
>
>With a relatively small number of rows in `feed_new`, this can take many
>seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows
>in
>prod, it took about 28 seconds. Here's what an EXPLAIN looks like:
>
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: NULL
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 1
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 11040
> Extra: Using where
>
>=========
>
>prod.pub_id is an indexed VARCHAR(15).
>
>If I remove the string functions, I don't get what I want -- but the
>remaining
>query runs in .05 seconds. Here's an EXPLAIN of that one:
>
>===============
>us-gii >EXPLAIN
> -> SELECT
> -> feed_new.new_title AS `New Title FROM Feed`,
> -> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> -> prod.prod_title AS `Title FROM DB`,
> -> prod.prod_num AS `Prod Num`,
> -> prod.prod_published AS `Published FROM DB`
> -> FROM feed_new JOIN prod
> -> ON feed_new.new_title = prod.prod_title
> -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
> -> ORDER BY feed_new.new_title\G
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: PRIMARY
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 163
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
> key: prod_title
> key_len: 768
> ref: giiexpr_db.feed_new.new_title
> rows: 1
> Extra: Using where
>================
>
>Obviously the string manipulation is keeping MySQL from using `prod_title`
>as
>a key, but I wouldn't have thought that using `pub_id` instead would be that
>
>horrific.
>
>Does anyone have any suggestions as to how to speed this business up? I
>can't
>get away without some string manipulation, because I'm looking for "near
>matches" by ignoring the year at the end of the title.
>
>Regards,
>
>Jerry Schwartz
>Global Information Incorporated
>195 Farmington Ave.
>Farmington, CT 06032
>
>860.674.8796 / FAX: 860.674.8341
>E-mail: jerry@gii.co.jp
>Web site: www.the-infoshop.com
>
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=travis_ard@hotmail.com
>





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Slow query using string operator

am 11.08.2010 15:38:41 von Jerry Schwartz

Yowsers! I expected that eliminating half of the string manipulation would
help, but I never imagined that the difference would be so great.

The SELECT now runs in well under a second.
=========================

SELECT pub_id FROM pub WHERE pub_code = 'DC' INTO @PUBID;

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
new_title VARCHAR(255), PRIMARY KEY (new_title),
new_title_truncated VARCHAR(255), INDEX (new_title_truncated)
);

# The next line loads up dummy data, but it "looks like"
# what I'm really using.

INSERT IGNORE INTO feed_new
SELECT prod.prod_title, LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
FROM prod
WHERE prod.prod_discont = 0
AND prod.pub_id = @PUBID
AND RAND() < .01;

SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON feed_new.new_title_truncated = LEFT(prod.prod_title,
LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;
117 rows in set (0.25 sec)

EXPLAIN
SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON feed_new.new_title_truncated = LEFT(prod.prod_title,
LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title\G

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 11041
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: ref
possible_keys: new_title_truncated
key: new_title_truncated
key_len: 768
ref: func
rows: 1
Extra: Using where; Using index
======================

It look like the optimizer flipped the JOIN around so that it could use the
key in feed_new.

Thanks for your help, all.


Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com


>-----Original Message-----
>From: Travis Ard [mailto:travis_ard@hotmail.com]
>Sent: Tuesday, August 10, 2010 6:53 PM
>To: 'Jerry Schwartz'; mysql@lists.mysql.com
>Subject: RE: Slow query using string operator
>
>Can you create a second, indexed column in your feed_new temp table that
>includes the title without the year appended? That might allow you to get
>by with a single pass through the larger prod table and avoid reading rows
>from the feed_new table.
>
>-Travis
>
>-----Original Message-----
>From: Jerry Schwartz [mailto:jerry@gii.co.jp]
>Sent: Tuesday, August 10, 2010 3:39 PM
>To: mysql@lists.mysql.com
>Subject: Slow query using string operator
>
>I'm running a set of queries that look like this:
>
>===============
>SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');
>
>DROP TEMPORARY TABLE IF EXISTS feed_new;
>
>CREATE TEMPORARY TABLE feed_new (
> new_title VARCHAR(255), INDEX (new_title)
> );
>
>INSERT INTO feed_new
>VALUES
>
>('UK Investment Bonds 2010'),
>('UK Protection 2010'),
>('UK Personal Insurance Distribution 2010'),
>('UK Private Medical Insurance 2010'),
>...
>('UK Private Motor Insurance 2010'),
>('Wealth Management for Non-Resident Indians 2010'),
>('Middle Eastern Cards Database 2010')
>;
>
>SELECT
> feed_new.new_title AS `New Title FROM Feed`,
> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> prod.prod_title AS `Title FROM DB`,
> prod.prod_num AS `Prod Num`,
> prod.prod_published AS `Published FROM DB`
>FROM feed_new JOIN prod
>ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
>LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
>WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
>ORDER BY feed_new.new_title;
>
>============
>
>With a relatively small number of rows in `feed_new`, this can take many
>seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows
>in
>prod, it took about 28 seconds. Here's what an EXPLAIN looks like:
>
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: NULL
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 1
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 11040
> Extra: Using where
>
>=========
>
>prod.pub_id is an indexed VARCHAR(15).
>
>If I remove the string functions, I don't get what I want -- but the
>remaining
>query runs in .05 seconds. Here's an EXPLAIN of that one:
>
>===============
>us-gii >EXPLAIN
> -> SELECT
> -> feed_new.new_title AS `New Title FROM Feed`,
> -> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> -> prod.prod_title AS `Title FROM DB`,
> -> prod.prod_num AS `Prod Num`,
> -> prod.prod_published AS `Published FROM DB`
> -> FROM feed_new JOIN prod
> -> ON feed_new.new_title = prod.prod_title
> -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
> -> ORDER BY feed_new.new_title\G
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: PRIMARY
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 163
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
> key: prod_title
> key_len: 768
> ref: giiexpr_db.feed_new.new_title
> rows: 1
> Extra: Using where
>================
>
>Obviously the string manipulation is keeping MySQL from using `prod_title`
>as
>a key, but I wouldn't have thought that using `pub_id` instead would be that
>
>horrific.
>
>Does anyone have any suggestions as to how to speed this business up? I
>can't
>get away without some string manipulation, because I'm looking for "near
>matches" by ignoring the year at the end of the title.
>
>Regards,
>
>Jerry Schwartz
>Global Information Incorporated
>195 Farmington Ave.
>Farmington, CT 06032
>
>860.674.8796 / FAX: 860.674.8341
>E-mail: jerry@gii.co.jp
>Web site: www.the-infoshop.com
>
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=travis_ard@hotmail.com
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jerry@gii.co.jp





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org