Slow when using sub-query

Slow when using sub-query

am 02.06.2010 22:05:46 von Jerry Schwartz

I've heard that sub-queries aren't well-optimized, but this case seems
ridiculous.

First, a little setup:

SELECT pub_id FROM pub WHERE pub_code = 'GD' INTO @P;

=== Inner Query by Itself ===

us-gii >SELECT prod_pub_prod_id FROM prod
-> WHERE pub_id = @P
-> AND prod_discont = 0
-> GROUP BY prod_pub_prod_id
-> HAVING COUNT(*) > 1;
+------------------+
| prod_pub_prod_id |
+------------------+
| NULL |
| GDAE0106ICR |
| GDME0002TR |
| GDME0023IAR |
| GDME0059IAR |
+------------------+
5 rows in set (0.05 sec)

us-gii >EXPLAIN
-> SELECT prod_pub_prod_id FROM prod
-> WHERE pub_id = @P
-> AND prod_discont = 0
-> GROUP BY prod_pub_prod_id
-> HAVING COUNT(*) > 1
-> \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 1543
Extra: Using where; Using temporary; Using filesort

=== Outer Query without Inner Query ===

us-gii >SELECT prod_num FROM prod
-> WHERE pub_id = @P
-> AND prod_pub_prod_id IN
-> (
-> NULL,
-> 'GDAE0106ICR',
-> 'GDME0002TR',
-> 'GDME0023IAR',
-> 'GDME0059IAR'
-> )
-> ;
+----------+
| prod_num |
+----------+
| 83298 |
| 85092 |
| 88728 |
| 97231 |
| 97235 |
| 98368 |
| 107693 |
| 112461 |
+----------+
8 rows in set (0.01 sec)

us-gii >EXPLAIN
-> SELECT prod_num FROM prod
-> WHERE pub_id = @P
-> AND prod_pub_prod_id IN
-> (
-> NULL,
-> 'GDAE0106ICR',
-> 'GDME0002TR',
-> 'GDME0023IAR',
-> 'GDME0059IAR'
-> )
-> \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: prod_pub_prod_id,pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 1543
Extra: Using where

=== Outer Query with Sub-query ===

us-gii >EXPLAIN
-> SELECT prod_num FROM prod
-> WHERE pub_id = @P
-> AND prod_pub_prod_id IN
-> (SELECT prod_pub_prod_id FROM prod
-> WHERE pub_id = @P
-> AND prod_discont = 0
-> GROUP BY prod_pub_prod_id
-> HAVING COUNT(*) > 1)
-> \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 1543
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: prod
type: index
possible_keys: pub_id,pub_id_2
key: prod_pub_prod_id
key_len: 768
ref: NULL
rows: 72
Extra: Using where; Using filesort

I don't know how long the Outer Query with Sub-query would take: I killed it
after several minutes. I'm guessing that it has to do with the fact that the
inner query is "dependent", but why is that happening?

=== Rewritten as Join ===

us-gii >SELECT prod_num FROM prod JOIN
-> (SELECT prod_pub_prod_id FROM prod
-> WHERE pub_id = @P
-> AND prod_discont = 0
-> GROUP BY prod_pub_prod_id
-> HAVING COUNT(*) > 1) AS x
-> ON prod.prod_pub_prod_id = x.prod_pub_prod_id
-> WHERE prod.pub_id = @P
-> AND prod.prod_discont = 0;
+----------+
| prod_num |
+----------+
| 98368 |
| 107693 |
| 83298 |
| 85092 |
| 88728 |
| 97231 |
| 97235 |
| 112461 |
+----------+
8 rows in set (0.05 sec)

us-gii >EXPLAIN
-> SELECT prod_num FROM prod JOIN
-> (SELECT prod_pub_prod_id FROM prod
-> WHERE pub_id = @P
-> AND prod_discont = 0
-> GROUP BY prod_pub_prod_id
-> HAVING COUNT(*) > 1) AS x
-> ON prod.prod_pub_prod_id = x.prod_pub_prod_id
-> WHERE prod.pub_id = @P
-> AND prod.prod_discont = 0
-> \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table:
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: prod
type: ref
possible_keys: prod_pub_prod_id,pub_id,pub_id_2
key: prod_pub_prod_id
key_len: 768
ref: x.prod_pub_prod_id
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref:
rows: 1544
Extra: Using where; Using temporary; Using filesort
3 rows in set (0.03 sec)

=====

What a difference! I don't understand it, though.

Does anyone want to take on the challenge of educating me?

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com






--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Slow when using sub-query

am 03.06.2010 12:51:44 von Johan De Meersman

--005045013dbf0da9ee04881dfe24
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

The short answer is that the optimizer is amazingly stupid about subqueries=
,
and it assumes that they are dependent even when they're not - that subquer=
y
gets executed for every row in your main query.

The fastest way to do this, would probably be to run your subquery, have
your code assemble the appropriate IN clause, and then run your main query.

The long answer is that there's a rather good Advanced Tuning course that
addresses all this and more, as does Baron & C° 's excellent MySQL book.



On Wed, Jun 2, 2010 at 10:05 PM, Jerry Schwartz wrote:

> I've heard that sub-queries aren't well-optimized, but this case seems
> ridiculous.
>
> First, a little setup:
>
> SELECT pub_id FROM pub WHERE pub_code =3D 'GD' INTO @P;
>
> ===3D Inner Query by Itself ===3D
>
> us-gii >SELECT prod_pub_prod_id FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_discont =3D 0
> -> GROUP BY prod_pub_prod_id
> -> HAVING COUNT(*) > 1;
> +------------------+
> | prod_pub_prod_id |
> +------------------+
> | NULL |
> | GDAE0106ICR |
> | GDME0002TR |
> | GDME0023IAR |
> | GDME0059IAR |
> +------------------+
> 5 rows in set (0.05 sec)
>
> us-gii >EXPLAIN
> -> SELECT prod_pub_prod_id FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_discont =3D 0
> -> GROUP BY prod_pub_prod_id
> -> HAVING COUNT(*) > 1
> -> \G
> *************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
> possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 1543
> Extra: Using where; Using temporary; Using filesort
>
> ===3D Outer Query without Inner Query ===3D
>
> us-gii >SELECT prod_num FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_pub_prod_id IN
> -> (
> -> NULL,
> -> 'GDAE0106ICR',
> -> 'GDME0002TR',
> -> 'GDME0023IAR',
> -> 'GDME0059IAR'
> -> )
> -> ;
> +----------+
> | prod_num |
> +----------+
> | 83298 |
> | 85092 |
> | 88728 |
> | 97231 |
> | 97235 |
> | 98368 |
> | 107693 |
> | 112461 |
> +----------+
> 8 rows in set (0.01 sec)
>
> us-gii >EXPLAIN
> -> SELECT prod_num FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_pub_prod_id IN
> -> (
> -> NULL,
> -> 'GDAE0106ICR',
> -> 'GDME0002TR',
> -> 'GDME0023IAR',
> -> 'GDME0059IAR'
> -> )
> -> \G
> *************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
> possible_keys: prod_pub_prod_id,pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 1543
> Extra: Using where
>
> ===3D Outer Query with Sub-query ===3D
>
> us-gii >EXPLAIN
> -> SELECT prod_num FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_pub_prod_id IN
> -> (SELECT prod_pub_prod_id FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_discont =3D 0
> -> GROUP BY prod_pub_prod_id
> -> HAVING COUNT(*) > 1)
> -> \G
> *************************** 1. row ***************************
> id: 1
> select_type: PRIMARY
> table: prod
> type: ref
> possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 1543
> Extra: Using where
> *************************** 2. row ***************************
> id: 2
> select_type: DEPENDENT SUBQUERY
> table: prod
> type: index
> possible_keys: pub_id,pub_id_2
> key: prod_pub_prod_id
> key_len: 768
> ref: NULL
> rows: 72
> Extra: Using where; Using filesort
>
> I don't know how long the Outer Query with Sub-query would take: I killed
> it
> after several minutes. I'm guessing that it has to do with the fact that
> the
> inner query is "dependent", but why is that happening?
>
> ===3D Rewritten as Join ===3D
>
> us-gii >SELECT prod_num FROM prod JOIN
> -> (SELECT prod_pub_prod_id FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_discont =3D 0
> -> GROUP BY prod_pub_prod_id
> -> HAVING COUNT(*) > 1) AS x
> -> ON prod.prod_pub_prod_id =3D x.prod_pub_prod_id
> -> WHERE prod.pub_id =3D @P
> -> AND prod.prod_discont =3D 0;
> +----------+
> | prod_num |
> +----------+
> | 98368 |
> | 107693 |
> | 83298 |
> | 85092 |
> | 88728 |
> | 97231 |
> | 97235 |
> | 112461 |
> +----------+
> 8 rows in set (0.05 sec)
>
> us-gii >EXPLAIN
> -> SELECT prod_num FROM prod JOIN
> -> (SELECT prod_pub_prod_id FROM prod
> -> WHERE pub_id =3D @P
> -> AND prod_discont =3D 0
> -> GROUP BY prod_pub_prod_id
> -> HAVING COUNT(*) > 1) AS x
> -> ON prod.prod_pub_prod_id =3D x.prod_pub_prod_id
> -> WHERE prod.pub_id =3D @P
> -> AND prod.prod_discont =3D 0
> -> \G
> *************************** 1. row ***************************
> id: 1
> select_type: PRIMARY
> table:
> type: ALL
> possible_keys: NULL
> key: NULL
> key_len: NULL
> ref: NULL
> rows: 5
> Extra:
> *************************** 2. row ***************************
> id: 1
> select_type: PRIMARY
> table: prod
> type: ref
> possible_keys: prod_pub_prod_id,pub_id,pub_id_2
> key: prod_pub_prod_id
> key_len: 768
> ref: x.prod_pub_prod_id
> rows: 1
> Extra: Using where
> *************************** 3. row ***************************
> id: 2
> select_type: DERIVED
> table: prod
> type: ref
> possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref:
> rows: 1544
> Extra: Using where; Using temporary; Using filesort
> 3 rows in set (0.03 sec)
>
> =====3D
>
> What a difference! I don't understand it, though.
>
> Does anyone want to take on the challenge of educating me?
>
> Regards,
>
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796 / FAX: 860.674.8341
>
> www.the-infoshop.com
>
>
>
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dvegivamp@tuxera.b=
e
>
>


--=20
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--005045013dbf0da9ee04881dfe24--

RE: Slow when using sub-query

am 03.06.2010 15:41:09 von Jerry Schwartz

>-----Original Message-----
>From: vegivamp@gmail.com [mailto:vegivamp@gmail.com] On Behalf Of Johan De
>Meersman
>Sent: Thursday, June 03, 2010 6:52 AM
>To: jerry@gii.co.jp
>Cc: mysql@lists.mysql.com
>Subject: Re: Slow when using sub-query
>
>The short answer is that the optimizer is amazingly stupid about subqueries,
>and it assumes that they are dependent even when they're not - that subquery
>gets executed for every row in your main query.
>
[JS] That's what I figured was happening -- either that, or my CPU was on
strike for higher voltages.

>The fastest way to do this, would probably be to run your subquery, have
>your code assemble the appropriate IN clause, and then run your main query.
>
[JS] A lot of what I do is one-off things, and I usually wind up using the
CLI. Rewriting the whole business as a JOIN to a derived table worked very
well.

>The long answer is that there's a rather good Advanced Tuning course that
>addresses all this and more, as does Baron & C° 's excellent MySQL book.
>
[JS] No doubt. Our database is so small, and my duties are so diverse, that
tuning only gets my attention when a problem interferes with other people's
productivity.

Thanks.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org