OR vs UNION
am 13.10.2009 12:48:04 von Majk.Skoric
Hi List,
i have a problem with an OR STATEMENT. Maybe someone can explain to me
why the
mysql optimizer doesn't work like expected. Please have a look at
following "similar" queries.
mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) or =
(kunde_id=3D
'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1);
+----+-------------+-------+-------------+------------------ ----------+-
---------------------------+---------+------+------+-------- ------------
----------------------------------+
| id | select_type | table | type | possible_keys |
key | key_len | ref | rows | Extra
|
+----+-------------+-------+-------------+------------------ ----------+-
---------------------------+---------+------+------+-------- ------------
----------------------------------+
| 1 | SIMPLE | KTEMP | index_merge | buchungs_kunde_id,kunde_id |
buchungs_kunde_id,kunde_id | 71,71 | NULL | 2 | Using
union(buchungs_kunde_id,kunde_id); Using where |
+----+-------------+-------+-------------+------------------ ----------+-
---------------------------+---------+------+------+-------- ------------
----------------------------------+
1 row in set (0.00 sec)
All seems fine here . Optimizer choose to use an union! This is the same
as following union query.
mysql> explain extended (SELECT * FROM KTEMP WHERE buchungs_kunde_id =3D
'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) UNION =
(SELECT
* FROM KTEMP WHERE kunde_id =3D 'dfb49c8c0b441e9f' and veranst_id =3D =
16058
and status =3D 1);
+----+--------------+------------+------+------------------- +-----------
--------+---------+-------------------+------+-------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+--------------+------------+------+------------------- +-----------
--------+---------+-------------------+------+-------------+
| 1 | PRIMARY | KTEMP | ref | buchungs_kunde_id |
buchungs_kunde_id | 71 | const,const,const | 1 | Using where |
| 2 | UNION | KTEMP | ref | kunde_id | kunde_id
| 71 | const,const,const | 1 | Using where |
|NULL | UNION RESULT | | ALL | NULL | NULL
| NULL | NULL | NULL | |
But the following query is handled in a strange way
mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
'dfb49c8c0b441e9f' or kunde_id=3D 'dfb49c8c0b441e9f') and veranst_id =3D
16058 and status =3D 1;
+----+-------------+-------+------+------------------------- ---+------+-
--------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------- ---+------+-
--------+------+---------+-------------+
| 1 | SIMPLE | KTEMP | ALL | buchungs_kunde_id,kunde_id | NULL |
NULL | NULL | 1040700 | Using where |
+----+-------------+-------+------+------------------------- ---+------+-
--------+------+---------+-------------+
I don't get it! Maybe someone has a clue or a hint for me.
TABLEDEF.
| KTEMP | CREATE TABLE `KTEMP` (
`tid` bigint(20) NOT NULL auto_increment,
`kunde_id` varchar(20) collate utf8_bin NOT NULL,
`buchung_id` varchar(20) collate utf8_bin NOT NULL default '0',
`buchungs_kunde_id` varchar(20) collate utf8_bin NOT NULL,
`veranst_id` bigint(20) NOT NULL,
`rolle_nummer` int(11) default '0',
`status` tinyint(1) unsigned NOT NULL,
`tstamp_insert` bigint(20) NOT NULL,
`tstamp_update` bigint(20) NOT NULL,
`KategorienWechsel` tinyint(4) NOT NULL default '0',
PRIMARY KEY (`tid`),
KEY `buchungs_kunde_id` (`buchungs_kunde_id`,`veranst_id`,`status`),
KEY `kunde_id` (`kunde_id`,`veranst_id`,`status`)
) ENGINE=3DInnoDB DEFAULT CHARSET=3Dutf8 COLLATE=3Dutf8_bin |
mysql> SELECT VERSION();
+---------------------+
| VERSION() |
+---------------------+
| 5.0.27-standard-log |
+---------------------+
1 row in set (0.00 sec)
Regards,
Majk
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg
Re: OR vs UNION
am 13.10.2009 15:26:20 von Joerg Bruehe
Majk, all,
I'm no optimizer expert, but your result doesn't really surprise me.
I'll reorder your post because that makes reasoning simpler:
Majk.Skoric@eventim.de wrote:
> Hi List,
First, your table:
> TABLEDEF.
> | KTEMP | CREATE TABLE `KTEMP` (
> `tid` bigint(20) NOT NULL auto_increment,
> `kunde_id` varchar(20) collate utf8_bin NOT NULL,
> `buchung_id` varchar(20) collate utf8_bin NOT NULL default '0',
> `buchungs_kunde_id` varchar(20) collate utf8_bin NOT NULL,
> `veranst_id` bigint(20) NOT NULL,
> `rolle_nummer` int(11) default '0',
> `status` tinyint(1) unsigned NOT NULL,
> `tstamp_insert` bigint(20) NOT NULL,
> `tstamp_update` bigint(20) NOT NULL,
> `KategorienWechsel` tinyint(4) NOT NULL default '0',
> PRIMARY KEY (`tid`),
> KEY `buchungs_kunde_id` (`buchungs_kunde_id`,`veranst_id`,`status=
`),
> KEY `kunde_id` (`kunde_id`,`veranst_id`,`status`)
> ) ENGINE=3DInnoDB DEFAULT CHARSET=3Dutf8 COLLATE=3Dutf8_bin |
So you have two indexes which consist of three fields each, and the
least significant two fields are the same for both indexes.
You do a SELECT that fully specifies values for these two indexes,
combining them with "OR":
>=20
> mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
> 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) or (k=
unde_id=3D
> 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1);
> +----+-------------+-------+-------------+------------------ -------=
---+-
> ---------------------------+---------+------+------+-------- -------=
-----
> ----------------------------------+
> | id | select_type | table | type | possible_keys =
|
> key | key_len | ref | rows | Extra
> |
> +----+-------------+-------+-------------+------------------ -------=
---+-
> ---------------------------+---------+------+------+-------- -------=
-----
> ----------------------------------+
> | 1 | SIMPLE | KTEMP | index_merge | buchungs_kunde_id,kunde_=
id |
> buchungs_kunde_id,kunde_id | 71,71 | NULL | 2 | Using
> union(buchungs_kunde_id,kunde_id); Using where |
> +----+-------------+-------+-------------+------------------ -------=
---+-
> ---------------------------+---------+------+------+-------- -------=
-----
> ----------------------------------+
> 1 row in set (0.00 sec)
>=20
> All seems fine here . Optimizer choose to use an union! This is the=
same
> as following union query.
As an alternative, you replace the "OR" by a UNION. No real change:
>=20
> mysql> explain extended (SELECT * FROM KTEMP WHERE buchungs_kunde_i=
d =3D
> 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) UNION=
(SELECT
> * FROM KTEMP WHERE kunde_id =3D 'dfb49c8c0b441e9f' and veranst_id =
=3D 16058
> and status =3D 1);
> +----+--------------+------------+------+------------------- +------=
-----
> --------+---------+-------------------+------+-------------+
> | id | select_type | table | type | possible_keys | key
> | key_len | ref | rows | Extra |
> +----+--------------+------------+------+------------------- +------=
-----
> --------+---------+-------------------+------+-------------+
> | 1 | PRIMARY | KTEMP | ref | buchungs_kunde_id |
> buchungs_kunde_id | 71 | const,const,const | 1 | Using wher=
e |
> | 2 | UNION | KTEMP | ref | kunde_id | kunde=
_id
> | 71 | const,const,const | 1 | Using where |
> |NULL | UNION RESULT | | ALL | NULL | NULL
> | NULL | NULL | NULL | |
>=20
Note that both queries fully specify the index values.
Then, you apply Boolean logic to factor out the two identical predica=
tes
=66rom both the "OR" alternatives:
>=20
> But the following query is handled in a strange way
>=20
> mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
> 'dfb49c8c0b441e9f' or kunde_id=3D 'dfb49c8c0b441e9f') and veranst_i=
d =3D
> 16058 and status =3D 1;
> +----+-------------+-------+------+------------------------- ---+---=
---+-
> --------+------+---------+-------------+
> | id | select_type | table | type | possible_keys | ke=
y |
> key_len | ref | rows | Extra |
> +----+-------------+-------+------+------------------------- ---+---=
---+-
> --------+------+---------+-------------+
> | 1 | SIMPLE | KTEMP | ALL | buchungs_kunde_id,kunde_id | NU=
LL |
> NULL | NULL | 1040700 | Using where |
> +----+-------------+-------+------+------------------------- ---+---=
---+-
> --------+------+---------+-------------+
>=20
> I don't get it! Maybe someone has a clue or a hint for me.
While that is equivalent from a logic point of view, it is different =
for
the optimizer:
You do not specify the values for any of the indexes completely, beca=
use
the first term contains an "OR" over two different fields, and the ot=
her
terms don't specify the most significant fields of the index.
The only way for the system to use an index would be to revert your
change and to go back to the first statement, where the "OR" is on th=
e
outermost level.
>=20
>=20
> mysql> SELECT VERSION();
> +---------------------+
> | VERSION() |
> +---------------------+
> | 5.0.27-standard-log |
> +---------------------+
> 1 row in set (0.00 sec)
I don't think using a newer version would change anything, but you
should be working on an update nonetheless. There were several import=
ant
changes since that version, including security fixes.
But as 5.0 is approaching its EOL and 5.1 is GA, I hope that update w=
ill
target 5.1.
HTH,
Jörg
--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB1610=
28
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg
AW: OR vs UNION
am 13.10.2009 15:43:01 von Majk.Skoric
> -----Ursprüngliche Nachricht-----
> Von: Joerg.Bruehe@Sun.COM [mailto:Joerg.Bruehe@Sun.COM]
> Gesendet: Dienstag, 13. Oktober 2009 15:26
> An: Skoric, Majk
> Cc: mysql@lists.mysql.com
> Betreff: Re: OR vs UNION
>=20
> Majk, all,
>=20
>=20
> I'm no optimizer expert, but your result doesn't really surprise me.
>=20
> I'll reorder your post because that makes reasoning simpler:
>=20
> Majk.Skoric@eventim.de wrote:
> > Hi List,
>=20
> First, your table:
>=20
> > TABLEDEF.
> > | KTEMP | CREATE TABLE `KTEMP` (
> > `tid` bigint(20) NOT NULL auto_increment,
> > `kunde_id` varchar(20) collate utf8_bin NOT NULL,
> > `buchung_id` varchar(20) collate utf8_bin NOT NULL default '0',
> > `buchungs_kunde_id` varchar(20) collate utf8_bin NOT NULL,
> > `veranst_id` bigint(20) NOT NULL,
> > `rolle_nummer` int(11) default '0',
> > `status` tinyint(1) unsigned NOT NULL,
> > `tstamp_insert` bigint(20) NOT NULL,
> > `tstamp_update` bigint(20) NOT NULL,
> > `KategorienWechsel` tinyint(4) NOT NULL default '0',
> > PRIMARY KEY (`tid`),
> > KEY `buchungs_kunde_id`
> (`buchungs_kunde_id`,`veranst_id`,`status`),
> > KEY `kunde_id` (`kunde_id`,`veranst_id`,`status`)
> > ) ENGINE=3DInnoDB DEFAULT CHARSET=3Dutf8 COLLATE=3Dutf8_bin |
>=20
> So you have two indexes which consist of three fields each, and the
> least significant two fields are the same for both indexes.
>=20
>=20
> You do a SELECT that fully specifies values for these two indexes,
> combining them with "OR":
>=20
> >
> > mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
> > 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) or
> (kunde_id=3D
> > 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1);
> > =
+----+-------------+-------+-------------+------------------ ---------
> -+-
> > =
---------------------------+---------+------+------+-------- ---------
> ---
> > ----------------------------------+
> > | id | select_type | table | type | possible_keys
> |
> > key | key_len | ref | rows | Extra
> > |
> > =
+----+-------------+-------+-------------+------------------ ---------
> -+-
> > =
---------------------------+---------+------+------+-------- ---------
> ---
> > ----------------------------------+
> > | 1 | SIMPLE | KTEMP | index_merge | =
buchungs_kunde_id,kunde_id
> |
> > buchungs_kunde_id,kunde_id | 71,71 | NULL | 2 | Using
> > union(buchungs_kunde_id,kunde_id); Using where |
> > =
+----+-------------+-------+-------------+------------------ ---------
> -+-
> > =
---------------------------+---------+------+------+-------- ---------
> ---
> > ----------------------------------+
> > 1 row in set (0.00 sec)
> >
> > All seems fine here . Optimizer choose to use an union! This is the
> same
> > as following union query.
>=20
> As an alternative, you replace the "OR" by a UNION. No real change:
>=20
> >
> > mysql> explain extended (SELECT * FROM KTEMP WHERE buchungs_kunde_id
> =3D
> > 'dfb49c8c0b441e9f' and veranst_id =3D 16058 and status =3D 1) UNION
> (SELECT
> > * FROM KTEMP WHERE kunde_id =3D 'dfb49c8c0b441e9f' and veranst_id =
=3D
> 16058
> > and status =3D 1);
> > =
+----+--------------+------------+------+------------------- +--------
> ---
> > --------+---------+-------------------+------+-------------+
> > | id | select_type | table | type | possible_keys | key
> > | key_len | ref | rows | Extra |
> > =
+----+--------------+------------+------+------------------- +--------
> ---
> > --------+---------+-------------------+------+-------------+
> > | 1 | PRIMARY | KTEMP | ref | buchungs_kunde_id |
> > buchungs_kunde_id | 71 | const,const,const | 1 | Using where
> |
> > | 2 | UNION | KTEMP | ref | kunde_id |
> kunde_id
> > | 71 | const,const,const | 1 | Using where |
> > |NULL | UNION RESULT | | ALL | NULL | NULL
> > | NULL | NULL | NULL | |
> >
>=20
> Note that both queries fully specify the index values.
>=20
>=20
> Then, you apply Boolean logic to factor out the two identical
> predicates
> from both the "OR" alternatives:
>=20
> >
> > But the following query is handled in a strange way
> >
> > mysql> explain SELECT * FROM KTEMP WHERE (buchungs_kunde_id =3D
> > 'dfb49c8c0b441e9f' or kunde_id=3D 'dfb49c8c0b441e9f') and veranst_id =
=3D
> > 16058 and status =3D 1;
> > =
+----+-------------+-------+------+------------------------- ---+-----
> -+-
> > --------+------+---------+-------------+
> > | id | select_type | table | type | possible_keys | key
> |
> > key_len | ref | rows | Extra |
> > =
+----+-------------+-------+------+------------------------- ---+-----
> -+-
> > --------+------+---------+-------------+
> > | 1 | SIMPLE | KTEMP | ALL | buchungs_kunde_id,kunde_id | =
NULL
> |
> > NULL | NULL | 1040700 | Using where |
> > =
+----+-------------+-------+------+------------------------- ---+-----
> -+-
> > --------+------+---------+-------------+
> >
> > I don't get it! Maybe someone has a clue or a hint for me.
>=20
> While that is equivalent from a logic point of view, it is different
> for
> the optimizer:
> You do not specify the values for any of the indexes completely,
> because
> the first term contains an "OR" over two different fields, and the
> other
> terms don't specify the most significant fields of the index.
>=20
> The only way for the system to use an index would be to revert your
> change and to go back to the first statement, where the "OR" is on the
> outermost level.
Ah ok. I got it!
>=20
> >
> >
> > mysql> SELECT VERSION();
> > +---------------------+
> > | VERSION() |
> > +---------------------+
> > | 5.0.27-standard-log |
> > +---------------------+
> > 1 row in set (0.00 sec)
>=20
> I don't think using a newer version would change anything, but you
> should be working on an update nonetheless. There were several
> important
> changes since that version, including security fixes.
Its on my TODO list ;)
>=20
> But as 5.0 is approaching its EOL and 5.1 is GA, I hope that update
> will
> target 5.1.
>=20
>=20
> HTH,
> Jörg
Thanks for your time!
Majk
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg