Join with OR-condition and Indexes
Join with OR-condition and Indexes
am 04.01.2010 12:53:41 von Tobias Schultze
------=_NextPart_000_0000_01CA8D3C.F3928360
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
Hello,
I'm working on an application for my bachelor thesis.
I'm having a performance problem with a SQL-Query in MySQL5.
I hoped someone can easily enlighten me in this issue.
The schema:
CREATE TABLE IF NOT EXISTS `athletes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`last_name` varchar(20) NOT NULL,
`first_name` varchar(20) NOT NULL,
`gender` enum('male','female') NOT NULL,
`birthday` date NOT NULL,
`country` char(2) NOT NULL,
`club_id` int(11) NOT NULL,
`is_active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `gender_index_idx` (`gender`),
KEY `is_active_index_idx` (`is_active`),
KEY `club_id_idx` (`club_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Sportler bzw Mitglieder des
Verbandes';
-- --------------------------------------------------------
CREATE TABLE IF NOT EXISTS `matches` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`teammatch_id` int(11) NOT NULL,
`match_type` varchar(10) NOT NULL,
`team1_player_id` int(11) DEFAULT NULL,
`team1_partner_id` int(11) DEFAULT NULL,
`team2_player_id` int(11) DEFAULT NULL,
`team2_partner_id` int(11) DEFAULT NULL,
`team1_score` tinyint(3) unsigned DEFAULT NULL,
`team2_score` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `teammatch_id_idx` (`teammatch_id`),
KEY `team1_player_id_idx` (`team1_player_id`),
KEY `team1_partner_id_idx` (`team1_partner_id`),
KEY `team2_player_id_idx` (`team2_player_id`),
KEY `team2_partner_id_idx` (`team2_partner_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Spiele zwischen zwei oder
vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=46665 ;
I want to get all matches for each athlete and calculate statistics such as
number of matches etc.
The basic very simplified query is like
SELECT a.id, COUNT(*)
FROM athletes a
LEFT JOIN matches m ON (
m.team1_player_id = a.id OR
m.team1_partner_id = a.id OR
m.team2_player_id = a.id OR
m.team2_partner_id = a.id
)
WHERE a.gender = 'female'
GROUP BY a.id
Now the problem is, that mysql uses a full table scan to retrieve the
matches for an athlete, so the execution takes many seconds or even worse.
An athlete can be referenced in any of the m.team1_player_id OR
m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That allows
doubles matches.)
Why is an full table scan necessary although there is an index on each of
these fields? So an index exists for each OR-part of the join condition...
Here the execution plan:
id
select_type
table
type
possible_keys
key
key_len
ref
rows
Extra
1
SIMPLE
a
ref
gender_index_idx
gender_index_idx
1
const
2193
Using where; Using temporary; Using filesort
1
SIMPLE
m
ALL
team1_player_id_idx,team1_partner_id_idx,team2_pla...
NULL
NULL
NULL
46664
Joining on each fields like the following is very fast and uses the index
but of course doesn't give me the expected result.
FROM athletes a
LEFT JOIN matches m ON (a.id = m.team1_player_id)
LEFT JOIN matches m2 ON (a.id = m2.team2_player_id)
Maybe I need to do a workaround using a UNION?
But this doesn't help either: (It takes 76 seconds)
FROM athletes a
LEFT JOIN (
(
SELECT team1_player_id AS player_id, teammatch_id, match_type, team1_score,
team2_score, team1_points, team2_points, no_fight
FROM matches
)
UNION
(
SELECT team1_partner_id, teammatch_id, match_type, team1_score, team2_score,
team1_points, team2_points, no_fight
FROM matches
)
UNION
(
SELECT team2_player_id, teammatch_id, match_type, team1_score, team2_score,
team1_points, team2_points, no_fight
FROM matches
)
UNION
(
SELECT team2_partner_id, teammatch_id, match_type, team1_score, team2_score,
team1_points, team2_points, no_fight
FROM matches
)
) m ON (a.id = m.player_id)
I hope someone can help me with this.
Thanks in advance.
Regards
Tobias
------=_NextPart_000_0000_01CA8D3C.F3928360--
Re: Join with OR-condition and Indexes
am 05.01.2010 15:18:10 von Shawn Green
Tobias Schultze wrote:
> Hello,
>
>
>
> I'm working on an application for my bachelor thesis.
>
> I'm having a performance problem with a SQL-Query in MySQL5.
>
> I hoped someone can easily enlighten me in this issue.
>
>
>
> The schema:
>
>
>
> CREATE TABLE IF NOT EXISTS `athletes` (
>
> `id` int(11) NOT NULL AUTO_INCREMENT,
>
> `last_name` varchar(20) NOT NULL,
>
> `first_name` varchar(20) NOT NULL,
>
> `gender` enum('male','female') NOT NULL,
>
> `birthday` date NOT NULL,
>
> `country` char(2) NOT NULL,
>
> `club_id` int(11) NOT NULL,
>
> `is_active` tinyint(1) NOT NULL,
>
> PRIMARY KEY (`id`),
>
> KEY `gender_index_idx` (`gender`),
>
> KEY `is_active_index_idx` (`is_active`),
>
> KEY `club_id_idx` (`club_id`)
>
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Sportler bzw Mitglieder des
> Verbandes';
>
>
>
> -- --------------------------------------------------------
>
>
>
> CREATE TABLE IF NOT EXISTS `matches` (
>
> `id` int(11) NOT NULL AUTO_INCREMENT,
>
> `teammatch_id` int(11) NOT NULL,
>
> `match_type` varchar(10) NOT NULL,
>
> `team1_player_id` int(11) DEFAULT NULL,
>
> `team1_partner_id` int(11) DEFAULT NULL,
>
> `team2_player_id` int(11) DEFAULT NULL,
>
> `team2_partner_id` int(11) DEFAULT NULL,
>
> `team1_score` tinyint(3) unsigned DEFAULT NULL,
>
> `team2_score` tinyint(3) unsigned DEFAULT NULL,
>
> PRIMARY KEY (`id`),
>
> KEY `teammatch_id_idx` (`teammatch_id`),
>
> KEY `team1_player_id_idx` (`team1_player_id`),
>
> KEY `team1_partner_id_idx` (`team1_partner_id`),
>
> KEY `team2_player_id_idx` (`team2_player_id`),
>
> KEY `team2_partner_id_idx` (`team2_partner_id`)
>
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Spiele zwischen zwei oder
> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=46665 ;
>
>
>
>
>
> I want to get all matches for each athlete and calculate statistics such as
> number of matches etc.
>
> The basic very simplified query is like
>
>
>
> SELECT a.id, COUNT(*)
>
> FROM athletes a
>
> LEFT JOIN matches m ON (
>
> m.team1_player_id = a.id OR
>
> m.team1_partner_id = a.id OR
>
> m.team2_player_id = a.id OR
>
> m.team2_partner_id = a.id
>
> )
>
> WHERE a.gender = 'female'
>
> GROUP BY a.id
>
>
>
>
>
> Now the problem is, that mysql uses a full table scan to retrieve the
> matches for an athlete, so the execution takes many seconds or even worse.
>
> An athlete can be referenced in any of the m.team1_player_id OR
> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That allows
> doubles matches.)
>
> Why is an full table scan necessary although there is an index on each of
> these fields? So an index exists for each OR-part of the join condition...
>
>
>
> Here the execution plan:
>
>
>
>
> id
>
> select_type
>
> table
>
> type
>
> possible_keys
>
> key
>
> key_len
>
> ref
>
> rows
>
> Extra
>
>
> 1
>
> SIMPLE
>
> a
>
> ref
>
> gender_index_idx
>
> gender_index_idx
>
> 1
>
> const
>
> 2193
>
> Using where; Using temporary; Using filesort
>
>
> 1
>
> SIMPLE
>
> m
>
> ALL
>
> team1_player_id_idx,team1_partner_id_idx,team2_pla...
>
> NULL
>
> NULL
>
> NULL
>
> 46664
>
>
>
>
>
>
>
> Joining on each fields like the following is very fast and uses the index
> but of course doesn't give me the expected result.
>
>
>
> FROM athletes a
>
> LEFT JOIN matches m ON (a.id = m.team1_player_id)
>
> LEFT JOIN matches m2 ON (a.id = m2.team2_player_id)
>
>
>
>
>
> Maybe I need to do a workaround using a UNION?
>
> But this doesn't help either: (It takes 76 seconds)
>
>
>
> FROM athletes a
>
> LEFT JOIN (
>
> (
>
> SELECT team1_player_id AS player_id, teammatch_id, match_type, team1_score,
> team2_score, team1_points, team2_points, no_fight
>
> FROM matches
>
> )
>
> UNION
>
> (
>
> SELECT team1_partner_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight
>
> FROM matches
>
> )
>
> UNION
>
> (
>
> SELECT team2_player_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight
>
> FROM matches
>
> )
>
> UNION
>
> (
>
> SELECT team2_partner_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight
>
> FROM matches
>
> )
>
> ) m ON (a.id = m.player_id)
>
>
>
>
>
>
>
> I hope someone can help me with this.
>
> Thanks in advance.
>
>
I think the problem here is that you have an ordered vector table
(main1, partner1, main2, partner2) which requires you to optionally
match on 4 separate columns to make your JOIN condition because the
athlete you want to locate could be in any of those 4 places in the
match record.
What would be faster is to normalize your match data into two tables.
The first table would be just match information (id, date, location,
etc) and another table of match-athlete pairs (match_id, athlete_id)
alternatively you would have to use a query like
(
select...
FROM athletes
INNER JOIN matches
ON ... matches.member1
) UNION (
select...
FROM athletes
INNER JOIN matches
ON ... matches.partner1
) UNION (
select...
FROM athletes
INNER JOIN matches
ON ... matches.member2
) UNION (
select...
FROM athletes
INNER JOIN matches
ON ... matches.partner2
)
in order to use the indexes for the JOINS.
--
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: Join with OR-condition and Indexes
am 05.01.2010 17:14:24 von Michael Dykman
What it comes down to is that MySQL can only use 1 index per table per
query. The moment your query includes OR examining different columns,
a full table scan is the only option.
One typical way to implement this is to use UNIONS as Mr. Green suggested:
SELECT aid, count(*) FROM
(
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_player_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_partner_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_player_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_partner_id =3D a.id)
) AS tmp GROUP BY aid
On Tue, Jan 5, 2010 at 9:18 AM, Shawn Green wrote:
> Tobias Schultze wrote:
>>
>> Hello,
>>
>>
>> I'm working on an application for my bachelor thesis.
>>
>> I'm having a performance problem with a SQL-Query in MySQL5.
>>
>> I hoped someone can easily enlighten me in this issue.
>>
>>
>> The schema:
>>
>>
>> CREATE TABLE IF NOT EXISTS `athletes` (
>>
>> =A0`id` int(11) NOT NULL AUTO_INCREMENT,
>>
>> =A0`last_name` varchar(20) NOT NULL,
>>
>> =A0`first_name` varchar(20) NOT NULL,
>>
>> =A0`gender` enum('male','female') NOT NULL,
>>
>> =A0`birthday` date NOT NULL,
>>
>> =A0`country` char(2) NOT NULL,
>>
>> =A0`club_id` int(11) NOT NULL,
>>
>> =A0`is_active` tinyint(1) NOT NULL,
>>
>> =A0PRIMARY KEY (`id`),
>>
>> =A0KEY `gender_index_idx` (`gender`),
>>
>> =A0KEY `is_active_index_idx` (`is_active`),
>>
>> =A0KEY `club_id_idx` (`club_id`)
>>
>> ) ENGINE=3DMyISAM =A0DEFAULT CHARSET=3Dutf8 COMMENT=3D'Sportler bzw Mitg=
lieder des
>> Verbandes';
>>
>>
>> -- --------------------------------------------------------
>>
>>
>> CREATE TABLE IF NOT EXISTS `matches` (
>>
>> =A0`id` int(11) NOT NULL AUTO_INCREMENT,
>>
>> =A0`teammatch_id` int(11) NOT NULL,
>>
>> =A0`match_type` varchar(10) NOT NULL,
>>
>> =A0`team1_player_id` int(11) DEFAULT NULL,
>>
>> =A0`team1_partner_id` int(11) DEFAULT NULL,
>>
>> =A0`team2_player_id` int(11) DEFAULT NULL,
>>
>> =A0`team2_partner_id` int(11) DEFAULT NULL,
>>
>> =A0`team1_score` tinyint(3) unsigned DEFAULT NULL,
>>
>> =A0`team2_score` tinyint(3) unsigned DEFAULT NULL,
>>
>> =A0PRIMARY KEY (`id`),
>>
>> =A0KEY `teammatch_id_idx` (`teammatch_id`),
>>
>> =A0KEY `team1_player_id_idx` (`team1_player_id`),
>>
>> =A0KEY `team1_partner_id_idx` (`team1_partner_id`),
>>
>> =A0KEY `team2_player_id_idx` (`team2_player_id`),
>>
>> =A0KEY `team2_partner_id_idx` (`team2_partner_id`)
>>
>> ) ENGINE=3DMyISAM =A0DEFAULT CHARSET=3Dutf8 COMMENT=3D'Spiele zwischen z=
wei oder
>> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=3D46665 ;
>>
>>
>>
>> I want to get all matches for each athlete and calculate statistics such
>> as
>> number of matches etc.
>>
>> The basic very simplified query is like
>>
>>
>> SELECT a.id, COUNT(*)
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (
>> =A0 =A0 =A0 =A0 =A0 =A0m.team1_player_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team1_partner_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team2_player_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team2_partner_id =3D a.id
>>
>> )
>>
>> WHERE a.gender =3D 'female'
>>
>> GROUP BY a.id
>>
>>
>>
>> Now the problem is, that mysql uses a full table scan to retrieve the
>> matches for an athlete, so the execution takes many seconds or even wors=
e.
>>
>> An athlete can be referenced in any of the m.team1_player_id OR
>> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That
>> allows
>> doubles matches.)
>>
>> Why is an full table scan necessary although there is an index on each o=
f
>> these fields? So an index exists for each OR-part of the join condition.=
...
>>
>>
>> Here the execution plan:
>>
>>
>>
>> id
>> select_type
>> table
>> type
>> possible_keys
>> key
>> key_len
>> ref
>> rows
>> Extra
>>
>> 1
>>
>> SIMPLE
>>
>> a
>>
>> ref
>>
>> gender_index_idx
>>
>> gender_index_idx
>>
>> 1
>>
>> const
>>
>> 2193
>>
>> Using where; Using temporary; Using filesort
>>
>>
>> 1
>>
>> SIMPLE
>>
>> m
>>
>> ALL
>>
>> team1_player_id_idx,team1_partner_id_idx,team2_pla...
>>
>> NULL
>>
>> NULL
>>
>> NULL
>>
>> 46664
>>
>>
>>
>>
>> Joining on each fields like the following is very fast and uses the inde=
x
>> but of course doesn't give me the expected result.
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (a.id =3D m.team1_player_id)
>>
>> LEFT JOIN matches m2 ON (a.id =3D m2.team2_player_id)
>>
>>
>>
>> Maybe I need to do a workaround using a UNION?
>>
>> But this doesn't help either: (It takes 76 seconds)
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN (
>>
>> (
>>
>> SELECT team1_player_id AS player_id, teammatch_id, match_type,
>> team1_score,
>> team2_score, team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>> UNION
>> (
>>
>> SELECT team1_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_player_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> ) m ON (a.id =3D m.player_id)
>>
>>
>>
>>
>> I hope someone can help me with this.
>>
>> Thanks in advance.
>>
>>
>
> I think the problem here is that you have an ordered vector table (main1,
> partner1, main2, partner2) which requires you to optionally match on 4
> separate columns to make your JOIN condition because the athlete you want=
to
> locate could be in any of those 4 places in the match record.
>
> What would be faster is to normalize your match data into two tables. The
> first table would be just match information (id, date, location, etc) and
> another table of match-athlete pairs (match_id, athlete_id)
>
> alternatively you would have to use a query like
> (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.member1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.partner1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.member2
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.partner2
> )
>
> in order to use the indexes for the JOINS.
>
> --
> Shawn Green, MySQL Senior Support Engineer
> Sun Microsystems, Inc.
> Office: Blountville, TN
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: =A0 =A0http://lists.mysql.com/mysql?unsub=3Dmdykman@gmail=
..com
>
>
--=20
- michael dykman
- mdykman@gmail.com
May the Source be with you.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg
Re: Join with OR-condition and Indexes
am 06.01.2010 09:22:26 von Tobias Schultze
Thanks for your answers.
I found out, that MySQL behaves very strange in this situation.
I think this is a bug or important missing feature.
I would like to see how other DBMS behave in this situation, which I =
would
think is a common problem - whenever you want to join one column of a =
table
with several columns of another table.
MySQL uses an index_merge when I query for one specific athlete:
SELECT COUNT(*)
FROM matches m
WHERE (m.team1_player_id =3D 808884 OR m.team1_partner_id =3D 808884 OR
m.team2_player_id =3D 808884 OR m.team2_partner_id =3D 808884)
id select_type table type possible_keys key key_len
ref rows Extra
1 SIMPLE m index_merge
team1_player_id_idx,team1_partner_id_idx,team2_player_id_idx ,team2_partne=
r_i
d_idx
team1_player_id_idx,team1_partner_id_idx,team2_player_id_idx ,team2_partne=
r_i
d_idx 5,5,5,5 NULL 153 Using
union(team1_player_id_idx,team1_partner_id_idx,team2_player_ id_idx,team2_=
par
tner_id_idx); Using where
But it doesn't use the index merge when joining like I originally tried,
although logically it should be able to use it, shouldn't it?
Of course, adding USE INDEX FOR JOIN (team1_player_id_idx,
team1_partner_id_idx, team2_player_id_idx, team2_partner_id_idx) cannot
convince MySQL to use the indexes.
SELECT a.id, COUNT(*)
FROM athletes a
FROM athletes a
LEFT JOIN matches m USE INDEX FOR JOIN (team1_player_id_idx,
team1_partner_id_idx, team2_player_id_idx, team2_partner_id_idx) ON (
m.team1_player_id =3D a.id OR
m.team1_partner_id =3D a.id OR
m.team2_player_id =3D a.id OR
m.team2_partner_id =3D a.id
)
WHERE a.gender =3D 'female'
GROUP BY a.id
Then Michael Dykman said, MySQL is restricted to one index per table per
query. So I thought, maybe I can help MySQL when I add a compound index =
for
all players. So it could use the one index to resolve the join.
ALTER TABLE `matches` ADD INDEX `players_idx` ( `team1_player_id` ,
`team1_partner_id` , `team2_player_id` , `team2_partner_id` ) ;
Explain now gives me:
id select_type table type possible_keys key key_len
ref rows Extra
1 SIMPLE a ref gender_index_idx gender_index_idx
1 const 2193 Using where; Using temporary; Using filesort
1 SIMPLE m index
team1_player_id_idx,team1_partner_id_idx,team2_pla... players_idx 20
NULL 46664 Using index
Hm, what does this mean? It uses the index (players_idx) but still =
evaluates
all 46664 rows? This does not make sense to me.=20
Anyways, the query is still extremely slow (2400 rows in 3 min 5 sec). =
But
with IGNORE INDEX (players_idx) the query seems to be infinite (over 15
min).
So now it seems that UNIONS are the only options. But they are also much
slower than it should be. You version took 10 seconds.
My modified version with UNION ALL instead of DISTINCT takes 5.2 seconds =
-
only showing athletes with min. 1 match.
SELECT *, COUNT(am.id) AS number_matches
FROM (
SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.*
FROM athletes a
INNER JOIN matches m ON (m.team1_player_id =3D a.id)
UNION ALL
SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.*
FROM athletes a
INNER JOIN matches m ON (m.team1_partner_id =3D a.id)
UNION ALL
SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.*
FROM athletes a
INNER JOIN matches m ON (m.team2_player_id =3D a.id)
UNION ALL
SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.*
FROM athletes a
INNER JOIN matches m ON (m.team2_partner_id =3D a.id)
) AS am
GROUP BY am.athlete_id
ORDER BY number_matches DESC
But then I could also use the idea I had in the first post, which still
needs 3.8 seconds.
SELECT *, COUNT(am.id) AS number_matches
FROM (
(
SELECT team1_player_id AS player_id, teammatch_id, match_type,
team1_score, team2_score, team1_points, team2_points, no_fight=20
FROM matches
WHERE team1_player_id IS NOT NULL # reduced the query from 4.1 to =
3.8
sec
) UNION ALL (
SELECT team1_partner_id, teammatch_id, match_type, team1_score,
team2_score, team1_points, team2_points, no_fight=20
FROM matches
WHERE team1_partner_id IS NOT NULL
) UNION ALL (
SELECT team2_player_id, teammatch_id, match_type, team1_score,
team2_score, team1_points, team2_points, no_fight=20
FROM matches
WHERE team2_player_id IS NOT NULL
) UNION ALL (
SELECT team2_partner_id, teammatch_id, match_type, team1_score,
team2_score, team1_points, team2_points, no_fight=20
FROM matches
WHERE team2_partner_id IS NOT NULL
)
) AS am
INNER JOIN athletes a ON (am.player_id =3D a.id)
GROUP BY a.id
ORDER BY number_matches DESC
Changing the database schema doesn't seem to be usefull. Regarding my =
schema
is already normalized and over-normalization generally decreases
performances. I guess I would then run into other problems.
Greetings
Tobias
-----Ursprüngliche Nachricht-----
Von: Michael Dykman [mailto:mdykman@gmail.com]=20
Gesendet: Dienstag, 5. Januar 2010 17:14
An: mysql@lists.mysql.com
Cc: Tobias Schultze
Betreff: Re: Join with OR-condition and Indexes
What it comes down to is that MySQL can only use 1 index per table per
query. The moment your query includes OR examining different columns,
a full table scan is the only option.
One typical way to implement this is to use UNIONS as Mr. Green =
suggested:
SELECT aid, count(*) FROM
(
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_player_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_partner_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_player_id =3D a.id)
UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_partner_id =3D a.id)
) AS tmp GROUP BY aid
On Tue, Jan 5, 2010 at 9:18 AM, Shawn Green wrote:
> Tobias Schultze wrote:
>>
>> Hello,
>>
>>
>> I'm working on an application for my bachelor thesis.
>>
>> I'm having a performance problem with a SQL-Query in MySQL5.
>>
>> I hoped someone can easily enlighten me in this issue.
>>
>>
>> The schema:
>>
>>
>> CREATE TABLE IF NOT EXISTS `athletes` (
>>
>> =A0`id` int(11) NOT NULL AUTO_INCREMENT,
>>
>> =A0`last_name` varchar(20) NOT NULL,
>>
>> =A0`first_name` varchar(20) NOT NULL,
>>
>> =A0`gender` enum('male','female') NOT NULL,
>>
>> =A0`birthday` date NOT NULL,
>>
>> =A0`country` char(2) NOT NULL,
>>
>> =A0`club_id` int(11) NOT NULL,
>>
>> =A0`is_active` tinyint(1) NOT NULL,
>>
>> =A0PRIMARY KEY (`id`),
>>
>> =A0KEY `gender_index_idx` (`gender`),
>>
>> =A0KEY `is_active_index_idx` (`is_active`),
>>
>> =A0KEY `club_id_idx` (`club_id`)
>>
>> ) ENGINE=3DMyISAM =A0DEFAULT CHARSET=3Dutf8 COMMENT=3D'Sportler bzw =
Mitglieder
des
>> Verbandes';
>>
>>
>> -- --------------------------------------------------------
>>
>>
>> CREATE TABLE IF NOT EXISTS `matches` (
>>
>> =A0`id` int(11) NOT NULL AUTO_INCREMENT,
>>
>> =A0`teammatch_id` int(11) NOT NULL,
>>
>> =A0`match_type` varchar(10) NOT NULL,
>>
>> =A0`team1_player_id` int(11) DEFAULT NULL,
>>
>> =A0`team1_partner_id` int(11) DEFAULT NULL,
>>
>> =A0`team2_player_id` int(11) DEFAULT NULL,
>>
>> =A0`team2_partner_id` int(11) DEFAULT NULL,
>>
>> =A0`team1_score` tinyint(3) unsigned DEFAULT NULL,
>>
>> =A0`team2_score` tinyint(3) unsigned DEFAULT NULL,
>>
>> =A0PRIMARY KEY (`id`),
>>
>> =A0KEY `teammatch_id_idx` (`teammatch_id`),
>>
>> =A0KEY `team1_player_id_idx` (`team1_player_id`),
>>
>> =A0KEY `team1_partner_id_idx` (`team1_partner_id`),
>>
>> =A0KEY `team2_player_id_idx` (`team2_player_id`),
>>
>> =A0KEY `team2_partner_id_idx` (`team2_partner_id`)
>>
>> ) ENGINE=3DMyISAM =A0DEFAULT CHARSET=3Dutf8 COMMENT=3D'Spiele =
zwischen zwei oder
>> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=3D46665 ;
>>
>>
>>
>> I want to get all matches for each athlete and calculate statistics =
such
>> as
>> number of matches etc.
>>
>> The basic very simplified query is like
>>
>>
>> SELECT a.id, COUNT(*)
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (
>> =A0 =A0 =A0 =A0 =A0 =A0m.team1_player_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team1_partner_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team2_player_id =3D a.id OR
>> =A0 =A0 =A0 =A0 =A0 =A0m.team2_partner_id =3D a.id
>>
>> )
>>
>> WHERE a.gender =3D 'female'
>>
>> GROUP BY a.id
>>
>>
>>
>> Now the problem is, that mysql uses a full table scan to retrieve the
>> matches for an athlete, so the execution takes many seconds or even
worse.
>>
>> An athlete can be referenced in any of the m.team1_player_id OR
>> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That
>> allows
>> doubles matches.)
>>
>> Why is an full table scan necessary although there is an index on =
each of
>> these fields? So an index exists for each OR-part of the join
condition...
>>
>>
>> Here the execution plan:
>>
>>
>>
>> id
>> select_type
>> table
>> type
>> possible_keys
>> key
>> key_len
>> ref
>> rows
>> Extra
>>
>> 1
>>
>> SIMPLE
>>
>> a
>>
>> ref
>>
>> gender_index_idx
>>
>> gender_index_idx
>>
>> 1
>>
>> const
>>
>> 2193
>>
>> Using where; Using temporary; Using filesort
>>
>>
>> 1
>>
>> SIMPLE
>>
>> m
>>
>> ALL
>>
>> team1_player_id_idx,team1_partner_id_idx,team2_pla...
>>
>> NULL
>>
>> NULL
>>
>> NULL
>>
>> 46664
>>
>>
>>
>>
>> Joining on each fields like the following is very fast and uses the =
index
>> but of course doesn't give me the expected result.
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (a.id =3D m.team1_player_id)
>>
>> LEFT JOIN matches m2 ON (a.id =3D m2.team2_player_id)
>>
>>
>>
>> Maybe I need to do a workaround using a UNION?
>>
>> But this doesn't help either: (It takes 76 seconds)
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN (
>>
>> (
>>
>> SELECT team1_player_id AS player_id, teammatch_id, match_type,
>> team1_score,
>> team2_score, team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>> UNION
>> (
>>
>> SELECT team1_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_player_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> ) m ON (a.id =3D m.player_id)
>>
>>
>>
>>
>> I hope someone can help me with this.
>>
>> Thanks in advance.
>>
>>
>
> I think the problem here is that you have an ordered vector table =
(main1,
> partner1, main2, partner2) which requires you to optionally match on 4
> separate columns to make your JOIN condition because the athlete you =
want
to
> locate could be in any of those 4 places in the match record.
>
> What would be faster is to normalize your match data into two tables. =
The
> first table would be just match information (id, date, location, etc) =
and
> another table of match-athlete pairs (match_id, athlete_id)
>
> alternatively you would have to use a query like
> (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.member1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.partner1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.member2
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
> =A0ON ... matches.partner2
> )
>
> in order to use the indexes for the JOINS.
>
> --
> Shawn Green, MySQL Senior Support Engineer
> Sun Microsystems, Inc.
> Office: Blountville, TN
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: =A0 =
=A0http://lists.mysql.com/mysql?unsub=3Dmdykman@gmail.com
>
>
--=20
- michael dykman
- mdykman@gmail.com
May the Source be with you.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg
Re: Join with OR-condition and Indexes
am 06.01.2010 11:27:14 von Joerg Bruehe
Tobias, all,
Tobias Schultze wrote:
> Thanks for your answers.
>=20
> I found out, that MySQL behaves very strange in this situation.
> I think this is a bug or important missing feature.
I disagree.
>=20
> I would like to see how other DBMS behave in this situation, which =
I would
> think is a common problem - whenever you want to join one column of=
a table
> with several columns of another table.
Relational design theory will typically create a different schema tha=
n
you did, I don't think the "join with several columns" is such a comm=
on
task.
>=20
> MySQL uses an index_merge when I query for one specific athlete:
>=20
> [[...]]
>=20
> But it doesn't use the index merge when joining like I originally t=
ried,
> although logically it should be able to use it, shouldn't it?
The general problem with using an index is that it means you need to =
do
two logical accesses for one row, first descend the index hierarchy a=
nd
then follow the pointer into the data.
This is very efficient if you have few hits only, but it becomes
inefficient when there are many hits.
Example: In a table of all members of a typical army, it is inefficie=
nt
to use an index on a field "sex" to find all male members - the hit r=
ate
is so high that a scan of the base table is less effort.
I know of a system where the optimizer uses a threshold of 15%: If th=
e
expected number of hits in the index exceeds 15% of the table size, i=
t
will not use the index but do a base table scan.
In your case, the system would need to search 4 indexes or scan 1 bas=
e
table - it doesn't surprise me that the optimizer considers the base
table scan to be more efficient.
> [[...]]
>=20
> Then Michael Dykman said, MySQL is restricted to one index per tabl=
e per
> query. So I thought, maybe I can help MySQL when I add a compound i=
ndex for
> all players. So it could use the one index to resolve the join.
> ALTER TABLE `matches` ADD INDEX `players_idx` ( `team1_player_id` ,
> `team1_partner_id` , `team2_player_id` , `team2_partner_id` ) ;
Such a compound index cannot help you in your task:
An index can only be used if the value(s) for the leading column(s) (=
=3D
the one(s) with highest order in sorting) is/are given.
You cannot use such a compund index to search for "team1_partner_id" =
or
any of the "team2_*" values - or did you ever successfully use the or=
der
a phone book is sorted in (last name, first name) to search by first
name only?
>=20
> [[...]]
>=20
> Changing the database schema doesn't seem to be usefull. Regarding =
my schema
> is already normalized and over-normalization generally decreases
> performances. I guess I would then run into other problems.
With the task you mailed about, all (up to) four (4) participants are
considered equal for the join. In light of this task, I would not cal=
l
your schema "normalized".
There may be other tasks in your system where the participants are no=
t
considered equal, so your schema must consider all of them, possibly
weighted by the frequency of the tasks.
Regards,
Jörg
--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB1610=
28
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg