optimizing UNIONs ?
am 10.06.2008 18:59:42 von Lucio Chiappetti(mysql 5.0.27 on SuSE Linux)
I recently thought to use UNIONs to allow me to concatenate "vertically"
some database tables with statements like this
create or replace view combo as
(select * from nov06) union
(select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from jul07) union
(select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from subaru where
field<2100);
(a show create view tells me this was created as CREATE
ALGORITHM=UNDEFINED, and of course gives the full list of columns)
The sense of the above is that "nov06" is a first release of an
astronomical database containing a sky region, and the related catalogue
has already been published (so it shall be mantained unchanged).
"jul07" and "subaru" are two incremental additions in other sky regions,
which I'd like to see "all together" as an union ... and at the same time
to keep physically separate to ease maintenance.
The three tables in the union have the same layout, except that nov06 has
two columns more. To allow the unions these columns are "mimicked" as
identical copies of two other columns in the other two tables (per CREATE
statement above).
I do not expect the CREATE makes any difficulty ... it is shown as a full
list of columns.
All three tables have two indices, an UNIQUE one on two columns
(`field`,`id`), and another one on the single column "seq" (a sequence
number which is also auto_increment ... but the tables are static once
created). In particular seq runs sequentially from table to table so that
the first jul07.seq is equal to the last nov06.seq + 1 and so on.
So far so good ...
... those unions work nicely, only slightly slower than a single
table
............................................................ ..
In the past I had (and still have) also some views which allow
simultaneous "horizontal" access to more than one single table via a
glorified correlation table (just a table of pointers, I hope the
definition below illustrates the usage clearly enough)
create ALGORITHM=TEMPTABLE VIEW XLSS as
list of column aliases
from glorlss06 left join nov06 on glorlss06.nov06 =nov06.seq
left join nov06b on glorlss06.nov06b =nov06b.seq
left join nov06cd on glorlss06.nov06cd=nov06cd.seq ;
This is just an example with three "horizontal" members. I have more
complex examples with up to 30 members, and lived satisfactorily with
them.
(the ALGORITHM=TEMPTABLE and sometimes the usage of SELECT STRAIGHT_JOIN
are the only tricks required to improve efficiency, the latter was even
discussed on this list ... ah the glorlss06 of course have a couple of
indices, an unique one on (`seq`,`nov06`) and another on nov06 alone.
Just for reference this is example of EXPLAIN SELECT on such view
explain select Xcatname,Xseq,Xra,Xdec from XLSS where Xseq=13121;
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+
| 1 | PRIMARY |
| NULL | 3385 | Using where |
| 2 | DERIVED | glorlss06 | ALL | NULL | NULL | NULL
| NULL | 3385 | |
| 2 | DERIVED | nov06 | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06 | 16 | |
| 2 | DERIVED | nov06b | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06b | 16 | |
| 2 | DERIVED | nov06cd | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06cd | 16 | |
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+
In case this gets wrapped in the mail a copy can be seen at
http://sax.iasf-milano.inaf.it/~lucio/temp/Temp/example.txt
Such a statement (the select, not the explain) takes 0.19 seq
............................................................ ..
and now the trouble comes ... when I want to put everything together
I create an "horizontal" view whose members are "vertical" unions
create ALGORITHM=TEMPTABLE VIEW INTERIM as
list of column aliases
from glorcombo left join combo on glorcombo.combo =combo.seq
left join combob on glorcombo.combob =combob.seq
left join combocd on glorcombo.combocd=combocd.seq ;
combo with its three members was illustrated above, and combob and combocd
are fully equivalent unions with 3 members each. glorcombo is instead a
physical table.
A statement fully analogous to the previous one takes now 49 sec instead
of a fraction. All the time is spent in the analysis phase of EXPLAIN
select (which I report below, and, in case of wrap, at the URL given
above
explain select Xcatname,Xseq,Xra,Xdec,Xlssflag from INTERIM where
Xseq=13121;
+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+
| 1 | PRIMARY |
| NULL | 6391 | Using where |
| 2 | DERIVED | glorcombo | ALL | NULL | NULL | NULL
| NULL | 6391 | |
| 2 | DERIVED |
| NULL | 18652 | |
| 2 | DERIVED |
| NULL | 12303 | |
| 2 | DERIVED |
| NULL | 8921 | |
| 9 | DERIVED | nov06cd | ALL | NULL | NULL | NULL
| NULL | 5917 | |
| 10 | UNION | jul07cd | ALL | NULL | NULL | NULL
| NULL | 2185 | |
| 11 | UNION | subarucd | ALL | indice | NULL | NULL
| NULL | 1414 | Using where |
| NULL | UNION RESULT |
NULL | NULL | NULL | |
| 6 | DERIVED | nov06b | ALL | NULL | NULL | NULL
| NULL | 7986 | |
| 7 | UNION | jul07b | ALL | NULL | NULL | NULL
| NULL | 3262 | |
| 8 | UNION | subarub | ALL | indice | NULL | NULL
| NULL | 2044 | Using where |
| NULL | UNION RESULT |
NULL | NULL | NULL | |
| 3 | DERIVED | nov06 | ALL | NULL | NULL | NULL
| NULL | 12380 | |
| 4 | UNION | jul07 | ALL | NULL | NULL | NULL
| NULL | 4783 | |
| 5 | UNION | subaru | ALL | indice | NULL | NULL
| NULL | 2793 | Using where |
| NULL | UNION RESULT |
NULL | NULL | NULL | |
+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+
in this case replacing the select with a select straight_join has no
effect.
Is there any way to optimize this stuff (without making a physical copy of
the union) i.e. to force proper usage of the various indices present in
the individual tables ?
It shall be noted that the various member tables are all of comparable
length with just a few thousand records each.
--
------------------------------------------------------------ ------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------ ------------
Do not blame ME, I did NOT vote Berlusconi.
------------------------------------------------------------ ------------
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org