optimizing UNIONs ?

optimizing UNIONs ?

am 10.06.2008 18:59:42 von Lucio Chiappetti

(mysql 5.0.27 on SuSE Linux)

I recently thought to use UNIONs to allow me to concatenate "vertically"
some database tables with statements like this

create or replace view combo as
(select * from nov06) union
(select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from jul07) union
(select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from subaru where
field<2100);

(a show create view tells me this was created as CREATE
ALGORITHM=UNDEFINED, and of course gives the full list of columns)

The sense of the above is that "nov06" is a first release of an
astronomical database containing a sky region, and the related catalogue
has already been published (so it shall be mantained unchanged).

"jul07" and "subaru" are two incremental additions in other sky regions,
which I'd like to see "all together" as an union ... and at the same time
to keep physically separate to ease maintenance.

The three tables in the union have the same layout, except that nov06 has
two columns more. To allow the unions these columns are "mimicked" as
identical copies of two other columns in the other two tables (per CREATE
statement above).

I do not expect the CREATE makes any difficulty ... it is shown as a full
list of columns.

All three tables have two indices, an UNIQUE one on two columns
(`field`,`id`), and another one on the single column "seq" (a sequence
number which is also auto_increment ... but the tables are static once
created). In particular seq runs sequentially from table to table so that
the first jul07.seq is equal to the last nov06.seq + 1 and so on.

So far so good ...

... those unions work nicely, only slightly slower than a single
table

............................................................ ..

In the past I had (and still have) also some views which allow
simultaneous "horizontal" access to more than one single table via a
glorified correlation table (just a table of pointers, I hope the
definition below illustrates the usage clearly enough)

create ALGORITHM=TEMPTABLE VIEW XLSS as
list of column aliases
from glorlss06 left join nov06 on glorlss06.nov06 =nov06.seq
left join nov06b on glorlss06.nov06b =nov06b.seq
left join nov06cd on glorlss06.nov06cd=nov06cd.seq ;

This is just an example with three "horizontal" members. I have more
complex examples with up to 30 members, and lived satisfactorily with
them.

(the ALGORITHM=TEMPTABLE and sometimes the usage of SELECT STRAIGHT_JOIN
are the only tricks required to improve efficiency, the latter was even
discussed on this list ... ah the glorlss06 of course have a couple of
indices, an unique one on (`seq`,`nov06`) and another on nov06 alone.

Just for reference this is example of EXPLAIN SELECT on such view

explain select Xcatname,Xseq,Xra,Xdec from XLSS where Xseq=13121;
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL
| NULL | 3385 | Using where |
| 2 | DERIVED | glorlss06 | ALL | NULL | NULL | NULL
| NULL | 3385 | |
| 2 | DERIVED | nov06 | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06 | 16 | |
| 2 | DERIVED | nov06b | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06b | 16 | |
| 2 | DERIVED | nov06cd | ref | auxiliary | auxiliary | 4
| lssdb.glorlss06.nov06cd | 16 | |
+----+-------------+------------+------+---------------+---- -------+---------+-------------------------+------+--------- ----+

In case this gets wrapped in the mail a copy can be seen at
http://sax.iasf-milano.inaf.it/~lucio/temp/Temp/example.txt

Such a statement (the select, not the explain) takes 0.19 seq

............................................................ ..

and now the trouble comes ... when I want to put everything together

I create an "horizontal" view whose members are "vertical" unions

create ALGORITHM=TEMPTABLE VIEW INTERIM as
list of column aliases
from glorcombo left join combo on glorcombo.combo =combo.seq
left join combob on glorcombo.combob =combob.seq
left join combocd on glorcombo.combocd=combocd.seq ;

combo with its three members was illustrated above, and combob and combocd
are fully equivalent unions with 3 members each. glorcombo is instead a
physical table.

A statement fully analogous to the previous one takes now 49 sec instead
of a fraction. All the time is spent in the analysis phase of EXPLAIN
select (which I report below, and, in case of wrap, at the URL given
above

explain select Xcatname,Xseq,Xra,Xdec,Xlssflag from INTERIM where
Xseq=13121;

+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL
| NULL | 6391 | Using where |
| 2 | DERIVED | glorcombo | ALL | NULL | NULL | NULL
| NULL | 6391 | |
| 2 | DERIVED | | ALL | NULL | NULL | NULL
| NULL | 18652 | |
| 2 | DERIVED | | ALL | NULL | NULL | NULL
| NULL | 12303 | |
| 2 | DERIVED | | ALL | NULL | NULL | NULL
| NULL | 8921 | |
| 9 | DERIVED | nov06cd | ALL | NULL | NULL | NULL
| NULL | 5917 | |
| 10 | UNION | jul07cd | ALL | NULL | NULL | NULL
| NULL | 2185 | |
| 11 | UNION | subarucd | ALL | indice | NULL | NULL
| NULL | 1414 | Using where |
| NULL | UNION RESULT | | ALL | NULL | NULL |
NULL | NULL | NULL | |
| 6 | DERIVED | nov06b | ALL | NULL | NULL | NULL
| NULL | 7986 | |
| 7 | UNION | jul07b | ALL | NULL | NULL | NULL
| NULL | 3262 | |
| 8 | UNION | subarub | ALL | indice | NULL | NULL
| NULL | 2044 | Using where |
| NULL | UNION RESULT | | ALL | NULL | NULL |
NULL | NULL | NULL | |
| 3 | DERIVED | nov06 | ALL | NULL | NULL | NULL
| NULL | 12380 | |
| 4 | UNION | jul07 | ALL | NULL | NULL | NULL
| NULL | 4783 | |
| 5 | UNION | subaru | ALL | indice | NULL | NULL
| NULL | 2793 | Using where |
| NULL | UNION RESULT | | ALL | NULL | NULL |
NULL | NULL | NULL | |
+----+--------------+----------------+------+--------------- +------+---------+------+-------+-------------+

in this case replacing the select with a select straight_join has no
effect.

Is there any way to optimize this stuff (without making a physical copy of
the union) i.e. to force proper usage of the various indices present in
the individual tables ?

It shall be noted that the various member tables are all of comparable
length with just a few thousand records each.

--
------------------------------------------------------------ ------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------ ------------
Do not blame ME, I did NOT vote Berlusconi.
------------------------------------------------------------ ------------

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimizing UNIONs ?

am 11.06.2008 16:28:43 von Lucio Chiappetti

On Tue, 10 Jun 2008, Martin wrote:

> Lucio
>
> So the net effect is to produce a cartesian join where ALL results from
> Query1 (are combined with) ALL results from Query2
> In order to prevent cartesian join can you use a more narrowly defined
> predicate such as what is defined at
> http://www.mysqlperformanceblog.com/2007/04/06/using-delayed -join-to-optimize-count-and-limit-queries/

Thank you Martin for the suggestion to take the join out of the union.

It is not exactly what suggestedin the URL you quote (that's more "take
the WHERE condition inside a subquery") but gave me an idea which I
explain below

In my case the WHERE condition is totally unpredictable (all the stuff
below will actually be masked under a servlet which users will access
to construct queries)

> Are you sure it is wise to create temptable considering MERGE would be
> disabled when temptable is specified?

Apparently it was proven in the past that it was necessary to obtain
a speed achievement in some conditions specific of my case.

> Molte Grazie
> Martin

Prego ! (you are welcome)
But why are you thanking me, while I should be thanking you ? :-)

(this reminds me of the 8th scene of Petrolini's Nerone :-)
http://www.drzap.it/O_Petrolini_Nerone.htm
It's even on You Tube, google for "Petrolini Nerone grazie")

Now back to serious business

a) when I create views of the form

gct left join member1... left join member2... left join membern...

where the "member" tables are real tables
the queries are fast and use the indices on member1 ... membern

b) if for maintenance purposes I create unions which concatenate
three tables, say

member1 (combo1) is union of tab1A tab1B tab1C
member2 (combo2) is union of tab2A tab2B tab2C
member3 (combo3) is union of tab3A tab3B tab3C

while the union themselves are fast, the view above where each
member is an union is 250 times slower !

c) it is not a matter of join optimization (STRAIGHT_JOIN does NOT
help)

d) I found no way to force the union to use the concatenation of
the index. I tried index hints like

(select * from tab1A force index (auxiliary)) union
(select * from tab1B force index (auxiliary)) union
(select * from tab1C force index (auxiliary))

but this has no effect.

e) one obvious way out would be to store the unions into a physical
table, so member1/2/3 will be physical tables.
This is almost as fast as the single table query (scales with size)

but has the disadvantage to waste disk space (and to require one
remembers to update the physical union when one of the A B C components
are updated

f) the solution is to write the view as an union of joins (instead
of a join of union views)

create view xxx as
(select .. gct left join tab1A.. left join tab2A.. left join tab3A..)
union
(select .. gct left join tab1B.. left join tab2B.. left join tab3B..)
union
(select .. gct left join tab1C.. left join tab2C.. left join tab3C..)

This proves to be successful. Explain select returns a manageable
query which uses the indices on the tabnX, and executes only 2.5 slower
than the original query on a single table ...

... what's more important it remains fast even if one is accessing
element in each of the three union "chunks"

The updated notes on
http://sax.iasf-milano.inaf.it/~lucio/temp/Temp/example.txt
are probably clearer

One can even dispense with the "combo" unions, everything is
done on the fly from the physical tabnX.

The only thing is that writing the definition of the union of joins
is slightly painful (but can be automatized).

--
------------------------------------------------------------ ------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------ ------------
Multi pertransibunt et augebitur scientia
Francis Bacon Instauratio Magna (http://tinyurl.com/2j3qk5)
------------------------------------------------------------ ------------

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

using Unix soft links

am 09.02.2009 18:10:12 von Lucio Chiappetti

I used to make Unix soft links of mysql tables back on mysql 3.23.
Say that I do an arbitrary sequence of CREATE TABLE, LOAD LOCAL DATA
INFILE, INSERT, UPDATE etc. on a table "xxxxx".

At some stage (after the CREATE) I do (outside of mysql)

foreach i (MYD MYI frm)
ln -s xxxxx.$i xxxxxdup.$i

so that a soft-link clone of the table comes into existence.

Then I have cases where I do queries on xxxxx left join xxxxxdup (or on
some other table left join xxxxx left join xxxxxdup). I never perform any
update, insert or delete on xxxxxdup.

I know I could do a query on the same table using two aliases (xxxxxx as
a left join xxxxx as b) and this is in fact what I do when I perform
interactive operations using the mysql client.

I know also I could do a "create view xxxxxdup select * from xxxxx"
(although this was not possible back in 3.23, and I wonder whether using
such an xxxxxdup would be efficient index-wise). I do use views for other
purposes.

But there were good reasons at the time to use soft links (one was not
existence of the views, the other is that my users do access everything
from a java front end, which does not and shall not know that a
"duplicated clone" is special (so it can't use aliases).

The arrangement with soft links has been working fine for years, under
mysql 3.23, 4.x and including at least a couple of years with current
mysql 5.0.27.

However today one of my "dup" tables was indicated as corrupted. I did a
"repair table xxxxxdup" and it said the size of the table was shrunk from
1469 to 213 records. I did some sequence of "repair table" on xxxxx and
xxxxdup, and the outcome seemed unstable. Now both tables show the (WRONG)
number of 213 records.

I will probably delete all tables and recreate them.

But I wonder if there is anything intrinsically wrong in the usage of
soft-links, or whether this might be an indication instead of hardware
problems with our machine or disks !

--
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------ -----------
"Nature" on government cuts to research http://snipurl.com/4erid
"Nature" e i tagli del governo alla ricerca http://snipurl.com/4erko

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: using Unix soft links

am 10.02.2009 10:06:23 von Johan De Meersman

--0015174bf1f029528c04628ccdb1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On Mon, Feb 9, 2009 at 6:10 PM, Lucio Chiappetti wrote:

> I used to make Unix soft links of mysql tables back on mysql 3.23.
>
> [...]
>

> But I wonder if there is anything intrinsically wrong in the usage of
> soft-links, or whether this might be an indication instead of hardware
> problems with our machine or disks !



In short, the mysql server will cache the metadata of the tables, and, thus,
for xxx and xxxdup separately. Any modification to xxx will not immediately
come throught in xxxdup, but depending on the updates to xxx, you might
suddenly have to read data from the file and end up with not quite what you
expected.

Thus, you see 'corruption' in xxxdup. You run a repair on it, which for some
reason modifies the file, so now the cached metadata for xxx is incorrect.

Rinse and repeat until all your data is fucked over.


You really want to do this with views, or triggers to update the duplicate
table, or, if all else fails, scheduled 'create table as select' statements.


--
Celsius is based on water temperature.
Fahrenheit is based on alcohol temperature.
Ergo, Fahrenheit is better than Celsius. QED.

--0015174bf1f029528c04628ccdb1--