Complex SQL optimization vs. general-purpose language
am 04.02.2010 06:40:22 von Yang ZhangAny SQL rewriting gurus know how I might be able to optimize this
query? The schema:
mysql> show columns from transactionlog;
+---------------+------------------------------------------- +------+-----+---------+----------------+
| Field | Type | Null |
Key | Default | Extra |
+---------------+------------------------------------------- +------+-----+---------+----------------+
| id | int(11) | NO |
PRI | NULL | auto_increment |
| transactionid | varchar(10) | NO |
MUL | NULL | |
| queryid | tinyint(4) | NO |
| NULL | |
| tableid | varchar(30) | NO |
MUL | NULL | |
| tupleid | int(11) | NO |
| NULL | |
| querytype | enum('select','insert','delete','update') | NO |
| NULL | |
| schemaname | varchar(20) | YES |
| NULL | |
| partition | tinyint(3) unsigned | YES |
| NULL | |
+---------------+------------------------------------------- +------+-----+---------+----------------+
8 rows in set (0.04 sec)
The query:
select concat(weight, ' ', ids, '\n')
from (
select
tableid,
tupleid,
group_concat(id separator ' ') as ids,
(
select count(distinct transactionid)
from transactionlog
where transactionid in (
select transactionid
from transactionlog
where (tableid, tupleid, querytype) =
(t.tableid, t.tupleid, 'update')
group by transactionid
having count(*) > 0
)
) weight
from transactionlog t
group by tableid, tupleid
having weight > 0 and count(*) > 1
) u;
This is the output of EXPLAIN and mk-visual-explain:
+----+--------------------+----------------+-------+-------- -------+---------------+---------+-----------+------+------- -----------------------
----------------+
| id | select_type | table | type | possible_keys |
key | key_len | ref | rows | Extra
|
+----+--------------------+----------------+-------+-------- -------+---------------+---------+-----------+------+------- ---------------------------------------+
| 1 | PRIMARY |
NULL | NULL | NULL | 13 |
|
| 2 | DERIVED | t | ALL | NULL |
NULL | NULL | NULL | 68 | Using filesort
|
| 3 | DEPENDENT SUBQUERY | transactionlog | index | NULL |
transactionid | 12 | NULL | 68 | Using where; Using index
|
| 4 | DEPENDENT SUBQUERY | transactionlog | ref | tableid |
tableid | 36 | func,func | 2 | Using where; Using
temporary; Using filesort |
+----+--------------------+----------------+-------+-------- -------+---------------+---------+-----------+------+------- ---------------------------------------+
Table scan
rows 13
+- DERIVED
table derived(t,transactionlog,temporary(transactionlog))
+- DEPENDENT SUBQUERY
+- DEPENDENT SUBQUERY
| +- Filesort
| | +- TEMPORARY
| | table temporary(transactionlog)
| | +- Filter with WHERE
| | +- Bookmark lookup
| | +- Table
| | | table transactionlog
| | | possible_keys tableid
| | +- Index lookup
| | key transactionlog->tableid
| | possible_keys tableid
| | key_len 36
| | ref func,func
| | rows 2
| +- Filter with WHERE
| +- Index scan
| key transactionlog->transactionid
| key_len 12
| rows 68
+- Filesort
+- Table scan
rows 68
+- Table
table t
That is a lot of work. I can write the equivalent logic in Python
while making a single pass:
results = query("""
select tableid, tupleid, transactionid, id, querytype
from transactionlog_2warehouse
""")
_tab, _tup = None
ids = []
weight = 0
saw_upd = False
for tab, tup, txn, id, qt in results:
if (_tab, _tup) != (tab, tup):
if len(ids) > 1 and weight > 0:
print weight, ids
weight = 0
ids = []
_txn = None
if _txn != txn:
saw_upd = False
if qt == 'update' and not saw_upd:
weight += 1
saw_upd = True
ids += [id]
Is it possible to achieve the Python single-pass performance using
pure SQL? Thanks in advance!
--
Yang Zhang
http://www.mit.edu/~y_z/
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org