How to optimize a slow query?
on 05.09.2009 17:05:26 by chen jia
Hi there,
One simple query took more than 10 minutes. Here is how the relevant rows
in the slow query log look:
# Time: 090905 10:49:57
# User@Host: root[root] @ localhost []
# Query_time: 649 Lock_time: 0 Rows_sent: 0 Rows_examined: 26758561
use world;
create table rmpdata1 select ri.*,
mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from RItime as ri left outer join
MVtime as mv
on (ri.code=mv.code and
ri.ndate=mv.ndate);
When I "explain" only the select clause, I get
------------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len
| ref | rows | Extra |
+----+-------------+-------+--------+---------------+------- --+---------+------------------------------+----------+----- --+
| 1 | SIMPLE | ri | ALL | NULL | NULL | NULL
| NULL | 13419851 | |
| 1 | SIMPLE | mv | eq_ref | PRIMARY | PRIMARY | 11
| world.ri.code,world.ri.ndate | 1 | |
+----+-------------+-------+--------+---------------+------- --+---------+------------------------------+----------+----- --+
2 rows in set (0.00 sec)
I use "show table status from world;" to get information about two
tables, RItime and MVtime, in the join clause:
*************************** 1. row ***************************
Name: RItime
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 13419851
Avg_row_length: 31
Data_length: 427721848
Max_data_length: 281474976710655
Index_length: 347497472
Data_free: 0
Auto_increment: NULL
Create_time: 2009-09-03 10:17:57
Update_time: 2009-09-03 12:04:02
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
*************************** 2. row ***************************
Name: MVtime
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 13562373
Avg_row_length: 31
Data_length: 430220056
Max_data_length: 281474976710655
Index_length: 350996480
Data_free: 0
Auto_increment: NULL
Create_time: 2009-09-03 13:31:33
Update_time: 2009-09-03 13:43:51
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
I also describe these two tables:
mysql> desc RItime;
+-------+------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+------------+-------+
| code | varchar(6) | NO | PRI | | |
| ndate | date | NO | PRI | 0000-00-00 | |
| ri | double | YES | | NULL | |
| time | date | YES | | NULL | |
| bdate | date | YES | | NULL | |
+-------+------------+------+-----+------------+-------+
5 rows in set (0.00 sec)
mysql> desc MVtime;
+-------+------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+------------+-------+
| code | varchar(6) | NO | PRI | | |
| ndate | date | NO | PRI | 0000-00-00 | |
| MV | double | YES | | NULL | |
| time | date | YES | | NULL | |
| bdate | date | YES | | NULL | |
+-------+------------+------+-----+------------+-------+
5 rows in set (0.00 sec)
Could you give me some hints on how to improve the speed of this query?
Thanks.
Best,
Jia
Re: How to optimize a slow query?
on 06.09.2009 07:07:55 by mos
How many rows were added to rmpdata1 table? If it is 13.4 million rows then
it is going to take several minutes to join this many rows from the 2 tables.
Is there a 1:1 relationship between the two tables or a 1:Many? If there is
a 1:1 then I'd recommend joining the two tables into 1 table so you don't
have to join them in the first place.
The only other thing I can suggest is to change the type of index on the
tables being joined to see if that makes a speed difference. For example,
if you are using BTREE then switch to HASH or vice versa. See
http://dev.mysql.com/doc/refman/5.0/en/create-index.html for more info.
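For what it's worth, the syntax from that page looks roughly like this (just an
untested sketch; whether HASH is actually honored depends on the storage engine
and server version, so check the manual for your setup):

alter table MVtime drop primary key, add primary key using hash (code, ndate);

-- or, as a separate index instead of the primary key:
create index idx_code_ndate using hash on MVtime (code, ndate);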
Mike
Re: How to optimize a slow query?
on 06.09.2009 16:45:30 by chen jia
Thanks for your reply, Mike.
Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seems too
long. I ran the same join using the SQL procedure in SAS, a statistical
package, on a similar machine, and it took only 1 minute and 3 seconds.
Yes, it is a 1:1 relationship between table RItime and MVtime. However,
I don't get your suggestion, "I'd recommend joining the two tables into
1 table so you don't have to join them in the first place." Could you
elaborate on that?
I was using a B-tree index. Switching to HASH does help: the query now
takes 4 min 50.17 sec to run.
I also turned on profiling with:
mysql> set profiling = 1;
Query OK, 0 rows affected (0.01 sec)
After the query finishes, I get
mysql> show profile;
+----------------------+------------+
| Status | Duration |
+----------------------+------------+
| starting | 0.000123 |
| checking permissions | 0.000010 |
| Opening tables | 0.000044 |
| System lock | 0.000007 |
| Table lock | 0.000011 |
| init | 0.000083 |
| creating table | 0.003428 |
| After create | 0.000124 |
| System lock | 0.000004 |
| Table lock | 0.000051 |
| optimizing | 0.000007 |
| statistics | 0.000033 |
| preparing | 0.000020 |
| executing | 0.000004 |
| Sending data | 290.153530 |
| end | 0.000008 |
| end | 0.000004 |
| query end | 0.000003 |
| freeing items | 0.000010 |
| closing tables | 0.000025 |
| logging slow query | 0.000001 |
| logging slow query | 0.013429 |
| cleaning up | 0.000004 |
+----------------------+------------+
23 rows in set (0.02 sec)
MySQL spends most of its time sending data. According to
http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html,
"Sending data" means that "the thread is processing rows for a SELECT
statement and also is sending data to the client." Is there more room to
optimize this query? Thanks again.
Best,
Jia
Re: How to optimize a slow query?
on 06.09.2009 19:12:49 by mos
Jia,
>Yes, it is a 1:1 relationship between table RItime and MVtime. However, I
>don't get your suggestion, "I'd recommend joining the two tables into 1
>table so you don't have to join them in the first place." Could you
>elaborate that?
Sure but first I have to relate it to my own experience. I had 8 tables of
around 25 million rows each. They all had a 1:1 relationship and
occasionally some of the tables did not have a corresponding row. I felt it
was better from a design point of view to have 8 different tables and do
the joins on the tables that I needed for each of my particular queries.
I'd be joining anywhere from 2 to 5 or 6 or even all 8 tables at a time,
using a where clause to select 15k rows at a time. This is the way to do it
from a normalized point of view: all of the information is in its
respective table, and you only assemble the tables for each particular query.
Well, this was sloooowww! A heck of a lot of work was done to join the
tables together on a 2-column key (like yours). I also had to run
maintenance on the tables to see which tables were corrupted or were
missing rows that should have been there. The tables also repeated columns
from the other tables, like date and product_id, that are used to help
identify each row. Well, to make a long story short, it was far too much
effort to juggle the relationships between all of these tables.
Then a colleague made the monumental announcement by saying "I've never
found the need to use more than 1 table when there was a 1:1 relationship.
There is a tremendous speed cost involved in piecing the data back
together. I put all of the data into 1 table". So the light went on for
me. From then on I've merged all 8 tables into one and if any of the
subordinate table data isn't available for a row, its columns are set to
NULL, which are the values they would have had anyway after a left join.
I am perfectly happy now with one wide table with over 100 columns.
Everything is in its place and maintenance is a dream. Queries are also
quite fast because all of the information is in one table and not 8. I
don't have to worry about optimizing the indexes for the table joins
because there aren't any joins between these tables; it is all in 1 row.
So you really have to ask yourself, why spend 10 minutes each time your
query is run? Instead, you eliminate the query altogether by keeping the
data of the 2 tables in 1 table in the first place.
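In your case, for example, the combined table might look roughly like this
(just an untested sketch based on the column lists you posted; adjust names
and defaults as you see fit):

create table combined (
  code  varchar(6) not null,
  ndate date not null,
  ri    double null,
  MV    double null,
  time  date null,
  bdate date null,
  primary key (code, ndate)
) engine=MyISAM;

A row that exists in only one of the source tables simply gets NULL in the
other table's columns, which is the same result a left join would have produced.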
Mike
Re: How to optimize a slow query?
on 06.09.2009 20:51:11 by chen jia
Hi Mike,
Thanks for your detailed answer. Now, I understand what you mean. And,
yes, I agree with you that keeping all data in one table works better
for a bunch of 1:1 relationship tables. Actually, this is what I was
trying to do with that query.
Since you mention "They all had a 1:1 relationship and occasionally some
of the tables did not have a corresponding row." and "From then on I've
merged all 8 tables into one and if any of the subordinate table data
isn't available for a row, its columns are set to NULL", I do want to ask
how you set the columns to NULL for rows where the subordinate table data
is unavailable, because I have a similar situation.
If I want to combine two tables into one, I think a full outer join can
achieve what you did. However, MySQL does not have a full outer join, so I
am using:
create table rmpdata1
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from
RItime as ri left outer join MVtime as mv
on
(ri.code=mv.code and ri.ndate=mv.ndate))
union
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from
RItime as ri right outer join MVtime as mv
on
(ri.code=mv.code and ri.ndate=mv.ndate));
This query takes more than twice as much time as the query in my first
e-mail. Do you have a better way? Thanks.
Best,
Jia
Re: How to optimize a slow query?
on 06.09.2009 23:11:04 by mos
Jia,
The code you sent seems to be able to get the job done.
You could try something simpler by executing 2 SQL statements instead of
one. Something like:
create table rmpdata1
select ri.*, mv.* from RItime as ri left join MVtime as mv on
(ri.code=mv.code and ri.ndate=mv.ndate);

insert into rmpdata1
select ri.*, mv.* from MVtime as mv left join RItime as ri on
(ri.code=mv.code and ri.ndate=mv.ndate) where ri.code is null and ri.ndate
is null;
You will of course have to play with the column list to avoid duplicate
columns. The Insert statement will insert rows from mv that are missing in ri.
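With your two tables, the column list might end up looking something like
this (again just an untested sketch):

create table rmpdata1
select ri.code, ri.ndate, ri.ri, ri.time, ri.bdate, mv.MV
from RItime as ri left join MVtime as mv on
(ri.code=mv.code and ri.ndate=mv.ndate);

insert into rmpdata1 (code, ndate, ri, time, bdate, MV)
select mv.code, mv.ndate, NULL, mv.time, mv.bdate, mv.MV
from MVtime as mv left join RItime as ri on
(ri.code=mv.code and ri.ndate=mv.ndate)
where ri.code is null;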
Mike
Re: How to optimize a slow query?
on 07.09.2009 15:42:50 by chen jia
Hi Mike,
Thanks for your help!
Best,
Jia