Speeding up a pretty simple correlated update query

Speeding up a pretty simple correlated update query

am 03.09.2009 01:35:07 von Hank

--001485f793669f06c00472a0b75d
Content-Type: text/plain; charset=ISO-8859-1

Hello All,
I have a legacy application which was written using a compound primary key
of an item number (non unique) along with a category ID. The combination of
the item number and category ID make the records unique.

I am in the process of replacing the compound (VARCHAR) keys with an
unique integer key in these tables.

So I have created an item_seq table and assigned a unique sequence number to
each compound key -- it looks like this (all tables are myisam tables, and
mysql version 5.0)

desc item_seq;
+-----------+------------------+------+-----+---------+----- -----------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----- -----------+
| seq | int(10) unsigned | NO | PRI | NULL | auto_increment |
| itemid | char(11) | NO | MUL | | |
| category | char(4) | NO | | | |
+-----------+------------------+------+-----+---------+----- -----------+

I also have my main transactional table with about 180,000,000 rows -- it
looks like this:

desc item_trans;
+-------------+-----------------------+------+-----+-------- -------------+-------+
| Field | Type | Null | Key | Default |
Extra |
+-------------+-----------------------+------+-----+-------- -------------+-------+
| seq | int(10) unsigned | NO | MUL | |
|
| itemid | char(11) | NO | PRI | |
|
| category | char(4) | NO | PRI | |
|
| transid | int(10) | NO | PRI | |
|

Currently the "seq" field is null for the entire table. So of course, I
want to update the main transaction table with the new sequence number.

So I've disabled all the keys on the "item_trans" table -- since I am
updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

update item_trans i, item_seq is
set i.seq=is.seq
where is.itemid=i.itemid and is.category=i.category;

If I run an explain on the select version of the update, this is what I
get:

+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
| 1 | SIMPLE | item_trans | ALL | PRIMARY | NULL | NULL
| NULL | 178948797 | |
| 1 | SIMPLE | item_seq | eq_ref | itemid | itemid | 20 |
g.item_trans.itemid,g.item_trans.category | 1 |
|
+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+

.... which is exactly what I would expect it to do. Update every record of
the item_trans table, and do a full index lookup on the items_seq table.

SO... I've been running this query to update item_trans, and it's been
running for 5 days now.

I've also tried running this with the primary key index on the item_trans
table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated
update query? And I'm fairly certain that trying to do this in PHP
one-record at a time would take much longer than a SQL solution.

Thanks,

-Hank

--001485f793669f06c00472a0b75d--

RE: Speeding up a pretty simple correlated update query

am 03.09.2009 01:50:36 von Gavin Towey

Do you know that if you create seq column on the original table as an auto_=
increment primary key, it will fill in the numbers automatically? There's =
no need to create the values on another table and update with a join.

Regards,
Gavin Towey

-----Original Message-----
From: Hank [mailto:heskin@gmail.com]
Sent: Wednesday, September 02, 2009 4:35 PM
To: mysql@lists.mysql.com
Subject: Speeding up a pretty simple correlated update query

Hello All,
I have a legacy application which was written using a compound primary ke=
y
of an item number (non unique) along with a category ID. The combination of
the item number and category ID make the records unique.

I am in the process of replacing the compound (VARCHAR) keys with an
unique integer key in these tables.

So I have created an item_seq table and assigned a unique sequence number t=
o
each compound key -- it looks like this (all tables are myisam tables, and
mysql version 5.0)

desc item_seq;
+-----------+------------------+------+-----+---------+----- -----------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----- -----------+
| seq | int(10) unsigned | NO | PRI | NULL | auto_increment |
| itemid | char(11) | NO | MUL | | |
| category | char(4) | NO | | | |
+-----------+------------------+------+-----+---------+----- -----------+

I also have my main transactional table with about 180,000,000 rows -- it
looks like this:

desc item_trans;
+-------------+-----------------------+------+-----+-------- -------------+-=
------+
| Field | Type | Null | Key | Default |
Extra |
+-------------+-----------------------+------+-----+-------- -------------+-=
------+
| seq | int(10) unsigned | NO | MUL | |
|
| itemid | char(11) | NO | PRI | |
|
| category | char(4) | NO | PRI | |
|
| transid | int(10) | NO | PRI | |
|

Currently the "seq" field is null for the entire table. So of course, I
want to update the main transaction table with the new sequence number.

So I've disabled all the keys on the "item_trans" table -- since I am
updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

update item_trans i, item_seq is
set i.seq=3Dis.seq
where is.itemid=3Di.itemid and is.category=3Di.category;

If I run an explain on the select version of the update, this is what I
get:

+----+-------------+----------+--------+---------------+---- ----+---------+=
------------------------------------------------+----------- +-------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------+--------+---------------+---- ----+---------+=
------------------------------------------------+----------- +-------+
| 1 | SIMPLE | item_trans | ALL | PRIMARY | NULL | NULL
| NULL | 178948797 | |
| 1 | SIMPLE | item_seq | eq_ref | itemid | itemid | 20 |
g.item_trans.itemid,g.item_trans.category | 1 |
|
+----+-------------+----------+--------+---------------+---- ----+---------+=
------------------------------------------------+----------- +-------+

.... which is exactly what I would expect it to do. Update every record of
the item_trans table, and do a full index lookup on the items_seq table.

SO... I've been running this query to update item_trans, and it's been
running for 5 days now.

I've also tried running this with the primary key index on the item_trans
table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated
update query? And I'm fairly certain that trying to do this in PHP
one-record at a time would take much longer than a SQL solution.

Thanks,

-Hank

The information contained in this transmission may contain privileged and c=
onfidential information. It is intended only for the use of the person(s) n=
amed above. If you are not the intended recipient, you are hereby notified =
that any review, dissemination, distribution or duplication of this communi=
cation is strictly prohibited. If you are not the intended recipient, pleas=
e contact the sender by reply email and destroy all copies of the original =
message.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Speeding up a pretty simple correlated update query

am 03.09.2009 02:00:41 von Hank

--001485f1d65a109be80472a113aa
Content-Type: text/plain; charset=ISO-8859-1

Hello Gavin,
That's what I did with the first one-to-one table to create the unique SEQ
field mapping to each item/category combination. The problem is on the
TRANSACTION table, where there are multiple instances of each item/category.
If I just put a auto_increment primary key on that table, I'd get a unique
TRANSACTION ID, which is not what I want. I want to populate the
transaction table with the new integer seq key created in the first table.

I guess I should have stated that my overall objective here is to eventually
drop the VARCHAR itemid and category id fields from the transaction table,
leaving only the new item sequence id (plus transid) as the primary key.
There are many tables throughout the schema that do this, and I would be
replacing them all. It's just that this is the largest table, and the
correlated update is taking a long time, and I'm looking for a better
solution (if one exists). thanks.

-Hank


On Wed, Sep 2, 2009 at 7:50 PM, Gavin Towey wrote:

> Do you know that if you create seq column on the original table as an
> auto_increment primary key, it will fill in the numbers automatically?
> There's no need to create the values on another table and update with a
> join.
>
> Regards,
> Gavin Towey
>
> -----Original Message-----
> From: Hank [mailto:heskin@gmail.com]
> Sent: Wednesday, September 02, 2009 4:35 PM
> To: mysql@lists.mysql.com
> Subject: Speeding up a pretty simple correlated update query
>
> Hello All,
> I have a legacy application which was written using a compound primary key
> of an item number (non unique) along with a category ID. The combination of
> the item number and category ID make the records unique.
>
> I am in the process of replacing the compound (VARCHAR) keys with an
> unique integer key in these tables.
>
> So I have created an item_seq table and assigned a unique sequence number
> to
> each compound key -- it looks like this (all tables are myisam tables, and
> mysql version 5.0)
>
> desc item_seq;
> +-----------+------------------+------+-----+---------+----- -----------+
> | Field | Type | Null | Key | Default | Extra |
> +-----------+------------------+------+-----+---------+----- -----------+
> | seq | int(10) unsigned | NO | PRI | NULL | auto_increment |
> | itemid | char(11) | NO | MUL | | |
> | category | char(4) | NO | | | |
> +-----------+------------------+------+-----+---------+----- -----------+
>
> I also have my main transactional table with about 180,000,000 rows -- it
> looks like this:
>
> desc item_trans;
>
> +-------------+-----------------------+------+-----+-------- -------------+-------+
> | Field | Type | Null | Key | Default |
> Extra |
>
> +-------------+-----------------------+------+-----+-------- -------------+-------+
> | seq | int(10) unsigned | NO | MUL | |
> |
> | itemid | char(11) | NO | PRI | |
> |
> | category | char(4) | NO | PRI | |
> |
> | transid | int(10) | NO | PRI | |
> |
>
> Currently the "seq" field is null for the entire table. So of course, I
> want to update the main transaction table with the new sequence number.
>
> So I've disabled all the keys on the "item_trans" table -- since I am
> updating every row, it wouldn't (shouldn't) be using the index anyway.
>
> Here is my correlated update query:
>
> update item_trans i, item_seq is
> set i.seq=is.seq
> where is.itemid=i.itemid and is.category=i.category;
>
> If I run an explain on the select version of the update, this is what I
> get:
>
>
> +----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
> | id | select_type | table | type | possible_keys | key | key_len |
> ref | rows | Extra |
>
> +----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
> | 1 | SIMPLE | item_trans | ALL | PRIMARY | NULL | NULL
> | NULL | 178948797 | |
> | 1 | SIMPLE | item_seq | eq_ref | itemid | itemid | 20 |
> g.item_trans.itemid,g.item_trans.category | 1 |
> |
>
> +----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
>
> ... which is exactly what I would expect it to do. Update every record of
> the item_trans table, and do a full index lookup on the items_seq table.
>
> SO... I've been running this query to update item_trans, and it's been
> running for 5 days now.
>
> I've also tried running this with the primary key index on the item_trans
> table (but not the seq index), and that ran slower in my initial tests.
>
> Are there any faster ways to update 180 million records with a correlated
> update query? And I'm fairly certain that trying to do this in PHP
> one-record at a time would take much longer than a SQL solution.
>
> Thanks,
>
> -Hank
>
> The information contained in this transmission may contain privileged and
> confidential information. It is intended only for the use of the person(s)
> named above. If you are not the intended recipient, you are hereby notified
> that any review, dissemination, distribution or duplication of this
> communication is strictly prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of the
> original message.
>

--001485f1d65a109be80472a113aa--

Re: Speeding up a pretty simple correlated update query

am 06.09.2009 19:32:58 von Hank

--001485f6ce2ad938ab0472ec1f6e
Content-Type: text/plain; charset=ISO-8859-1

Hello All,
I'm reposting this since I didn't get much response the last time, so I'm
hoping to reach out again. My correlated update query (see below) was
running for 9 days before I killed it. Here is my original question:

I have a legacy application which was written using a compound primary key
of an item number (non unique) along with a category ID. The combination of
the item number and category ID make the records unique.

I am in the process of replacing the compound (VARCHAR) keys with an
unique integer key in these tables.

So I have created an item_seq table and assigned a unique sequence number to
each compound key -- it looks like this (all tables are myisam tables, and
mysql version 5.0)

desc item_seq;
+-----------+------------------+------+-----+---------+----- -----------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----- -----------+
| seq | int(10) unsigned | NO | PRI | NULL | auto_increment |
| itemid | char(11) | NO | MUL | | |
| category | char(4) | NO | | | |
+-----------+------------------+------+-----+---------+----- -----------+

I also have my main transactional table with about 180,000,000 rows -- it
looks like this:

desc item_trans;
+-------------+-----------------------+------+-----+-------- -------------+-------+
| Field | Type | Null | Key | Default |
Extra |
+-------------+-----------------------+------+-----+-------- -------------+-------+
| seq | int(10) unsigned | NO | MUL | |
|
| itemid | char(11) | NO | PRI | |
|
| category | char(4) | NO | PRI | |
|
| transid | int(10) | NO | PRI | |
|

Currently the "seq" field is null for the entire table. So of course, I
want to update the main transaction table with the new sequence number.

So I've disabled all the keys on the "item_trans" table -- since I am
updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

update item_trans i, item_seq is
set i.seq=is.seq
where is.itemid=i.itemid and is.category=i.category;

If I run an explain on the select version of the update, this is what I
get:

+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+
| 1 | SIMPLE | item_trans | ALL | PRIMARY | NULL | NULL
| NULL | 178948797 | |
| 1 | SIMPLE | item_seq | eq_ref | itemid | itemid | 20 |
g.item_trans.itemid,g.item_trans.category | 1 |
|
+----+-------------+----------+--------+---------------+---- ----+---------+--------------------------------------------- ---+-----------+-------+

.... which is exactly what I would expect it to do. Update every record of
the item_trans table, and do a full index lookup on the items_seq table.

SO... I've been running this query to update item_trans, and it's been
running for 5 days now.

I've also tried running this with the primary key index on the item_trans
table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated
update query? And I'm fairly certain that trying to do this in PHP
one-record at a time would take much longer than a SQL solution.

Thanks,

-Hank

--001485f6ce2ad938ab0472ec1f6e--

Re: Speeding up a pretty simple correlated update query

am 07.09.2009 00:01:01 von mos

>So I've disabled all the keys on the "item_trans" table -- since I am
>updating every row, it wouldn't (shouldn't) be using the index anyway.

You can't disable unique indexes or primary keys. They are always active.
You can only deactivate non-unique indexes.

Here are a couple of suggestions.

For now drop the index on item_seq.seq and desc.seq.
Are you sure you have a compound index on item_seq.itemid and
item_seq.category??

do a "Show create table item_seq" and also "Show create table desc" to see
what you've got.

Make sure your my.cnf file has
key_buffer_size=500M

equal to about 30% of your available memory. You can always reduce it later.

Of course there is another way of doing it, if you are willing to have the
tail wag the dog. You may kick yourself for not discovering it yourself. :)

set @num:=0;
set @last:='';
create table new_item_trans select IF(concat(itemid,category),@last,
@num:=@num+1,@num) Seq, itemid, category, transid, ...
,@last:=concat(itemid,category) as TMPLast from item_trans order by
concat(itemid,category);

Now you can use the Alter statement to add your indexes and get rid of the
TMPLast column.

To build the item_seq table you would now use:

create table item_seq select seq, itemid, category from new_item_trans
group by seq, itemid, category;
And of course build your indexes on seq and rename the new_item_trans.

I guarantee you this last solution will not take 9 days to complete! :-)

Mike

At 12:32 PM 9/6/2009, Hank wrote:
>Hello All,
> I'm reposting this since I didn't get much response the last time, so I'm
>hoping to reach out again. My correlated update query (see below) was
>running for 9 days before I killed it. Here is my original question:
>
> I have a legacy application which was written using a compound primary key
>of an item number (non unique) along with a category ID. The combination of
>the item number and category ID make the records unique.
>
> I am in the process of replacing the compound (VARCHAR) keys with an
>unique integer key in these tables.
>
>So I have created an item_seq table and assigned a unique sequence number to
>each compound key -- it looks like this (all tables are myisam tables, and
>mysql version 5.0)
>
>desc item_seq;
>+-----------+------------------+------+-----+---------+---- ------------+
>| Field | Type | Null | Key | Default | Extra |
>+-----------+------------------+------+-----+---------+---- ------------+
>| seq | int(10) unsigned | NO | PRI | NULL | auto_increment |
>| itemid | char(11) | NO | MUL | | |
>| category | char(4) | NO | | | |
>+-----------+------------------+------+-----+---------+---- ------------+
>
>I also have my main transactional table with about 180,000,000 rows -- it
>looks like this:
>
>desc item_trans;
>+-------------+-----------------------+------+-----+------- --------------+-------+
>| Field | Type | Null | Key | Default |
>Extra |
>+-------------+-----------------------+------+-----+------- --------------+-------+
>| seq | int(10) unsigned | NO | MUL | |
> |
>| itemid | char(11) | NO | PRI | |
> |
>| category | char(4) | NO | PRI | |
> |
>| transid | int(10) | NO | PRI | |
> |
>
>Currently the "seq" field is null for the entire table. So of course, I
>want to update the main transaction table with the new sequence number.
>
>So I've disabled all the keys on the "item_trans" table -- since I am
>updating every row, it wouldn't (shouldn't) be using the index anyway.
>
>Here is my correlated update query:
>
> update item_trans i, item_seq is
> set i.seq=is.seq
> where is.itemid=i.itemid and is.category=i.category;
>
> If I run an explain on the select version of the update, this is what I
>get:
>
>+----+-------------+----------+--------+---------------+--- -----+---------+-------------------------------------------- ----+-----------+-------+
>| id | select_type | table | type | possible_keys | key | key_len |
>ref | rows | Extra |
>+----+-------------+----------+--------+---------------+--- -----+---------+-------------------------------------------- ----+-----------+-------+
>| 1 | SIMPLE | item_trans | ALL | PRIMARY | NULL | NULL
> | NULL | 178948797 | |
>| 1 | SIMPLE | item_seq | eq_ref | itemid | itemid | 20 |
>g.item_trans.itemid,g.item_trans.category | 1 |
>|
>+----+-------------+----------+--------+---------------+--- -----+---------+-------------------------------------------- ----+-----------+-------+
>
>... which is exactly what I would expect it to do. Update every record of
>the item_trans table, and do a full index lookup on the items_seq table.
>
>SO... I've been running this query to update item_trans, and it's been
>running for 5 days now.
>
>I've also tried running this with the primary key index on the item_trans
>table (but not the seq index), and that ran slower in my initial tests.
>
>Are there any faster ways to update 180 million records with a correlated
>update query? And I'm fairly certain that trying to do this in PHP
>one-record at a time would take much longer than a SQL solution.
>
>Thanks,
>
>-Hank


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Speeding up a pretty simple correlated update query

am 07.09.2009 06:55:32 von Hank

--001485f6ce2ad9a60a0472f5a88d
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Sep 6, 2009 at 6:01 PM, mos wrote:

>
> So I've disabled all the keys on the "item_trans" table -- since I am
>> updating every row, it wouldn't (shouldn't) be using the index anyway.
>>
>
> You can't disable unique indexes or primary keys. They are always active.
> You can only deactivate non-unique indexes.
>
> Here are a couple of suggestions.
>
> For now drop the index on item_seq.seq and desc.seq.
> Are you sure you have a compound index on item_seq.itemid and
> item_seq.category??
>
> do a "Show create table item_seq" and also "Show create table desc" to see
> what you've got.
>
> Make sure your my.cnf file has
> key_buffer_size=500M
>
> equal to about 30% of your available memory. You can always reduce it
> later.
>
> Of course there is another way of doing it, if you are willing to have the
> tail wag the dog. You may kick yourself for not discovering it yourself. :)
>
> set @num:=0;
> set @last:='';
> create table new_item_trans select IF(concat(itemid,category),@last,
> @num:=@num+1,@num) Seq, itemid, category, transid, ...
> ,@last:=concat(itemid,category) as TMPLast from item_trans order by
> concat(itemid,category);
>
> Now you can use the Alter statement to add your indexes and get rid of the
> TMPLast column.
>
> To build the item_seq table you would now use:
>
> create table item_seq select seq, itemid, category from new_item_trans
> group by seq, itemid, category;
> And of course build your indexes on seq and rename the new_item_trans.
>
> I guarantee you this last solution will not take 9 days to complete! :-)
>
> Mike
>
>
Hi Mike,

Thanks for your reply. First, in my tests, I've created the target table
(item_trans) as a copy of the source table with no indexes at all (even no
primary key). Once I get the item_seq field populated, I'll go back and
re-create the indexes in batch using myisamchk (I've posted about this
recently).

Second, I like your second creative solution (I never would have come up
with that), but in order for it to work, mysql would have to sort 180
million records before creating the table or retrieve them out of the table
via the contactenated index, both of which I think will take a long time...
but I'll certainly give it a shot tomorrow and let you know how it goes.
Thanks again.

-Hank

--001485f6ce2ad9a60a0472f5a88d--