Fast Index Creation and fill factor

am 31.08.2010 05:30:19 von Kyong Kim

I've been going through the 5.1 manual and exploring the new features.

To add a secondary index to an existing table, InnoDB scans the table,
and sorts the rows using memory buffers and temporary files in order
by the value(s) of the secondary index key column(s). The B-tree is
then built in key-value order, which is more efficient than inserting
rows into an index in random order with respect to the key values.
Because the B-tree nodes are split when they fill, building the index
in this way results in a higher fill-factor for the index, making it
more efficient for subsequent access.

I've been running some tests on SELECT using indexes built via plugin
and builtin and the results are similar (though I think the test cases
are flawed due to low cardinality of the column being indexed and
queried).
My question is...wouldn't the eventual fill factors be similar whether
the pages are split during unordered B-tree build or ordered build?
I'm having a tough time visualizing this scenario. Any insight into
this will be much appreciated.

Kyong

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Fast Index Creation and fill factor

am 31.08.2010 18:36:56 von Dan Nelson

In the last episode (Aug 30), Kyong Kim said:
> I've been going through the 5.1 manual and exploring the new features.
>
> To add a secondary index to an existing table, InnoDB scans the table, and
> sorts the rows using memory buffers and temporary files in order by the
> value(s) of the secondary index key column(s). The B-tree is then built
> in key-value order, which is more efficient than inserting rows into an
> index in random order with respect to the key values. Because the B-tree
> nodes are split when they fill, building the index in this way results in
> a higher fill-factor for the index, making it more efficient for
> subsequent access.
>
> I've been running some tests on SELECT using indexes built via plugin and
> builtin and the results are similar (though I think the test cases are
> flawed due to low cardinality of the column being indexed and queried).
>
> My question is...wouldn't the eventual fill factors be similar whether the
> pages are split during unordered B-tree build or ordered build? I'm
> having a tough time visualizing this scenario. Any insight into this will
> be much appreciated.

The big difference is how the leaf nodes get filled. With sequential
insertion, the record that "overflows" a leaf block would go into a new
block anyway so there's no need to split the original one. With random
insertion, the new record is probably not going to be at the beginning or
the end of the block, so the block gets split in half and each new block
starts out 50% full.

http://dev.mysql.com/doc/refman/5.1/en/innodb-physical-struc ture.html

--
Dan Nelson
dnelson@allantgroup.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org