Multi-Column Indexes, field order, and performance.

Multi-Column Indexes, field order, and performance.

am 07.09.2006 19:49:27 von Mark Smith

------_=_NextPart_001_01C6D2A5.F6746FD2
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello I have a question on using a Index that spans multiple fields
INDEX(a,b,c).

=20

If my queries, that use this index, all have where conditions that
include all three fields; does it matter about the index order for query
performance?

=20

In other words is INDEX(a,b,c) =3D INDEX(c,b,a) or any combination there
in? Is it true for all index types or Databases for that matter (MySQL,
Oracle, MSSQL)?

=20

I am in a quandary because the last component in this case is time
(field c). I have been asked to aggregate across the time dimension for
fields a and b. Because now a and b are no longer conditions in the
where statement (now included in the group by) using the existing index
is no longer possible or feasible. Because of legacy SQL code I am
worried about simply changing the order at this time. Adding a second
index is frowned upon due to the huge nature of the table and the space
requirements.

=20

Thanks.


------_=_NextPart_001_01C6D2A5.F6746FD2--

Re: Multi-Column Indexes, field order, and performance.

am 13.09.2006 03:42:25 von Michael Louie Loria

--------------enig555F0439DC21F5FD0D563EB3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

> Subject: Multi-Column Indexes, field order, and performance.
> From: "Mark Smith"
> Date: Thu, 7 Sep 2006 10:49:27 -0700
> To:
>=20
>=20
> Hello I have a question on using a Index that spans multiple fields
> INDEX(a,b,c).
>=20
> =20
>=20
> If my queries, that use this index, all have where conditions that
> include all three fields; does it matter about the index order for quer=
y
> performance?
>=20
> =20
>=20
> In other words is INDEX(a,b,c) =3D INDEX(c,b,a) or any combination ther=
e
> in? Is it true for all index types or Databases for that matter (MySQL=
,
> Oracle, MSSQL)?
>=20
> =20
>=20
> I am in a quandary because the last component in this case is time
> (field c). I have been asked to aggregate across the time dimension fo=
r
> fields a and b. Because now a and b are no longer conditions in the
> where statement (now included in the group by) using the existing index=

> is no longer possible or feasible. Because of legacy SQL code I am
> worried about simply changing the order at this time. Adding a second
> index is frowned upon due to the huge nature of the table and the space=

> requirements.

The index order matters.

Example: INDEX(A,B,C)

You can use that index to search on (A) or (on A and B) or (on A,B and
C), but you cannot use it to search on C. Given this flexibility,
organize the columns in the index in an order that will support the
widest range of queries. Remember that you can rarely afford to support
all possible indexes because of the overhead indexes add to DML
operations so make sure you pick the most effective set of indexes.



--------------enig555F0439DC21F5FD0D563EB3
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPG Public Key: https://www.biglumber.com/x/web?qs=0x4A256EC8
Comment: GPG Public Key: http://www.lorztech.com/GPG.txt
Comment: Google, Skype, Yahoo ID: michaellouieloria

iQEVAwUBRQdiAbXBHi2y3jwfAQpXiAf/f7VRypjwWyGdPfQxxXNDWseGFhyY MXsF
NHRFA3PJ5EMH1CHl9zwqxCH9ztXHedAB0xECwbemXp4RhwBGKTR9p6RzPVGI ehgu
KyA4sa+QgZP8FC1oGDH8aNwwW9pkmUbrL8nvZeCCIaCI27WJqocXoo8hVxnF DUaq
NG8YbAbZIzc9p+fRi1iOd8MAC9dmc0LDTbm9/J9d2MKjUlK4dIlh2ZUfNdAR 5taf
MX8RA7cBgA/CE12YU9TkLU1Modmr/KgKO1Q6N5Swtd+ziqeoaeVs0TXffmrx 9Y92
s9UkLdUJ2TtHaARrlFU9Ou9NWLSsy8yrTA4/Dw53a461sILfnwTLpQ==
=g/x7
-----END PGP SIGNATURE-----

--------------enig555F0439DC21F5FD0D563EB3--

RE: Multi-Column Indexes, field order, and performance.

am 13.09.2006 05:44:49 von Mark Smith

Thank you Michael,

I am a where of the limitations that you outline. My question though if
all of your queries contain WHERE conditions (with a, b, and c) then
does it matter if the index is (on c, b, a)?

Example

Select # different types of SQL
FROM silly_table
WHERE
a =3D 1 and # could be any value
b =3D 2 and # could be any value
c >=3D '2006-09-01 00:00:00' and c < '2006-09-01 00:00:00';
# could be any time range

The silly table might look like:
CREATE TABLE silly_table (
a float NOT NULL default 0,
b float NOT NULL default 0,
c DateTime NOT NULL default '0000-00-00 00:00:00',
# Now here is the question does it matter which one I create?
# KEY my_key(a,b,c),
# KEY my_key(c,b,a),
# KEY my_key (any combination there of?),
) Engine=3DMyISAM;

Thanks,

Mark

-----Original Message-----
From: Michael Louie Loria [mailto:mlloria@lorztech.com]=20
Sent: Tuesday, September 12, 2006 6:42 PM
To: win32@lists.mysql.com
Subject: Re: Multi-Column Indexes, field order, and performance.

> Subject: Multi-Column Indexes, field order, and performance.
> From: "Mark Smith"
> Date: Thu, 7 Sep 2006 10:49:27 -0700
> To:
>=20
>=20
> Hello I have a question on using a Index that spans multiple fields
> INDEX(a,b,c).
>=20
> =20
>=20
> If my queries, that use this index, all have where conditions that
> include all three fields; does it matter about the index order for
query
> performance?
>=20
> =20
>=20
> In other words is INDEX(a,b,c) =3D INDEX(c,b,a) or any combination =
there
> in? Is it true for all index types or Databases for that matter
(MySQL,
> Oracle, MSSQL)?
>=20
> =20
>=20
> I am in a quandary because the last component in this case is time
> (field c). I have been asked to aggregate across the time dimension
for
> fields a and b. Because now a and b are no longer conditions in the
> where statement (now included in the group by) using the existing
index
> is no longer possible or feasible. Because of legacy SQL code I am
> worried about simply changing the order at this time. Adding a second
> index is frowned upon due to the huge nature of the table and the
space
> requirements.

The index order matters.

Example: INDEX(A,B,C)

You can use that index to search on (A) or (on A and B) or (on A,B and
C), but you cannot use it to search on C. Given this flexibility,
organize the columns in the index in an order that will support the
widest range of queries. Remember that you can rarely afford to support
all possible indexes because of the overhead indexes add to DML
operations so make sure you pick the most effective set of indexes.



--
MySQL Windows Mailing List
For list archives: http://lists.mysql.com/win32
To unsubscribe: http://lists.mysql.com/win32?unsub=3Dgcdmw-win32@m.gmane.org

Re: Multi-Column Indexes, field order, and performance.

am 13.09.2006 07:31:42 von Michael Louie Loria

--------------enigB7D4D631A7A2CC6ED6A74A8B
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I think it's ok. If you want you could use the EXPLAIN statement to
check which order of index is better (A,B,C),(B,A,C), (C,B,A) or
(B,C,A), etc... Take note of the '2 rows in set (0.04 sec)' from the
sample. You could probably consider the duration as a part of your testin=
g.

Try to fill the table with records (start with 10 records, then
increment it by 10) so you could have better results with the EXPLAIN
statement. It's better to test with different number of records to
really see the changes in duration.

Sample EXPLAIN statement
(http://dev.mysql.com/doc/refman/5.0/en/explain.html)

mysql> EXPLAIN SELECT a,b FROM t1 WHERE a<2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: range
possible_keys: a
key: a
key_len: 5
ref: NULL
rows: 2
Extra: Using where with pushed condition
2 rows in set (0.04 sec)


- Mic


Mark Smith wrote:
> Thank you Michael,
>=20
> I am a where of the limitations that you outline. My question though i=
f
> all of your queries contain WHERE conditions (with a, b, and c) then
> does it matter if the index is (on c, b, a)?
>=20
> Example
>=20
> Select # different types of SQL
> FROM silly_table
> WHERE
> a =3D 1 and # could be any value
> b =3D 2 and # could be any value
> c >=3D '2006-09-01 00:00:00' and c < '2006-09-01 00:00:00';
> # could be any time range
>=20
> The silly table might look like:
> CREATE TABLE silly_table (
> a float NOT NULL default 0,
> b float NOT NULL default 0,
> c DateTime NOT NULL default '0000-00-00 00:00:00',
> # Now here is the question does it matter which one I create?
> # KEY my_key(a,b,c),
> # KEY my_key(c,b,a),
> # KEY my_key (any combination there of?),
> ) Engine=3DMyISAM;
>=20
> Thanks,
>=20
> Mark


--------------enigB7D4D631A7A2CC6ED6A74A8B
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPG Public Key: https://www.biglumber.com/x/web?qs=0x4A256EC8
Comment: GPG Public Key: http://www.lorztech.com/GPG.txt
Comment: Google, Skype, Yahoo ID: michaellouieloria

iQEVAwUBRQeXxbXBHi2y3jwfAQqiAAf/SdcyH6IJulF8yEmXGzAogqKevJv2 cY4I
CJK++6m1CNjmmII3ZyEhW+1Af03ncEK8B2e3QjPmNq91SthNgMV4aXud61pc VTBe
9B6aYVURqal2WGKcFnj+Bb6CIPfzc3UAIjIl7qhoUbIuryhioJDHFbMhjctJ JjIQ
QGcuuXOz8O2fUn3PShSpfCAK7RGT+buxZ0Gq0OTCZOXNIp6xxROYzFRef7Kg 9k08
4cpoJPr9ED/8oFW0RjOOMGrJc78Q0C8KoCOYN1f/wJm+dxvQrg/yg3JZeJel nBCG
5UvMQXMBhIT7QcTi7u+9MQZ0WxAvVDmtojLN//C5lh/1Ys3SpHICmA==
=xn5y
-----END PGP SIGNATURE-----

--------------enigB7D4D631A7A2CC6ED6A74A8B--