Improve SELECT command

Improve SELECT command

am 12.05.2006 17:01:54 von Stefaan Lhermitte

Dear MySQL-ians,

I perform a SELECT on my database, but it takes over a minute for every
run. I have to run it over 10000 times (with different values in the
WHERE), so it takes way too long. A was therefore wondering if I could
improve the query speed. Below you find the query. It is based on the
ratio between a pixel (pix) vs. the average of its neighbourhood (from
geo) in the same table (vgt) based on additional requirements
(mgba,mgsc,eco).

Any comment on the query/design is welcome.

SELECT vgt.obs,pix.NDVI/AVG(vgt.NDVI)
FROM vgtData.VGT as pix,
(vgt.meta_gsc_2000 as mgsc RIGHT JOIN vgt.gsc2000 as gsc ON
gsc.gsc2000_XX = mgsc.id)
INNER JOIN vgt.geo_1000 as geo ON geo.id = gsc.id
INNER JOIN vgtData.VGT as vgt ON geo.id = vgt.id
INNER JOIN vgt.v_ecoclim AS eco ON geo.id = eco.id
INNER JOIN (vgt.gba2000 as gba LEFT JOIN vgt.meta_gba_2000 as mgba ON
gba.gba2000_XX = mgba.id) ON gba.id=geo.id
WHERE pix.obs = vgt.obs
AND pix.id = 200265
AND vgt.obs > 55
AND vgt.obs <= 127
AND (geo.X_coord BETWEEN 1525 AND 1545)
AND (geo.Y_coord BETWEEN 110 AND 130)
AND mgba.burnt=0
AND mgsc.burnt=0
AND eco.v_landcov=3
AND eco.v_lowreb=10
AND vgt.B<27
AND vgt.SWIR<250
AND pix.B<27
AND pix.SWIR<250
GROUP BY vgt.obs;

The EXPLAIN command gives me the following result:
table type possible_keys key len ref
rows Extra
----- ------ -------------------------- -------- -- --------------
----- --------------------------------------------
eco ref PRIMARY,v_landcov,v_lowreb v_lowreb 3 const
17021 Using where; Using temporary; Using filesort
geo eq_ref PRIMARY,X_coord,Y_coord PRIMARY 3 eco.id 1
Using where
gsc eq_ref PRIMARY,gsc2000_XX PRIMARY 3 geo.id 1

mgsc eq_ref PRIMARY,burnt PRIMARY 3 gsc.gsc2000_XX 1
Using where
vgt ref id,obs id 4 geo.id 1
Using where
pix ref id,obs id 4 const 156
Using where
gba eq_ref PRIMARY,gba2000_XX PRIMARY 3 geo.id 1

mgba eq_ref PRIMARY,burnt PRIMARY 3 gba.gba2000_XX 1
Using where


and the tables look like;
###################################################
vgtData.VGT
Field Type Null Key Default Extra
----- -------------------- ---- --- ------- ------
id mediumint(8) unsigned YES MUL (null)
obs tinyint(4) unsigned YES MUL (null)
B tinyint(4) unsigned YES (null)
SWIR tinyint(4) unsigned YES (null)
NDVI tinyint(4) unsigned YES (null)
####################################################

#########################################################
vgt.geo_1000
Field Type Null Key Default Extra
------ --------------------- ---- --- ------- ---------
id mediumint(8) unsigned PRI (null) auto_incr
X_coord smallint(6) unsigned YES MUL (null)
Y_coord smallint(7) unsigned YES MUL (null)
#########################################################


#########################################################
vgt.v_ecoclim
Field Type Null Key Default Extra
--------- --------------------- ---- --- ------- ---------
id mediumint(8) unsigned PRI (null) auto_incr
v_lowreb smallint(7) YES MUL (null)
v_landcov smallint(7) YES MUL (null)
#########################################################

#########################################################
vgt.gba2000
Field Type Null Key Default Extra
---------- --------------------- ---- --- ------- ---------
id mediumint(8) unsigned PRI (null) auto_incr
gba2000_XX smallint(7) unsigned YES MUL (null)
#########################################################

#########################################################
vgt.meta_gba_2000
Field Type Null Key Default Extra
----- --------------------- ---- ---- ------- --------
id mediumint(8) unsigned PRI (null) auto_incr
burnt tinyint(1) YES MUL (null)
#########################################################

#########################################################
vgt.gsc2000
Field Type Null Key Default Extra
---------- --------------------- ---- --- ------- ---------
id mediumint(8) unsigned PRI (null) auto_incr
gsc2000_XX smallint(7) unsigned YES MUL (null)
#########################################################

#########################################################
vgt.meta_gsc_2000
Field Type Null Key Default Extra
----- --------------------- ---- ---- ------- --------
id mediumint(8) unsigned PRI (null) auto_incr
burnt tinyint(1) YES MUL (null)
#########################################################

I am introducing indexes on vgtData.SWIR and vgtData.B but they have a
very low cardinality, so I assume it won't make a big difference.

Does anyone has a suggestion to make it faster?

Thanx in advance and kind regards,
Stef

Re: Improve SELECT command

am 12.05.2006 18:24:21 von Bill Karwin

stefaan.lhermitte@agr.kuleuven.ac.be wrote:
> SELECT vgt.obs,pix.NDVI/AVG(vgt.NDVI)
....
> GROUP BY vgt.obs;

This is a bit odd, because you are not grouping by pix.NDVI. In some
RDBMS implementations, it is mandatory to group by all columns mentioned
in the select-list that are not used inside aggregate functions. MySQL
permits this. However, the value it returns for pix.NDVI will be some
arbitrarily chosen value in the group. Though in this case I assume you
restrict the pix table sufficiently, so it doesn't cause any ambiguity.

> The EXPLAIN command gives me the following result:
> table type possible_keys key len ref
> rows Extra
> ----- ------ -------------------------- -------- -- --------------
> ----- --------------------------------------------
> eco ref PRIMARY,v_landcov,v_lowreb v_lowreb 3 const
> 17021 Using where; Using temporary; Using filesort
> geo eq_ref PRIMARY,X_coord,Y_coord PRIMARY 3 eco.id 1
> Using where
> gsc eq_ref PRIMARY,gsc2000_XX PRIMARY 3 geo.id 1
>
> mgsc eq_ref PRIMARY,burnt PRIMARY 3 gsc.gsc2000_XX 1
> Using where
> vgt ref id,obs id 4 geo.id 1
> Using where
> pix ref id,obs id 4 const 156
> Using where
> gba eq_ref PRIMARY,gba2000_XX PRIMARY 3 geo.id 1
>
> mgba eq_ref PRIMARY,burnt PRIMARY 3 gba.gba2000_XX 1
> Using where

The temporary table & filesort mentioned in the line for eco is probably
the culprit. These are on-disk data operations, which is very slow
compared to in-memory operations. The GROUP BY is probably requiring
the temp table, because it isn't using the index on vgt.obs. You could
try using "FORCE INDEX (obs)" to make it use that index. But I'm not
sure that this would relieve the filesort, and it would also prevent use
of the vg.id index.

> Does anyone has a suggestion to make it faster?

MySQL 5 _might_ deal with this better. In some circumstances, MySQL 5
can use more than one index per table. MySQL 4.1 and earlier have a
limitation of one index per table in a given query. MySQL 5 also has
some improvements to the filesort algorithm.

You could try increasing your key_buffer_size server parameter, so that
it's large enough to contain all the indexes used (according to the
EXPLAIN report). Also, preload the indexes.

See
http://dev.mysql.com/doc/refman/5.0/en/server-system-variabl es.html
http://dev.mysql.com/doc/refman/5.0/en/index-preloading.html

Regards,
Bill K.

Re: Improve SELECT command

am 14.05.2006 02:57:04 von Bernhard Brueck

stefaan.lhermitte@agr.kuleuven.ac.be wrote:

> Dear MySQL-ians,
>
> I perform a SELECT on my database, but it takes over a minute for every
> run. I have to run it over 10000 times (with different values in the
> WHERE), so it takes way too long. A was therefore wondering if I could
....
> SELECT vgt.obs,pix.NDVI/AVG(vgt.NDVI)
> FROM vgtData.VGT as pix,
> (vgt.meta_gsc_2000 as mgsc RIGHT JOIN vgt.gsc2000 as gsc ON
> gsc.gsc2000_XX = mgsc.id)
....
Perhaps you can store the right join in a temporary table
and reuse it for the subsequent queries.

Re: Improve SELECT command

am 18.05.2006 10:14:29 von Stefaan Lhermitte

Thanks Bill and Bernard.

I am indeed upgrading my mysql version to 5.0 and hoping to use the
improved selection criteria there.

Thanks for your help!
Stef