am i creating a performance bottleneck?

am i creating a performance bottleneck?

am 18.07.2007 02:31:11 von Mary Anderson

Hi all,

In order to save a fair amount of space I am creating a schema
against which I will want to run the following simplified query. Think
of the size of the tables as data ~ 30 M rows, series has 10,00
entries, data_has_dimensions has 300 K rows, dimensions is tiny

select population = da.value,
residence = di.value
from data da, data_has_dimension dhd, dimensions di, series si
where da.series_id = si.series_id
and si.series = 'my-series'
and dhd.data_id = da.data_id
and dhd.dimension_id = di.dimension_id
and di.dimension = 'residence'

UNION

select population = da.value,
residence = 'total'
where da.series_id = si.series_id
and si.series = 'my_series'
and NOT EXISTS(select dhd.da_id
from data_has_dimensions dhd, dimensions di
where di.dimension = 'residence'
and dhd.dimension_id = di.dimension_id
and dhd.da_id = da.da_id)

I am most worried about the second select, with its 'NOT EXISTS'
statement slowing everything down. I would put appropriate indexes on
this -- namely an index on series for data and an index on dhd for
da_id. My user community is a bunch of academics, so I am not under
the performance constraints I would have for a business application.

Would it help performance if I denormalized the database by attaching
series to the data_has_dimensions table?

Thanks,
Mary Anderson

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: am i creating a performance bottleneck?

am 18.07.2007 12:44:18 von n0g0013

On 17.07-17:31, Mary Anderson wrote:
[ ... ]
> I am most worried about the second select, with its 'NOT EXISTS'
> statement slowing everything down. I would put appropriate indexes on
> this -- namely an index on series for data and an index on dhd for
> da_id. My user community is a bunch of academics, so I am not under
> the performance constraints I would have for a business application.
>
> Would it help performance if I denormalized the database by attaching
> series to the data_has_dimensions table?

to be honest i couldn't really understand your SQL but that probably
says more about me than you. either way i don't think you need to
worry about the NOT EXISTS statement. generally, serialising select
statements is better than joining and would suggest that if your series
data has 10,000 rows (assuming 10,00 was a typo) then i wouldn't merge
that data with another table (it also will simply mean a larger,
disjoint table, essentially equivelant to the join table required your
queries and thus may well cost you in overall performance). i would
suggest you try to restructure you queries to serialise the 'series'
select joins (and preferably only doing it once), thereby reducing
the final join table to a minimum.

--
t
t
w

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings