Optimising a very large table
on 19.02.2010 23:47:34 by Nathan Harmston
Hi everyone,
I am currently working on an application where I have a very large
table called intervals (at the moment 80,000,000 records and growing)
and a smaller table (token) which joins with it.
interval is just an id, start, end, word
token is id, interval_id, type, processed_by
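For reference, a minimal sketch of that schema as CREATE TABLE
statements (the column types, the engine and the exact index are
assumptions here, only to make the example concrete):

  CREATE TABLE intervals (
      id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `start` INT NOT NULL,
      `end`   INT NOT NULL,
      word    VARCHAR(255) NOT NULL,
      -- assumed lookup index used when checking whether an interval exists
      UNIQUE KEY uq_interval (`start`, `end`, word)
  ) ENGINE=InnoDB;

  CREATE TABLE token (
      id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      interval_id  BIGINT UNSIGNED NOT NULL,
      type         VARCHAR(64) NOT NULL,
      processed_by VARCHAR(64) NOT NULL,
      KEY idx_interval (interval_id)
  ) ENGINE=InnoDB;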
There is a many-to-one relationship, i.e. one interval can have many
tokens. The idea is that as I use different tools I generate different
tokens, but the underlying intervals tend to be the same. When I add a
new token I first need to search the intervals table to see if one
already exists there. Of course there's an index on intervals to deal
with this.
But as I add more and more new records I notice a massive slowdown in
processing. I think this is due to the increase in new records being
added and indexed. The problem is I can't simply turn indexes off, as
I have 80 million records.
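Roughly, the per-token flow described above looks like this (the
literal values and the unique key on (start, end, word) are only
assumptions for the example):

  -- Look up the interval first; the unique key makes this an index hit.
  SELECT id FROM intervals
  WHERE `start` = 10 AND `end` = 15 AND word = 'foo';

  -- If no row came back, create the interval.
  INSERT INTO intervals (`start`, `end`, word) VALUES (10, 15, 'foo');

  -- Then attach the token to the interval id found or created above.
  INSERT INTO token (interval_id, type, processed_by)
  VALUES (42, 'POS', 'tagger-2');

Every new interval also has to be written into that unique index,
which is presumably where the insert cost comes from as the table
grows.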
Does anyone have any suggestions for optimising this design, or where
to start? At the moment the only option I have is to denormalise my
schema, but this will complicate things at the application level
considerably.
Any suggestions welcome,
Nathan
Re: Optimising a very large table
am 20.02.2010 13:07:50 von walter harms
Nathan Harmston wrote:
> I am currently working on an application where I have a very large
> table called intervals (at the moment 80,000,000 records and growing)
> and a smaller table (token) which joins with it. [...]
> Does anyone have any suggestions for optimising this design? Or where
> to start from?
Hi,
your problem sounds like a time series, which is the same problem I am
dealing with. IMHO there is no real solution; we have split the time
series and use several tables (one table per month here, but it depends
on your exact problem). That moves the "which table?" problem to the
application. Since MySQL 5.1 there are partitions, which move the
problem back to the DB.
The "solution" depends on your usage pattern. Clearly there is no such
thing as a "clever" SELECT statement that will fix this.
For your token problem you can use InnoDB with foreign keys; if the
insert fails, simply check whether you need to add another token to
the token table.
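One way to read that under InnoDB (a sketch; the values are examples,
and a unique key on intervals(start, end, word) is assumed): let the
unique key catch the duplicate instead of SELECTing first, and recover
the existing id via the LAST_INSERT_ID(expr) trick:

  INSERT INTO intervals (`start`, `end`, word)
  VALUES (10, 15, 'foo')
  ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

  -- LAST_INSERT_ID() now holds the new or the pre-existing interval id.
  INSERT INTO token (interval_id, type, processed_by)
  VALUES (LAST_INSERT_ID(), 'POS', 'tagger-2');

With token.interval_id declared as a FOREIGN KEY referencing
intervals(id), InnoDB will also reject tokens that point at a missing
interval.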
Nevertheless, you need to decide how long you will store the data.
Let's assume you have 1e6 records per day and you want to keep them
for 40 years; then you end up with 40 * 360 * 1e6 = 1.44e10 records.
Do you want to handle that?
re,
wh