MySQL millions row OLTP dimensioning

On 02.11.2006 09:41:07 by Federico

Hi everybody,

I'm evaluating the possibility of using MySQL 5.0 as the database backend
for an application we are building, and I'd like rough hardware
requirement estimates and/or real-world experience from people who are
already running a similarly dimensioned database/application.

As a rough estimate, the database has around 80 tables. 4-5 of these are
expected to grow up to 5,000,000 records, another 30% will be in the
100,000 range, while the remaining tables will be in the 10,000 range. The
biggest table (unfortunately one of the million-row tables) has around
70 fields, but the remaining tables are definitely smaller.
The application is a Java web intranet application that will serve
around 500 users (and will grow up to 1000). Almost all queries are pretty
simple: no recursion, no subqueries, aggregate queries only on small
ad hoc tables, WHERE conditions on indexed fields and in many cases just
on primary keys. The only complexity in our queries is sometimes the
number of joins, which can get up to a 5-table join. In some situations the
data for a page is retrieved with a single query on a million-row table
plus 3-4 queries on the other smaller tables, or with a query on a
million-row table joined with another 3-4 smaller tables.
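
To make the query pattern concrete, a typical page query might look like the sketch below (all table and column names are invented for illustration; the real schema will of course differ):

  -- hypothetical example: fetch one page from the ~5,000,000 row table,
  -- joined with a few of the smaller lookup tables
  SELECT o.id, o.created_at, o.total,
         c.name AS customer, s.label AS status, w.name AS warehouse
  FROM orders o                                 -- ~5,000,000 rows
  JOIN customers c    ON c.id = o.customer_id   -- ~100,000 rows
  JOIN order_status s ON s.id = o.status_id     -- ~10,000 rows
  JOIN warehouses w   ON w.id = o.warehouse_id  -- ~10,000 rows
  WHERE o.customer_id = 4711                    -- condition on an indexed field
  ORDER BY o.created_at DESC
  LIMIT 0, 20;                                  -- one page of results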

At the moment we think we could make it with a dual 3.0GHz Xeon
with 4GB RAM and 320GB RAID5.

What do you think? I'd like to hear real-world experience with a
similarly sized MySQL database, or suggestions on how to make a better
hardware estimate.

Re: MySQL millions row OLTP dimensioning

On 02.11.2006 11:42:12 by Axel Schwenke

"Federico" wrote:
>
> I'm evaluating the possibility of using MySQL 5.0 as the database backend
> for an application we are building, and I'd like rough hardware
> requirement estimates and/or real-world experience from people who are
> already running a similarly dimensioned database/application.

[description snipped]

Insufficient data. Do you need advanced features like referential
constraints, row-level locks or transactions? You said OLTP. What's the
read/write ratio? Do you expect to see read/write concurrency on some
tables? (This is needed to decide which storage engine to use.)

Regarding JOINs - you should go for MySQL 5.x. JOIN handling has some
flaws in 4.x and earlier. With proper indexing JOINs on big tables
should not be critical.
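
One way to verify this once the schema exists (a sketch only; the table names below are made up) is to run EXPLAIN on the join and check that no large table is read with a full scan:

  -- hypothetical check that a join on the big table uses indexes
  EXPLAIN
  SELECT o.id, c.name
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  WHERE o.status_id = 3;
  -- in the output, 'type' should be const/eq_ref/ref/range for every table;
  -- 'type = ALL' on the orders table means a full table scan, i.e. a
  -- missing index on the join or WHERE column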

> At the moment we think we could make it with a dual 3.0GHz Xeon
> with 4GB RAM and 320GB RAID5.

*NEVER* use RAID5 for a database. RAID5 yields poor performance for
short writes (that is: writes that are shorter than the RAID block
size). If you want redundancy (you want it) go for RAID10.

The memory requirements depend on the chosen storage engine. MyISAM
uses memory slightly better but misses some features. Also, MyISAM may
cause severe performance degradation due to locking conflicts. If you
choose MyISAM, add up the size of the index files of the active tables.
This is your key_buffer size. Multiply by 2. This is the total memory
needed. Guessing which indexes are considered "active" is difficult.
In most cases you can do well with just 10-20% of all indexes.

If you use InnoDB you should calculate approx. twice the memory.
And twice the disk space too.
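
Once a test schema is loaded with representative data, those numbers can be estimated directly from the server; a sketch for MySQL 5.0, assuming the schema is called 'mydb':

  -- MyISAM: total index size, a starting point for key_buffer_size
  SELECT SUM(INDEX_LENGTH) / 1024 / 1024 AS myisam_index_mb
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = 'mydb' AND ENGINE = 'MyISAM';

  -- InnoDB: data + index size, a starting point for innodb_buffer_pool_size
  SELECT SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 AS innodb_mb
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = 'mydb' AND ENGINE = 'InnoDB';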


Regarding real-world experience: some years ago I designed a good part
of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
12 web servers was MySQL 3.23.x with MyISAM tables. 30GB+ of data in some
100 tables, biggest table 35 million rows. Quite similar to your setup.
We ran it on a SUN Enterprise V480, 4x 1GHz USIII CPUs, 8GB RAM.

Solaris was much better than Linux 2.4 for multithreaded stuff back in
those days. Today I would go for Linux 2.6 on Opteron hardware.


HTH, XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/

Re: MySQL millions row OLTP dimensioning

On 02.11.2006 12:58:58 by Federico

>
> Insufficient data. Do you need advanced features like referential
> constraints, row-level locks or transactions? You said OLTP. What's the
> read/write ratio? Do you expect to see read/write concurrency on some
> tables? (This is needed to decide which storage engine to use.)

You are right, I missed these crucial points (also because it's
difficult right now to give more than rough estimates for some of these
things).

We need support for transactions but won't do any explicit row-level
locking; we will use the InnoDB storage engine and MySQL 5.0. No referential
constraints. There will be processes concurrently making
SELECT/INSERT/UPDATE on the same tables (is this what you mean by read/write
concurrency?). The read/write ratio will probably be more read than write:
I would say a 100 or even 1000 ratio for some tables, and 10 or even
less for some other tables, including some million-record tables, but
these are just rough estimates.

I think each page would require a reasonable number of queries to be
composed (usually not more than 5, often just a couple), and we can also
use an application-layer cache for the most frequently used queries.

Actually there are some cases in which we always have to do 2 queries
even though we return all the data with just one query: since we need
result paging, we do one query to retrieve a fixed number of rows (a
LIMIT constraint on the query) and another query to COUNT the rows matching
that query.
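
For reference, the two-query pattern described above would be something like this (hypothetical table and column names):

  -- query 1: one page of matching rows
  SELECT id, created_at, total
  FROM orders
  WHERE customer_id = 4711
  ORDER BY created_at DESC
  LIMIT 40, 20;              -- skip 40 rows, return 20 (page 3)

  -- query 2: total number of matching rows, needed for the pager
  SELECT COUNT(*)
  FROM orders
  WHERE customer_id = 4711;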


>
> Regarding real-world experience: some years ago I designed a good part
> of a medium-sized web portal (jjj.wbocvybg.qr). The database behind the
> 12 web servers was MySQL 3.23.x with MyISAM tables. 30GB+ of data in some
> 100 tables, biggest table 35 million rows. Quite similar to your setup.
> We ran it on a SUN Enterprise V480, 4x 1GHz USIII CPUs, 8GB RAM.
>
> Solaris was much better than Linux 2.4 for multithreaded stuff back in
> those days. Today I would go for Linux 2.6 on Opteron hardware.


Wow, I wonder how you could have just 30GB of data for 100 tables with one
table with 35 million rows. We plan to use SLES10, which has a Linux 2.6
kernel and EM64T processor support.
How many concurrent users could you have at most, and how many queries
did you actually run against the 35 million row table?


Thanks a lot for your really valuable suggestions!

Re: MySQL millions row OLTP dimensioning

On 02.11.2006 13:46:12 by Axel Schwenke

"Federico" wrote:

> We need support for transactions but won't do any explicit row-level
> locking; we will use the InnoDB storage engine and MySQL 5.0. No referential
> constraints. There will be processes concurrently making
> SELECT/INSERT/UPDATE on the same tables (is this what you mean by read/write
> concurrency?). The read/write ratio will probably be more read than write:
> I would say a 100 or even 1000 ratio for some tables, and 10 or even
> less for some other tables, including some million-record tables, but
> these are just rough estimates.

Well, if you target InnoDB already, the read/write ratio is not so
important any more. Due to its more granular locking (compared to
MyISAM) InnoDB does well for concurrent reads/writes. It will use
some memory to keep row backups though.

> Actually there are some cases in which we always have to do 2 queries
> even though we return all the data with just one query: since we need
> result paging, we do one query to retrieve a fixed number of rows (a
> LIMIT constraint on the query) and another query to COUNT the rows matching
> that query.

You can do that in one query. See SQL_CALC_FOUND_ROWS.
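
A quick sketch of how that would look (same hypothetical orders table as in the earlier examples):

  -- one query fetches the page and remembers the full match count
  SELECT SQL_CALC_FOUND_ROWS id, created_at, total
  FROM orders
  WHERE customer_id = 4711
  ORDER BY created_at DESC
  LIMIT 40, 20;

  -- immediately afterwards, on the same connection:
  SELECT FOUND_ROWS();       -- number of matching rows without the LIMIT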

> Wow, I wonder how you could have just 30GB of data for 100 tables with one
> table with 35 million rows.

Luckily this table had rather small records. IIRC it was 6GB of data
and 2GB of index. By far the biggest table.

> We plan to use SLES10, which has a Linux 2.6 kernel and EM64T processor support.

Good. You should definitely go 64 bit.

> How many concurrent users could you have at most, and how many queries
> did you actually run against the 35 million row table?

Dunno. I never counted queries per table. Regarding users: the web servers
were configured to allow up to 100 active requests. That totals a
maximum of 1200 possible clients. In real life there were seldom more than
300 concurrent connections to the database. If you deal with that many
clients you should be careful to close any connection as soon as you
don't need it any more. With Java you should look out for a suitable
connection pooling method.

For the curious: here is a sample statistics for the database server
http://24days.de/~schwenke/asing/example/solaris/

(Disk md4 holds the binary log for replication to the fallback SQL
node; disk ssd5 holds the table space. ssd5 was located on an external
storage box, connected via 1Gbit Fibre Channel.
Since this is MySQL 3.23 there is no query cache -> no cached queries.
Each web server holds a replica of some quasi-static tables; there were
an additional ~500 queries/sec distributed to the web servers.)


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/

Re: MySQL millions row OLTP dimensioning

On 02.11.2006 15:18:16 by Federico

>
> > Actually there are some cases in which we always have to do 2 queries
> > even though we return all the data with just one query: since we need
> > result paging, we do one query to retrieve a fixed number of rows (a
> > LIMIT constraint on the query) and another query to COUNT the rows matching
> > that query.
>
> You can do that in one query. See SQL_CALC_FOUND_ROWS.
>

This is very interesting, exactly what I was looking for. Even though I
don't know if Hibernate will allow me to use anything similar...


> > Wow, I wonder how you could have just 30GB of data for 100 tables with one
> > table with 35 million rows.
>
> Luckily this table had rather small records. IIRC it was 6GB of data
> and 2GB of index. By far the biggest table.

IIRC?
We will have some bigger tables, but all queries are indexed and almost
all indexed fields are only a few bytes (INT or TIMESTAMP); there are
almost no indexed VARCHARs.
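
For illustration, the kind of compact index meant here would be something like (invented names again):

  -- hypothetical composite index on the big table: two fixed-width columns
  ALTER TABLE orders
    ADD INDEX idx_customer_created (customer_id, created_at);
  -- INT (4 bytes) + TIMESTAMP (4 bytes) per entry keeps the index small,
  -- much smaller than an index on a long VARCHAR column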


> > How many concurrent users could you have at most, and how many queries
> > did you actually run against the 35 million row table?
>
> If you deal with that many
> clients you should be careful to close any connection as soon as you
> don't need it any more. With Java you should look out for a suitable
> connection pooling method.

We are already using Tomcat JNDI MySQL connection pooling and have
template classes that automatically release the connection as soon as
they are finished with it.

>
> For the curious: here is a sample statistics for the database server
> http://24days.de/~schwenke/asing/example/solaris/

Very interesting. Do you think a dedicated Gigabit Ethernet connection
between the 2 servers would give pretty much the same performance? (I mean,
I know Fibre Channel is theoretically better, but maybe it is not needed in
my case.)

> since this is MySQL 3.23 there is no query cache -> no cached queries

I don't know whether to use the query cache or not: as I told you, we can
use Hibernate caching, which would cache directly in RAM or on disk without
even going to the database on positive hits.
On the other hand, I don't know how much we really need caching, as
most users will issue different queries returning different results.
E.g. does it make much sense to cache on an orders table when each user
is retrieving just his own orders?
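
If the query cache does get enabled, queries whose results are per-user can be excluded from it explicitly; a sketch (the variable values are only placeholders and the table is the usual invented one):

  -- enable a modest query cache (placeholder values)
  SET GLOBAL query_cache_type = 1;                 -- cache cacheable SELECTs
  SET GLOBAL query_cache_size = 32 * 1024 * 1024;  -- 32MB

  -- skip the cache for results that differ per user anyway
  SELECT SQL_NO_CACHE id, created_at, total
  FROM orders
  WHERE customer_id = 4711;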

Even though I know there are many more things to consider, the web
application you had on this server makes me think we are not that far off
with our hardware configuration. Maybe we just need more RAM: right now we
have just 4GB and might increase to 8 or 16GB, and yes, I know that with
databases RAM is never enough :-)

Do you know of any good profiling/sizing tool? My problem is that I
do not yet have the populated database, and it's difficult to do
realistic profiling with only a little data. I had a look at DBMonster,
which looks pretty good, but it seems to be a kind of abandoned project.
Is there any similar or better tool around, even a commercial one?