Putting data in seperate tables

am 11.01.2005 13:53:50 von Greg Quinn

------=_NextPart_000_000D_01C4F7ED.5CAF03E0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

We are developing a server business package, one of the tools it has is =
an email client.

Now since all data will be stored on the server, having several people =
at once pulling 5000 emails every few seconds off the server can put =
some serious strain on the server, also considering most of our client's =
servers will be normal Pentium 3s or 4 desktops.

We are testing our software with 5 users, and already the mailinbox =
table file is 2 Gigs big. Pulling 5000 messages from this table takes up =
to 40 seconds sometimes and is really slow. I have spent vast amounts on =
time optimising the tables, indexes and configuration, but this is still =
too slow.

So I have decided to place each users mail into a a mailinbox table of =
their own. I assume this will create a vast speed improvement as the =
mySQL engine doesn't need to do a disk seek and retrieve each user's =
mail from one big mail table.

One question I have though is, which query would be faster... a query =
that does a straight retrieval, or one where an index is used??

select inboxmailid, subject, message, datereceived, priority from =
mailinbox

or

select inboxmailid, subject, message, datereceived, priority from =
mailinbox where userid =3D 1

UserID will always be 1 in the table...

Thanks

------=_NextPart_000_000D_01C4F7ED.5CAF03E0--

Re: Putting data in seperate tables

am 11.01.2005 16:09:34 von SGreen

--=_alternative 0053819F85256F86_=
Content-Type: text/plain; charset="US-ASCII"

"Greg Quinn" wrote on 01/11/2005 07:53:50 AM:

> We are developing a server business package, one of the tools it has
> is an email client.
>
> Now since all data will be stored on the server, having several
> people at once pulling 5000 emails every few seconds off the server
> can put some serious strain on the server, also considering most of
> our client's servers will be normal Pentium 3s or 4 desktops.
>
> We are testing our software with 5 users, and already the mailinbox
> table file is 2 Gigs big. Pulling 5000 messages from this table
> takes up to 40 seconds sometimes and is really slow. I have spent
> vast amounts on time optimising the tables, indexes and
> configuration, but this is still too slow.
>
> So I have decided to place each users mail into a a mailinbox table
> of their own. I assume this will create a vast speed improvement as
> the mySQL engine doesn't need to do a disk seek and retrieve each
> user's mail from one big mail table.
>
> One question I have though is, which query would be faster... a
> query that does a straight retrieval, or one where an index is used??
>
> select inboxmailid, subject, message, datereceived, priority from
mailinbox
>
> or
>
> select inboxmailid, subject, message, datereceived, priority from
> mailinbox where userid = 1
>
> UserID will always be 1 in the table...
>
> Thanks
>

If EVERY row of a table has the same data value (userid=1) then the
cardinality of that data will be extremely low (at or near 1). That means
that the accuracy of an index on that column would be next to nothing and
the query engine won't use it. In order for an index to be used in a query
you must match on approximately 30% or less of the table. (The exact
threshold depends on several factors but 30% is a good ballpark figure.)

If every row of a table has the same userid then an indexed search for
that value would return 100% of the rows every time. It saves time, due to
reducing the number if disk seeks, to just skip the index and read the
table directly. So to answer your question, both of your queries would be
executed in the exact same manner, under the conditions you describe.

If you are worried about disk performance, I recommend that you increase
the number of physical drives you have. With two physical i/o channels you
can almost double your storage and retrieval rate. More disks implies you
get more available seeks per second. You can partition your data storage
over the multiple drives in any manner that made sense (disk1: A-L,
disk2:M-Z, etc...). Each drive would host it's own set of databases. Each
database would contain several user tables. As your DBs continue to grow,
you scale the system by adding drives and repartitioning your data.

Just a suggestion.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
--=_alternative 0053819F85256F86_=--

RE: Putting data in seperate tables

am 11.01.2005 16:41:21 von SGreen

--=_alternative 00566AC385256F86_=
Content-Type: text/plain; charset="US-ASCII"

I know that your options for physical partitioning depend on your OS, your
filesystem, and whether you are using InnoDB or MyISAM. I cannot speak
from personal experience on the mechanics of this but I know that several
others on the list have performed this particular optimization (which is
how I knew to recommend it).

I am cross-posting this to both the Win32 and the General lists so that
one of them will be able to answer your questions.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Patrick Questembert" wrote on 01/11/2005 10:36:30
AM:

> Good suggestions! How does one go about specifying on which physical
disk a
> table or database should reside?
> Alternatively, do you think stripped disks may be effective or will
MySQL
> not necessarily store data sequentially?
>
> Patrick
>
> -----Original Message-----
> From: SGreen@unimin.com [mailto:SGreen@unimin.com]
> Sent: Tuesday, January 11, 2005 10:10 AM
> To: Greg Quinn
> Cc: win32@lists.mysql.com
> Subject: Re: Putting data in seperate tables
>
>
> "Greg Quinn" wrote on 01/11/2005 07:53:50 AM:
>
> > We are developing a server business package, one of the tools it has
> > is an email client.
> >
> > Now since all data will be stored on the server, having several
> > people at once pulling 5000 emails every few seconds off the server
> > can put some serious strain on the server, also considering most of
> > our client's servers will be normal Pentium 3s or 4 desktops.
> >
> > We are testing our software with 5 users, and already the mailinbox
> > table file is 2 Gigs big. Pulling 5000 messages from this table
> > takes up to 40 seconds sometimes and is really slow. I have spent
> > vast amounts on time optimising the tables, indexes and
> > configuration, but this is still too slow.
> >
> > So I have decided to place each users mail into a a mailinbox table
> > of their own. I assume this will create a vast speed improvement as
> > the mySQL engine doesn't need to do a disk seek and retrieve each
> > user's mail from one big mail table.
> >
> > One question I have though is, which query would be faster... a
> > query that does a straight retrieval, or one where an index is used??
> >
> > select inboxmailid, subject, message, datereceived, priority from
> mailinbox
> >
> > or
> >
> > select inboxmailid, subject, message, datereceived, priority from
> > mailinbox where userid = 1
> >
> > UserID will always be 1 in the table...
> >
> > Thanks
> >
>
> If EVERY row of a table has the same data value (userid=1) then the
> cardinality of that data will be extremely low (at or near 1). That
means
> that the accuracy of an index on that column would be next to nothing
and
> the query engine won't use it. In order for an index to be used in a
query
> you must match on approximately 30% or less of the table. (The exact
> threshold depends on several factors but 30% is a good ballpark figure.)
>
> If every row of a table has the same userid then an indexed search for
> that value would return 100% of the rows every time. It saves time, due
to
> reducing the number if disk seeks, to just skip the index and read the
> table directly. So to answer your question, both of your queries would
be
> executed in the exact same manner, under the conditions you describe.
>
> If you are worried about disk performance, I recommend that you increase
> the number of physical drives you have. With two physical i/o channels
you
> can almost double your storage and retrieval rate. More disks implies
you
> get more available seeks per second. You can partition your data storage
> over the multiple drives in any manner that made sense (disk1: A-L,
> disk2:M-Z, etc...). Each drive would host it's own set of databases.
Each
> database would contain several user tables. As your DBs continue to
grow,
> you scale the system by adding drives and repartitioning your data.
>
> Just a suggestion.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
>

--=_alternative 00566AC385256F86_=--