Indexing? (Warning: relative newbie.)

am 24.06.2009 17:07:22 von Ken

Hi, all. I'm a long-time MySQL user who's only recently had to start
learning some administrative stuff, largely because I finally have a
decently-sized database. My database is about 100 GB; I'm using it -- via
dbmail (www.dbmail.org) -- as a mail server for my company. While dbmail
is well-and-good with its IMAP front-end, I'm thinking of writing a Python
front-end to do some queries directly against MySQL. But some of them
take a l-o-n-g time. As an example, I've got a table with slightly over a
million records; I'd like to be able to show (say) only IDs of messages
under a half-MB. The query would look something like this:

select physmessage_id,blocksize from dbmail_messageblks where blocksize <
500000;

That query takes 50 minutes. A smidge long to wait.

So I said, "Huh. That's impressive." And I tried it without the
physmessage_id:
select blocksize from dbmail_messageblks where blocksize < 500000;

That took 14 seconds. A bit more in my timeframe. Can I optimize this
with indexing? Should I be using a different DB engine? Is there a
site/book I should be learning DBA fundamentals from that might offer me
direction for stuff like this?

Sorry for all the newbie questions, but I haven't done serious database
stuff since Foxbase/dBase III days. Things have changed a little since
then.

Thanks!

-Ken

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Indexing? (Warning: relative newbie.)

am 24.06.2009 18:03:30 von tlittle

To answer your questions in no particular order, YES you can speed it up
with indexing.

You might want to first create an index on ( blocksize AND
physmessage_id ).

Why, you might ask, index on physmessage_id? Because then the db won't
have to do a fetch on items from the table since it's in the INDEX
itself, saving any unnecessary reads.

Realistically, I can't see that taking more than a few seconds, at most,
to execute. However, making the index might take a serious bit of time.

Please let us all know how it does or does not work.

Tim...

-----Original Message-----
From: Ken D'Ambrosio [mailto:ken@jots.org]=20
Sent: Wednesday, June 24, 2009 11:07 AM
To: mysql@lists.mysql.com
Subject: Indexing? (Warning: relative newbie.)

Hi, all. I'm a long-time MySQL user who's only recently had to start
learning some administrative stuff, largely because I finally have a
decently-sized database. My database is about 100 GB; I'm using it --
via
dbmail (www.dbmail.org) -- as a mail server for my company. While
dbmail
is well-and-good with its IMAP front-end, I'm thinking of writing a
Python
front-end to do some queries directly against MySQL. But some of them
take a l-o-n-g time. As an example, I've got a table with slightly over
a
million records; I'd like to be able to show (say) only IDs of messages
under a half-MB. The query would look something like this:

select physmessage_id,blocksize from dbmail_messageblks where blocksize
<
500000;

That query takes 50 minutes. A smidge long to wait.

So I said, "Huh. That's impressive." And I tried it without the
physmessage_id:
select blocksize from dbmail_messageblks where blocksize < 500000;

That took 14 seconds. A bit more in my timeframe. Can I optimize this
with indexing? Should I be using a different DB engine? Is there a
site/book I should be learning DBA fundamentals from that might offer me
direction for stuff like this?

Sorry for all the newbie questions, but I haven't done serious database
stuff since Foxbase/dBase III days. Things have changed a little since
then.

Thanks!

-Ken

--=20
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

--=20
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: =
http://lists.mysql.com/mysql?unsub=3Dtlittle@tgrnet.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Indexing? (Warning: relative newbie.)

am 24.06.2009 20:19:07 von Walter Heck

Hey Tim, all

On Wed, Jun 24, 2009 at 10:03 AM, Little,
Timothy wrote:
> Why, you might ask, index on physmessage_id? Â Because then the db wo=
n't
> have to do a fetch on items from the table since it's in the INDEX
> itself, saving any unnecessary reads.
FYI: That only holds true for InnoDB, not for MyISAM.

cheers,

--=20
Walter Heck, Engineer @ Open Query (http://openquery.com)
Affordable Training and ProActive Support for MySQL & related technologies

Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: Indexing? (Warning: relative newbie.)

am 26.06.2009 23:42:19 von Ken

After a few off-list e-mails with Tim, I issued

ALTER TABLE dbmail_messageblks ADD INDEX ( blocksize AND physmessage_id );

which took almost 11 hours to index. Once done, however, my select
statement went from a hair over 50 minutes to 15 seconds. (1.69 seconds
after the index was cached.)

Wow.

Thanks for the help, all!

-Ken

On Wed, June 24, 2009 12:03 pm, Little, Timothy wrote:
> To answer your questions in no particular order, YES you can speed it up
> with indexing.
>
> You might want to first create an index on ( blocksize AND
> physmessage_id ).
>
> Why, you might ask, index on physmessage_id? Because then the db won't
> have to do a fetch on items from the table since it's in the INDEX itself,
> saving any unnecessary reads.
>
> Realistically, I can't see that taking more than a few seconds, at most,
> to execute. However, making the index might take a serious bit of time.
>
> Please let us all know how it does or does not work.
>
>
> Tim...
>
>
> -----Original Message-----
> From: Ken D'Ambrosio [mailto:ken@jots.org]
> Sent: Wednesday, June 24, 2009 11:07 AM
> To: mysql@lists.mysql.com
> Subject: Indexing? (Warning: relative newbie.)
>
>
> Hi, all. I'm a long-time MySQL user who's only recently had to start
> learning some administrative stuff, largely because I finally have a
> decently-sized database. My database is about 100 GB; I'm using it -- via
> dbmail (www.dbmail.org) -- as a mail server for my company. While dbmail
> is well-and-good with its IMAP front-end, I'm thinking of writing a
> Python
> front-end to do some queries directly against MySQL. But some of them take
> a l-o-n-g time. As an example, I've got a table with slightly over a
> million records; I'd like to be able to show (say) only IDs of messages
> under a half-MB. The query would look something like this:
>
> select physmessage_id,blocksize from dbmail_messageblks where blocksize <
> 500000;
>
>
> That query takes 50 minutes. A smidge long to wait.
>
>
> So I said, "Huh. That's impressive." And I tried it without the
> physmessage_id:
> select blocksize from dbmail_messageblks where blocksize < 500000;
>
> That took 14 seconds. A bit more in my timeframe. Can I optimize this
> with indexing? Should I be using a different DB engine? Is there a
> site/book I should be learning DBA fundamentals from that might offer me
> direction for stuff like this?
>
> Sorry for all the newbie questions, but I haven't done serious database
> stuff since Foxbase/dBase III days. Things have changed a little since
> then.
>
> Thanks!
>
>
> -Ken
>
>
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is believed to be clean.
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=tlittle@tgrnet.com
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=ken@jots.org
>
>
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is believed to be clean.
>
>

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Indexing? (Warning: relative newbie.)

am 27.06.2009 04:18:01 von Shawn Green

Hi Walter,

Walter Heck - OlinData.com wrote:
> Hey Tim, all
>
> On Wed, Jun 24, 2009 at 10:03 AM, Little,
> Timothy wrote:
>> Why, you might ask, index on physmessage_id? Because then the db won't
>> have to do a fetch on items from the table since it's in the INDEX
>> itself, saving any unnecessary reads.
> FYI: That only holds true for InnoDB, not for MyISAM.
>

I think you have confused the InnoDB behavior of using the entire
PRIMARY KEY as the unique row identifier for each entry in a secondary
key with the practice of defining a "covering index".

Tim was correct: an index on (blocksize,physmessage_id) would allow that
query to avoid any direct reads of the data table as all of the
information for the query would have come from the index itself
(regardless of database engine).

--
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org