optimizing query

optimizing query

am 18.01.2011 18:22:21 von Simon Wilkinson

--001636c5b022a51e2d049a22243d
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I am trying to optimize the following query:

SELECT articles.* FROM articles INNER JOIN newsletters ON
articles.newsletter_id = newsletters.id INNER JOIN users ON users.id =
newsletters.user_id WHERE users.id =12 AND MONTH(articles.created_at) = '12'
AND DAY(articles.created_at) = '5' ORDER BY YEAR(articles.created_at),
LENGTH(articles.body);

I am trying to retrieve all the articles created on a specific day of a
specific month that belong to a user, ordered by the oldest and then longest
article.

I have run explain on the query, and get the following:

+----+-------------+----------+-------+--------------------- ---------------------------------------------------+-------- ------------------------------------+---------+------------- --------+------+-------------------------------------------- --+
| id | select_type | table | type |
possible_keys |
key | key_len | ref |
rows | Extra |
+----+-------------+----------+-------+--------------------- ---------------------------------------------------+-------- ------------------------------------+---------+------------- --------+------+-------------------------------------------- --+
| 1 | SIMPLE | users | const |
PRIMARY |
PRIMARY | 4 | const
| 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | newsletters | ref |
PRIMARY,index_newsletters_on_user_id |
index_newsletters_on_user_id | 4 |
const | 1 | Using index |
| 1 | SIMPLE | articles | ref |
index_articles_on_newsletter_id,index_articles_on_newsletter _id_and_created_at
| index_articles_on_newsletter_id_and_created_at | 4 |
my_db.newsletters.id | 3 | Using where |

+----+-------------+----------+-------+--------------------- ---------------------------------------------------+-------- ------------------------------------+---------+------------- --------+------+-------------------------------------------- --+
3 rows in set (0.00 sec)

This seems pretty decent, and does perform pretty well for some users (~0.5
- 1 sec), but for some users (seemingly those with large numbers of
articles) the query can take 20 - 30 seconds to run. This seems really slow
to me. I tried adding in the index
'index_articles_on_newsletter_id_and_created_at' but the performance doesn't
seem to be any different then when it uses just the
'index_articles_on_newsletter_id' index. I think this might be because of
the functions I am using on the created_at column to get the day and month
from it, making an index on created_at useless in this instance.

Running both an 'optimize table entries' and 'analyze table entries' also
didn't seem to have any real impact on the performance.

I was wondering if anybody had any suggestions for what else I might be able
to try, or if there is a better way to search on dates in this manner. Any
ideas would be greatly appreciated.

Thanks,

Simon

--001636c5b022a51e2d049a22243d--

Re: optimizing query

am 18.01.2011 19:00:15 von Michael Dykman

Hi Simon,

once you apply functions to a field, an index on that field is pretty
much useless. For this particular query, I would be tempted to create
additional fields to store the values of MONTH(articles.created_at)
and DAY(articles.created_at). Create an index on (month_created,
day_created) You could just sort by articles.created_at; no need for
the YEAR function, the result will be the same given your other
selectors.

Given those adjustments, the query looks righteous enough.

- michael dykman

On Tue, Jan 18, 2011 at 12:22 PM, Simon Wilkinson
wrote:
> Hi,
>
> I am trying to optimize the following query:
>
> SELECT articles.* FROM articles INNER JOIN newsletters ON
> articles.newsletter_id =3D newsletters.id INNER JOIN users ON users.id =
=3D
> newsletters.user_id WHERE users.id =3D12 AND MONTH(articles.created_at) =
=3D '12'
> AND DAY(articles.created_at) =3D '5' ORDER BY YEAR(articles.created_at),
> LENGTH(articles.body);
>
> I am trying to retrieve all the articles created on a specific day of a
> specific month that belong to a user, ordered by the oldest and then long=
est
> article.
>
> I have run explain on the query, and get the following:
>
> +----+-------------+----------+-------+--------------------- -------------=
--------------------------------------+--------------------- ---------------=
--------+---------+---------------------+------+------------ ---------------=
-------------------+
> | id | select_type | table =A0 =A0| type =A0|
> possible_keys =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0|
> key =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0| key_len | ref =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 |
> rows | Extra =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0|
> +----+-------------+----------+-------+--------------------- -------------=
--------------------------------------+--------------------- ---------------=
--------+---------+---------------------+------+------------ ---------------=
-------------------+
> | =A01 | SIMPLE =A0 =A0 =A0| users =A0 =A0| const |
> PRIMARY =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0|
> PRIMARY =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0| 4 =A0 =A0 =A0 | const
> | =A0 =A01 | Using index; Using temporary; Using filesort |
> | =A01 | SIMPLE =A0 =A0 =A0| newsletters | ref =A0 |
> PRIMARY,index_newsletters_on_user_id =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0|
> index_newsletters_on_user_id =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| 4 =A0 =
=A0 =A0 |
> const =A0 =A0 =A0 =A0 =A0 =A0 =A0 | =A0 =A01 | Using index =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0|
> | =A01 | SIMPLE =A0 =A0 =A0| articles =A0| ref =A0 |
> index_articles_on_newsletter_id,index_articles_on_newsletter _id_and_creat=
ed_at
> | index_articles_on_newsletter_id_and_created_at | 4 =A0 =A0 =A0 |
> my_db.newsletters.id | =A0 =A03 | Using where =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0|
>
> +----+-------------+----------+-------+--------------------- -------------=
--------------------------------------+--------------------- ---------------=
--------+---------+---------------------+------+------------ ---------------=
-------------------+
> 3 rows in set (0.00 sec)
>
> This seems pretty decent, and does perform pretty well for some users (~0=
..5
> - 1 sec), but for some users (seemingly those with large numbers of
> articles) the query can take 20 - 30 seconds to run. =A0This seems really=
slow
> to me. =A0I tried adding in the index
> 'index_articles_on_newsletter_id_and_created_at' but the performance does=
n't
> seem to be any different then when it uses just the
> 'index_articles_on_newsletter_id' index. =A0I think this might be because=
of
> the functions I am using on the created_at column to get the day and mont=
h
> from it, making an index on created_at useless in this instance.
>
> Running both an 'optimize table entries' and 'analyze table entries' also
> didn't seem to have any real impact on the performance.
>
> I was wondering if anybody had any suggestions for what else I might be a=
ble
> to try, or if there is a better way to search on dates in this manner. =
=A0Any
> ideas would be greatly appreciated.
>
> Thanks,
>
> Simon
>



--=20
=A0- michael dykman
=A0- mdykman@gmail.com

=A0May the Source be with you.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: optimizing query

am 19.01.2011 02:11:40 von Mihail Manolov

I concur. In addition to suggested index I would add a new column in articl=
es table called body_length, which is going to be updated every time the bo=
dy column is updated. Add that column to the composite index mentioned belo=
w.

This should speed up the query a lot.

Cheers,
Mihail

On Jan 18, 2011, at 13:03, "Michael Dykman" wrote:

> Hi Simon,
>=20
> once you apply functions to a field, an index on that field is pretty
> much useless. For this particular query, I would be tempted to create
> additional fields to store the values of MONTH(articles.created_at)
> and DAY(articles.created_at). Create an index on (month_created,
> day_created) You could just sort by articles.created_at; no need for
> the YEAR function, the result will be the same given your other
> selectors.
>=20
> Given those adjustments, the query looks righteous enough.
>=20
> - michael dykman
>=20
> On Tue, Jan 18, 2011 at 12:22 PM, Simon Wilkinson
> wrote:
>> Hi,
>>=20
>> I am trying to optimize the following query:
>>=20
>> SELECT articles.* FROM articles INNER JOIN newsletters ON
>> articles.newsletter_id =3D newsletters.id INNER JOIN users ON users.id =
=3D
>> newsletters.user_id WHERE users.id =3D12 AND MONTH(articles.created_at) =
=3D '12'
>> AND DAY(articles.created_at) =3D '5' ORDER BY YEAR(articles.created_at),
>> LENGTH(articles.body);
>>=20
>> I am trying to retrieve all the articles created on a specific day of a
>> specific month that belong to a user, ordered by the oldest and then lon=
gest
>> article.
>>=20
>> I have run explain on the query, and get the following:
>>=20
>> +----+-------------+----------+-------+--------------------- ------------=
---------------------------------------+-------------------- ---------------=
---------+---------+---------------------+------+----------- ---------------=
--------------------+
>> | id | select_type | table | type |
>> possible_keys |
>> key | key_len | ref =
|
>> rows | Extra |
>> +----+-------------+----------+-------+--------------------- ------------=
---------------------------------------+-------------------- ---------------=
---------+---------+---------------------+------+----------- ---------------=
--------------------+
>> | 1 | SIMPLE | users | const |
>> PRIMARY |
>> PRIMARY | 4 | const
>> | 1 | Using index; Using temporary; Using filesort |
>> | 1 | SIMPLE | newsletters | ref |
>> PRIMARY,index_newsletters_on_user_id =
|
>> index_newsletters_on_user_id | 4 |
>> const | 1 | Using index =
|
>> | 1 | SIMPLE | articles | ref |
>> index_articles_on_newsletter_id,index_articles_on_newsletter _id_and_crea=
ted_at
>> | index_articles_on_newsletter_id_and_created_at | 4 |
>> my_db.newsletters.id | 3 | Using where =
|
>>=20
>> +----+-------------+----------+-------+--------------------- ------------=
---------------------------------------+-------------------- ---------------=
---------+---------+---------------------+------+----------- ---------------=
--------------------+
>> 3 rows in set (0.00 sec)
>>=20
>> This seems pretty decent, and does perform pretty well for some users (~=
0.5
>> - 1 sec), but for some users (seemingly those with large numbers of
>> articles) the query can take 20 - 30 seconds to run. This seems really =
slow
>> to me. I tried adding in the index
>> 'index_articles_on_newsletter_id_and_created_at' but the performance doe=
sn't
>> seem to be any different then when it uses just the
>> 'index_articles_on_newsletter_id' index. I think this might be because =
of
>> the functions I am using on the created_at column to get the day and mon=
th
>> from it, making an index on created_at useless in this instance.
>>=20
>> Running both an 'optimize table entries' and 'analyze table entries' als=
o
>> didn't seem to have any real impact on the performance.
>>=20
>> I was wondering if anybody had any suggestions for what else I might be =
able
>> to try, or if there is a better way to search on dates in this manner. =
Any
>> ideas would be greatly appreciated.
>>=20
>> Thanks,
>>=20
>> Simon
>>=20
>=20
>=20
>=20
> --=20
> - michael dykman
> - mdykman@gmail.com
>=20
> May the Source be with you.
>=20
> --=20
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmmanolov@liquidat=
ion.com
>=20

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: optimizing query

am 19.01.2011 08:11:20 von Steve Meyers

On 1/18/11 10:22 AM, Simon Wilkinson wrote:
> SELECT articles.* FROM articles INNER JOIN newsletters ON
> articles.newsletter_id = newsletters.id INNER JOIN users ON users.id =
> newsletters.user_id WHERE users.id =12 AND MONTH(articles.created_at) = '12'
> AND DAY(articles.created_at) = '5' ORDER BY YEAR(articles.created_at),
> LENGTH(articles.body);

Simon -

There are a few issues that are slowing down your query. First, you're
running functions to calculate the month and day of each article that is
looked at. As an aside, are you sure you don't want the DAYOFMONTH()
function?

Second, it's ideal to have the where clause in your query filter down
(using an index) to as few rows as possible of the first table. Other
tables you join should ideally be 1 to 1 from the first table. To
accomplish this, you would probably need the user_id in your articles table.

Another aside -- I noticed you have index_articles_on_newsletter_id as
well as index_articles_on_newsletter_id_and_created_at. The first index
is redundant, the second index will take care of it. This will slow
down your INSERT/UPDATE/DELETE queries to some degree.

Steve

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: optimizing query

am 21.01.2011 16:42:22 von Simon Wilkinson

--0016e6d77d9b9e109c049a5d1879
Content-Type: text/plain; charset=ISO-8859-1

Thanks for the suggestions everybody.

I added in columns to store the day, month and year of the created_at value,
and then added in an index on (newsletter_id, created_month, created_day),
and the the slow queries reduced from around 20 seconds to 0.5 seconds! I
also removed the redundant indexes.

Cheers,

Simon

On 19 January 2011 02:11, Steve Meyers wrote:

> On 1/18/11 10:22 AM, Simon Wilkinson wrote:
>
>> SELECT articles.* FROM articles INNER JOIN newsletters ON
>> articles.newsletter_id = newsletters.id INNER JOIN users ON users.id =
>> newsletters.user_id WHERE users.id =12 AND MONTH(articles.created_at) =
>> '12'
>> AND DAY(articles.created_at) = '5' ORDER BY YEAR(articles.created_at),
>> LENGTH(articles.body);
>>
>
> Simon -
>
> There are a few issues that are slowing down your query. First, you're
> running functions to calculate the month and day of each article that is
> looked at. As an aside, are you sure you don't want the DAYOFMONTH()
> function?
>
> Second, it's ideal to have the where clause in your query filter down
> (using an index) to as few rows as possible of the first table. Other
> tables you join should ideally be 1 to 1 from the first table. To
> accomplish this, you would probably need the user_id in your articles table.
>
> Another aside -- I noticed you have index_articles_on_newsletter_id as well
> as index_articles_on_newsletter_id_and_created_at. The first index is
> redundant, the second index will take care of it. This will slow down your
> INSERT/UPDATE/DELETE queries to some degree.
>
> Steve
>

--0016e6d77d9b9e109c049a5d1879--