Bad performance caused by DISTINCT?

Bad performance caused by DISTINCT?

am 13.11.2006 21:58:12 von Heist

This is the query:
SELECT DAY(date_time) "day", COUNT( DISTINCT session_id ) "count"
FROM application_requests
WHERE (retailer_id = '111111-1111-11111')
AND(year_for_reports = 2006)
AND(month_for_reports = 9)
GROUP BY DAY( date_time );

EXPLAIN returns this:
"1" | "SIMPLE" | "application_requests" | "ref" | "index_for_reports" |
"index_for_reports" | "53" | "const,const,const" | "998840" | "Using
where; Using filesort"

index_for_reports is (retailer_id, year_for_reports, month_for_reports)

Table structure goes as follow:
id(int)
retailer_id(varchar[40])
date_time(datetime)
session_id(varchar[255])
controller(varchar[255])
action(varchar[255])
parameters(text)
product_line_id(int)
merchandise_id(int)
month_for_reports(int)
year_for_reports(int)

Any idea how I could make this faster? Right now I'm clocking at 555703
ms, which is intolerable.

Re: Bad performance caused by DISTINCT?

am 14.11.2006 14:41:54 von zac.carey

Heist wrote:

> This is the query:
> SELECT DAY(date_time) "day", COUNT( DISTINCT session_id ) "count"
> FROM application_requests
> WHERE (retailer_id = '111111-1111-11111')
> AND(year_for_reports = 2006)
> AND(month_for_reports = 9)
> GROUP BY DAY( date_time );
>
> EXPLAIN returns this:
> "1" | "SIMPLE" | "application_requests" | "ref" | "index_for_reports" |
> "index_for_reports" | "53" | "const,const,const" | "998840" | "Using
> where; Using filesort"
>
> index_for_reports is (retailer_id, year_for_reports, month_for_reports)
>
> Table structure goes as follow:
> id(int)
> retailer_id(varchar[40])
> date_time(datetime)
> session_id(varchar[255])
> controller(varchar[255])
> action(varchar[255])
> parameters(text)
> product_line_id(int)
> merchandise_id(int)
> month_for_reports(int)
> year_for_reports(int)
>
> Any idea how I could make this faster? Right now I'm clocking at 555703
> ms, which is intolerable.

A couple of things spring to mind, although i don't know what, if
anything, any of these will actually do for performance:

Could retailer ID be stored as an integer? if you needed the hyphens
for display purposes you could use something like this:

SELECT INSERT (

INSERT (
retailer_id,
7,
0,
'-'
),
12,
0,
'-'
)

varchar 255 for session_id sounds very large. Could it be smaller?

month_for_reports is only ever going to be 1-12 so it might as well be
tinyint

other than that, make sure everything is properly indexed - but I guess
you've done that already?
HIH

Re: Bad performance caused by DISTINCT?

am 15.11.2006 15:41:42 von Heist

retailer_id cannot be stored as an integer sadly, but session_id could
be smaller. And yes, indexes are seemingly optimal. Thanks for the
answer.

strawberry wrote:
> Heist wrote:
>
> > This is the query:
> > SELECT DAY(date_time) "day", COUNT( DISTINCT session_id ) "count"
> > FROM application_requests
> > WHERE (retailer_id = '111111-1111-11111')
> > AND(year_for_reports = 2006)
> > AND(month_for_reports = 9)
> > GROUP BY DAY( date_time );
> >
> > EXPLAIN returns this:
> > "1" | "SIMPLE" | "application_requests" | "ref" | "index_for_reports" |
> > "index_for_reports" | "53" | "const,const,const" | "998840" | "Using
> > where; Using filesort"
> >
> > index_for_reports is (retailer_id, year_for_reports, month_for_reports)
> >
> > Table structure goes as follow:
> > id(int)
> > retailer_id(varchar[40])
> > date_time(datetime)
> > session_id(varchar[255])
> > controller(varchar[255])
> > action(varchar[255])
> > parameters(text)
> > product_line_id(int)
> > merchandise_id(int)
> > month_for_reports(int)
> > year_for_reports(int)
> >
> > Any idea how I could make this faster? Right now I'm clocking at 555703
> > ms, which is intolerable.
>
> A couple of things spring to mind, although i don't know what, if
> anything, any of these will actually do for performance:
>
> Could retailer ID be stored as an integer? if you needed the hyphens
> for display purposes you could use something like this:
>
> SELECT INSERT (
>
> INSERT (
> retailer_id,
> 7,
> 0,
> '-'
> ),
> 12,
> 0,
> '-'
> )
>
> varchar 255 for session_id sounds very large. Could it be smaller?
>
> month_for_reports is only ever going to be 1-12 so it might as well be
> tinyint
>
> other than that, make sure everything is properly indexed - but I guess
> you've done that already?
> HIH

Re: Bad performance caused by DISTINCT?

am 15.11.2006 16:35:10 von zac.carey

Heist wrote:

> retailer_id cannot be stored as an integer sadly, but session_id could
> be smaller. And yes, indexes are seemingly optimal. Thanks for the
> answer.

You're welcome. Sorry it wasn't much help.