Optimization Suggestion: temporary files and filesort used with LIMIT

on 30.08.2004 22:53:33 by Lorderon

Hi...

There's a performance issue that bothers me when temporary files and
filesort are used together with LIMIT.
Here is an example query:

SELECT tbl_2.id,tbl_2.type,tbl_2.container,tbl_2.title,tbl_1.path
FROM tbl_1
INNER JOIN tbl_2 USING (id)
INNER JOIN tbl_3 USING (id)
WHERE (tbl_1.path LIKE '/1/2/%' AND tbl_3.is_active='YES')
ORDER BY tbl_2.container ASC,tbl_2.title ASC
LIMIT 100,20;

Let's assume there is no KEY on (container,title), so this query must use
filesort. Let's also assume that the query returns 20,000 rows without the
LIMIT, and that tbl_2 has 100,000 rows.
There are two ways (that I thought of) to execute the query, both of which
MySQL supports:
1. First run the filesort algorithm on tbl_2, then do the join select and
apply the LIMIT.
2. First create a temp file, put the join select result into it, then run
filesort on the temp file and apply the LIMIT (a baseline sketch of this is
shown below).
In option 1 the whole table of 100,000 rows gets sorted (filesort takes a
lot of time on such a big table).
In option 2 a temp file of 20,000 rows is created and filesort runs on those
20,000 rows (the temp file is big).
And all that just to return 20 rows from near the beginning of the result -
it seems like a waste.
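
To make option 2 concrete, here is a minimal C++ sketch of that baseline
(illustration only, not MySQL source code; Row, sort_key_less and
option2_sort_all are made-up names):

// Baseline for option 2: materialize every row that passes the WHERE
// clause, sort the whole set with the ORDER BY comparator, then apply
// the LIMIT.
#include <algorithm>
#include <string>
#include <vector>

struct Row {
    std::string container;  // first ORDER BY key (tbl_2.container)
    std::string title;      // second ORDER BY key (tbl_2.title)
    // ... other selected columns would go here ...
};

// ORDER BY container ASC, title ASC
static bool sort_key_less(const Row &a, const Row &b) {
    if (a.container != b.container) return a.container < b.container;
    return a.title < b.title;
}

// 'matched' holds all matching rows (20,000 in the example above).
std::vector<Row> option2_sort_all(std::vector<Row> matched,
                                  std::size_t offset, std::size_t amount) {
    std::sort(matched.begin(), matched.end(), sort_key_less);  // sorts all rows
    std::vector<Row> out;
    for (std::size_t i = offset; i < matched.size() && i < offset + amount; ++i)
        out.push_back(matched[i]);  // keep only rows [offset, offset+amount)
    return out;
}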

My suggestion is this:
Take the LIMIT into account when creating the temp file in option 2. Set the
temp file row limit to %offset+%amount from LIMIT %offset,%amount. Run the
join select and put only the matched rows into the temp file, but keep them
ordered - do not run a regular filesort at the end. When a matching row is
found, look for its correct place in the temp file (since the temp file is
always sorted, this is fast and easy; smarter techniques can be used to find
the place even faster). If the correct place is beyond the temp file row
limit (you always know how many rows are in the temp file and what the last
sort key is), the row can safely be discarded. If the correct place is
between 1 and the temp file row limit, insert the row at that place in the
temp file; if the temp file then exceeds the row limit, throw away the last
row, keeping the temp file within the row limit boundaries. When the select
finishes, retrieve the rows from the temp file according to the LIMIT.
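
Here is a minimal C++ sketch of the bounded insertion described above
(again an illustration only, not MySQL source code; consider_row is a
made-up name, and Row/sort_key_less are the same as in the option 2 sketch):

#include <algorithm>
#include <string>
#include <vector>

struct Row {
    std::string container;  // first ORDER BY key
    std::string title;      // second ORDER BY key
};

// ORDER BY container ASC, title ASC
static bool sort_key_less(const Row &a, const Row &b) {
    if (a.container != b.container) return a.container < b.container;
    return a.title < b.title;
}

// Called once per row that passes the WHERE clause. 'buf' stays sorted
// and never holds more than row_limit = %offset + %amount rows.
void consider_row(std::vector<Row> &buf, std::size_t row_limit, const Row &row) {
    if (row_limit == 0) return;
    // Buffer full and the new row sorts at or after the current last row:
    // its place would be beyond the row limit, so discard it right away.
    if (buf.size() == row_limit && !sort_key_less(row, buf.back()))
        return;
    // Otherwise binary-search the insertion point and insert in order.
    buf.insert(std::upper_bound(buf.begin(), buf.end(), row, sort_key_less), row);
    // If the buffer now exceeds the row limit, drop the last (largest) row.
    if (buf.size() > row_limit)
        buf.pop_back();
}

// After the scan, rows [offset, offset+amount) of 'buf' are the LIMIT
// result (fewer if not enough rows matched).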
Using this algorithm we reduce the size of the temp file and avoid running
filesort over a large number of rows. It is most effective when the temp
file row limit is much smaller than the number of rows matched without the
LIMIT (i.e. %offset and %amount are relatively small), especially in cases
where option 2 (mentioned above) would use a very large temp file. When the
temp file row limit gets close to the number of rows option 2 would put in
the temp file, option 2 with a qsort-based filesort would be faster. I know
the relationship is not linear, but you can find the approximate
{temp file row limit}/{temp file row count} threshold at which to switch to
the faster algorithm.

In the example I gave above:
temp file row amount = 20,000
temp file row limit = 120 (%offset+%amount = 100+20 = 120)
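
A rough back-of-the-envelope comparison for these numbers (my own estimate,
counting only key comparisons and ignoring row copying and I/O):
- sorted insertion into a 120-row buffer: about 20,000 * log2(120), i.e.
  roughly 20,000 * 7 = 140,000 comparisons, with only 120 rows ever kept;
- option 2 with a full filesort: about 20,000 * log2(20,000), i.e. roughly
  20,000 * 14.3 = 286,000 comparisons, plus 20,000 rows written to the temp
  file.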


Any comments would be gratefully accepted...

-thanks, Lorderon.




RE: Optimization Suggestion: temporary files and filesort used with LIMIT

on 31.08.2004 22:28:32 by Igor Babaev

Hi,

Your suggestion to optimize queries with 'order by' and 'limit' is quite
reasonable.
Our TODO list already has a task to employ a priority queue instead of the
current sort procedure in these cases.
Hopefully this optimization will appear in future 5.0 releases.
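
For illustration, a bounded max-heap of size K = %offset+%amount gives the
same behaviour as the sorted temp file described above. This is only a
sketch of the general technique in C++ (BoundedTopK is a made-up name), not
the actual MySQL implementation:

#include <algorithm>
#include <queue>
#include <vector>

// Keeps the K smallest keys seen so far; Key must be comparable with '<'.
template <typename Key>
class BoundedTopK {
public:
    explicit BoundedTopK(std::size_t k) : k_(k) {}

    void add(const Key &key) {
        if (k_ == 0) return;
        if (heap_.size() < k_) {
            heap_.push(key);            // heap not full yet: just insert
        } else if (key < heap_.top()) {
            heap_.pop();                // evict the current largest key
            heap_.push(key);
        }                               // else: key falls beyond the limit, discard
    }

    // Returns the kept keys in ascending ORDER BY order.
    std::vector<Key> sorted() {
        std::vector<Key> out;
        while (!heap_.empty()) { out.push_back(heap_.top()); heap_.pop(); }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    std::size_t k_;
    std::priority_queue<Key> heap_;     // max-heap: top() is the largest kept key
};

The LIMIT %offset,%amount result is then rows [%offset, %offset+%amount) of
sorted().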

Regards,
Igor.

--

Igor Babaev, Senior Software Developer
MySQL AB, www.mysql.com

