Performance of MySQL IN() given a list of constants.

Performance of MySQL IN() given a list of constants.

am 15.01.2010 18:15:43 von kabel

I'm facing a refactor of some slow code that engages the DB pretty heavily.
We're looking for a way to load arbitrary sets of rows from a table using an
indexed column of decently high cardinality, and I'm not sure if IN() is a
good way to do it or not.

Basically, we're looking at this:

CREATE TABLE `foo` (
`foo_id` INT UNSIGNED NOT NULL AUTO_INCREMENT
, `bar_id` INT UNSIGNED NOT NULL
, `unimportant_1` VARCHAR(8)
, `unimportant_2` VARCHAR(8)
, `unimportant_3` VARCHAR(8)
, PRIMARY KEY( `foo_id` )
, KEY `idx_bar_id` ( `bar_id` )
);

I need to load an arbitrary list of foos by their bar ID (1:N), and I'm not
sure the best way to do it. Joining against the `bar` table is,
unfortunately, not really helpful since the set loaded is totally arbitrary.

As far as I can tell, the constant list would be ~100 elements at most, but
take that with a grain of salt.

Right now, I'm considering:

SELECT *
FROM `foo`
WHERE `bar_id` IN( 1,2,3,4,5,6,7, ... ,61,62 );

I've done some testing and the performance doesn't seem to be too bad, but I'm
worried about scalability.

The query execution isn't too bad.

| 1 | SIMPLE | foo | range | idx_bar_id | idx_bar_id | 5 | NULL
| 7 | Using where |

Does anyone have experience with the performance of IN(), given a list of
constants, on an indexed column? Any pointers to more information on this
would be greatly appreciated.

Thanks in advance!

kabel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Performance of MySQL IN() given a list of constants.

am 15.01.2010 19:55:18 von Tom Worster

the example you gave would work with a range constraint:

WHERE `bar_id` > 0 AND `bar_id` < 63

but i guess this is not a general solution.

i've done exactly this kind of select using an IN constraint very often.
i've not had any trouble with lists of a few hundred so long as i have the
necessary index.

however, when the list gets really long i've divided it into multiple
selects in the app. i've nothing concrete to go on but a very long list of
constants makes me nervous. can you break something if query length exceeds
packet size?

one thought i had was to create another table `baz` just to store the list
of `bar_id` values. the app would: 1) truncate `baz`, 2) write the list
`bar_ids` into `baz`, 3) select from `foo` joining `baz`. perhaps not a high
performance solution but you could make it scale all the way up.


On 1/15/10 12:15 PM, "kabel" wrote:

> I'm facing a refactor of some slow code that engages the DB pretty heavily.
> We're looking for a way to load arbitrary sets of rows from a table using an
> indexed column of decently high cardinality, and I'm not sure if IN() is a
> good way to do it or not.
>
> Basically, we're looking at this:
>
> CREATE TABLE `foo` (
> `foo_id` INT UNSIGNED NOT NULL AUTO_INCREMENT
> , `bar_id` INT UNSIGNED NOT NULL
> , `unimportant_1` VARCHAR(8)
> , `unimportant_2` VARCHAR(8)
> , `unimportant_3` VARCHAR(8)
> , PRIMARY KEY( `foo_id` )
> , KEY `idx_bar_id` ( `bar_id` )
> );
>
> I need to load an arbitrary list of foos by their bar ID (1:N), and I'm not
> sure the best way to do it. Joining against the `bar` table is,
> unfortunately, not really helpful since the set loaded is totally arbitrary.
>
> As far as I can tell, the constant list would be ~100 elements at most, but
> take that with a grain of salt.
>
> Right now, I'm considering:
>
> SELECT *
> FROM `foo`
> WHERE `bar_id` IN( 1,2,3,4,5,6,7, ... ,61,62 );
>
> I've done some testing and the performance doesn't seem to be too bad, but I'm
> worried about scalability.
>
> The query execution isn't too bad.
>
> | 1 | SIMPLE | foo | range | idx_bar_id | idx_bar_id | 5 | NULL
> | 7 | Using where |
>
> Does anyone have experience with the performance of IN(), given a list of
> constants, on an indexed column? Any pointers to more information on this
> would be greatly appreciated.
>
> Thanks in advance!
>
> kabel



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Performance of MySQL IN() given a list of constants.

am 15.01.2010 20:03:51 von kabel

On Friday 15 January 2010 13:55:18 fsb wrote:
> the example you gave would work with a range constraint:
>
> WHERE `bar_id` > 0 AND `bar_id` < 63
>
> but i guess this is not a general solution.
>
> i've done exactly this kind of select using an IN constraint very often.
> i've not had any trouble with lists of a few hundred so long as i have the
> necessary index.

That's a relief to hear. I know my mileage may vary (MMMV?) but it's good to
hear someone else has done it.

> however, when the list gets really long i've divided it into multiple
> selects in the app. i've nothing concrete to go on but a very long list of
> constants makes me nervous. can you break something if query length exceeds
> packet size?

Yeah, absolutely. I didn't think about packet size, more about performance,
but that's a great point. Breaking it into smaller chunks is very workable.
You probably just saved me from receiving a panicky midnight phone call. ;-)

> one thought i had was to create another table `baz` just to store the list
> of `bar_id` values. the app would: 1) truncate `baz`, 2) write the list
> `bar_ids` into `baz`, 3) select from `foo` joining `baz`. perhaps not a
> high performance solution but you could make it scale all the way up.

That's not a bad way to go, and I like it a lot better than the IN() thing,
but this is almost guaranteed to be run several times concurrently and, since
it may involve multiple connections (please don't ask) I don't think temp
tables would work. I could take it with CREATE TABLE `baz_$SESSION_ID`...
etc., where I generate some session ID, but I may go with the IN() stuff for
starters just to see.

Thank you VERY much for your help!

kabel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org