Howto optimize Distinct query over 1.3mil rows?

Howto optimize Distinct query over 1.3mil rows?

am 28.09.2010 17:02:34 von John Stoffel

Hi,

I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with
2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of
40Gb disks mirrored using MD (Linux software RAID) for both the OS and
the storage of the mysql DBs.

My problem child is doing this simple query:

mysql> select distinct Call_No from Newspaper_Issues
mysql> WHERE BIB_ID = 464;
+----------+
| Call_No |
+----------+
| News |
| NewsD CT |
+----------+
2 rows in set (2.98 sec)

The Newspaper_Issues table has 1.3 million rows, and has a structure
like this:

mysql> describe Newspaper_Issues;
+----------------+-------------+------+-----+---------+----- -----------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+----- -----------+
| Record_No | int(11) | NO | PRI | NULL | auto_increment |
| BIB_ID | varchar(38) | NO | MUL | NULL | |
| Issue_Date | date | NO | MUL | NULL | |
| Type_Code | char(1) | NO | | r | |
| Condition_Code | char(1) | NO | | o | |
| Format_Code | char(1) | NO | | p | |
| Edition_Code | char(1) | NO | | n | |
| Date_Type_Code | char(1) | NO | | n | |
| Ed_Type | tinyint(1) | NO | | 1 | |
| RS_Code | char(1) | NO | | c | |
| Call_No | varchar(36) | YES | MUL | NULL | |
| Printed_Date | varchar(10) | YES | | NULL | |
| Update_Date | date | NO | | NULL | |
+----------------+-------------+------+-----+---------+----- -----------+
13 rows in set (0.00 sec)


I've tried adding various indexes, and reading up on howto optimize
DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current indexes are:

mysql> show index from Newspaper_Issues;
+------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
| Newspaper_Issues | 0 | PRIMARY | 1 | Record_No | A | 1333298 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | BIB_ID | 1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | Call_No | 1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | |
| Newspaper_Issues | 1 | Issue_Date | 1 | Issue_Date | A | 49381 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | BIB_ID_Issue_Date | 1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | BIB_ID_Issue_Date | 2 | Issue_Date | A | 1333298 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | call_no_short | 1 | Call_No | A | 30 | 6 | NULL | YES | BTREE | |
| Newspaper_Issues | 1 | BIB_ID_call_no_short | 1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | |
| Newspaper_Issues | 1 | BIB_ID_call_no_short | 2 | Call_No | A | 15503 | NULL | NULL | YES | BTREE | |
| Newspaper_Issues | 1 | call_no_bib_id | 1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | |
| Newspaper_Issues | 1 | call_no_bib_id | 2 | BIB_ID | A | 15503 | NULL | NULL | | BTREE | |
+------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
11 rows in set (0.00 sec)


So now when I do an explain on my query I get:

mysql> explain select distinct(Call_No) from Newspaper_Issues WHERE BIB_ID = 464;
+----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
| 1 | SIMPLE | Newspaper_Issues | index | BIB_ID,BIB_ID_Issue_Date,BIB_ID_call_no_short | call_no_bib_id | 227 | NULL | 1333298 | Using where; Using index |
+----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
1 row in set (0.00 sec)

Interestingly enough, I get much better performance if I just drop the WHERE clause, but that doesn't help me get what I want though. *grin*

mysql> explain select distinct(Call_No) from Newspaper_Issues;
+----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
| 1 | SIMPLE | Newspaper_Issues | range | NULL | Call_No | 111 | NULL | 928 | Using index for group-by |
+----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
1 row in set (0.00 sec)



Would it make sense to split the Call_No data off into it's own table, and put in a proper numeric ID, instead of the current VARCHAR(36) it uses? So in Newspaper_Issues I'd just have a Call_No_ID and a new Call_No table with:

Call_No_ID INT
Call_No Char(36)

which would hopefully index better? I only have 928 distinct Call_No strings, so I'm not afraid of wasting space or anything.

Thanks,
John

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Howto optimize Distinct query over 1.3mil rows?

am 28.09.2010 17:48:40 von Johnny Withers

--0016363b9bf66bccfd049153c7c1
Content-Type: text/plain; charset=ISO-8859-1

BIB_ID is VARCHAR, you should probably try

WHERE BIB_ID='464' so MySQL treats the value as a string

JW


On Tue, Sep 28, 2010 at 10:02 AM, John Stoffel wrote:

>
> Hi,
>
> I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with
> 2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of
> 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
> the storage of the mysql DBs.
>
> My problem child is doing this simple query:
>
> mysql> select distinct Call_No from Newspaper_Issues
> mysql> WHERE BIB_ID = 464;
> +----------+
> | Call_No |
> +----------+
> | News |
> | NewsD CT |
> +----------+
> 2 rows in set (2.98 sec)
>
> The Newspaper_Issues table has 1.3 million rows, and has a structure
> like this:
>
> mysql> describe Newspaper_Issues;
> +----------------+-------------+------+-----+---------+----- -----------+
> | Field | Type | Null | Key | Default | Extra |
> +----------------+-------------+------+-----+---------+----- -----------+
> | Record_No | int(11) | NO | PRI | NULL | auto_increment |
> | BIB_ID | varchar(38) | NO | MUL | NULL | |
> | Issue_Date | date | NO | MUL | NULL | |
> | Type_Code | char(1) | NO | | r | |
> | Condition_Code | char(1) | NO | | o | |
> | Format_Code | char(1) | NO | | p | |
> | Edition_Code | char(1) | NO | | n | |
> | Date_Type_Code | char(1) | NO | | n | |
> | Ed_Type | tinyint(1) | NO | | 1 | |
> | RS_Code | char(1) | NO | | c | |
> | Call_No | varchar(36) | YES | MUL | NULL | |
> | Printed_Date | varchar(10) | YES | | NULL | |
> | Update_Date | date | NO | | NULL | |
> +----------------+-------------+------+-----+---------+----- -----------+
> 13 rows in set (0.00 sec)
>
>
> I've tried adding various indexes, and reading up on howto optimize
> DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current
> indexes are:
>
> mysql> show index from Newspaper_Issues;
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> | Table | Non_unique | Key_name | Seq_in_index |
> Column_name | Collation | Cardinality | Sub_part | Packed | Null |
> Index_type | Comment |
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> | Newspaper_Issues | 0 | PRIMARY | 1 |
> Record_No | A | 1333298 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | Call_No | 1 |
> Call_No | A | 927 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | Issue_Date | 1 |
> Issue_Date | A | 49381 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_Issue_Date | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_Issue_Date | 2 |
> Issue_Date | A | 1333298 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_short | 1 |
> Call_No | A | 30 | 6 | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_call_no_short | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_call_no_short | 2 |
> Call_No | A | 15503 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_bib_id | 1 |
> Call_No | A | 927 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_bib_id | 2 |
> BIB_ID | A | 15503 | NULL | NULL | | BTREE
> | |
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> 11 rows in set (0.00 sec)
>
>
> So now when I do an explain on my query I get:
>
> mysql> explain select distinct(Call_No) from Newspaper_Issues WHERE
> BIB_ID = 464;
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra
> |
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> | 1 | SIMPLE | Newspaper_Issues | index |
> BIB_ID,BIB_ID_Issue_Date,BIB_ID_call_no_short | call_no_bib_id | 227 |
> NULL | 1333298 | Using where; Using index |
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> 1 row in set (0.00 sec)
>
> Interestingly enough, I get much better performance if I just drop the
> WHERE clause, but that doesn't help me get what I want though. *grin*
>
> mysql> explain select distinct(Call_No) from Newspaper_Issues;
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> | id | select_type | table | type | possible_keys | key
> | key_len | ref | rows | Extra |
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> | 1 | SIMPLE | Newspaper_Issues | range | NULL | Call_No
> | 111 | NULL | 928 | Using index for group-by |
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> 1 row in set (0.00 sec)
>
>
>
> Would it make sense to split the Call_No data off into it's own table, and
> put in a proper numeric ID, instead of the current VARCHAR(36) it uses? So
> in Newspaper_Issues I'd just have a Call_No_ID and a new Call_No table with:
>
> Call_No_ID INT
> Call_No Char(36)
>
> which would hopefully index better? I only have 928 distinct Call_No
> strings, so I'm not afraid of wasting space or anything.
>
> Thanks,
> John
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=johnny@pixelated.net
>
>


--
-----------------------------
Johnny Withers
601.209.4985
johnny@pixelated.net

--0016363b9bf66bccfd049153c7c1--

Re: Howto optimize Distinct query over 1.3mil rows?

am 28.09.2010 17:50:50 von Johan De Meersman

--0022152d622d2bcd5d049153cfae
Content-Type: text/plain; charset=ISO-8859-1

If Cal_NO is a recurring value, then yes, that is the way it should be done
in a relational schema.

Your index cardinality of 15.000 against 1.3 million rows is reasonable,
although not incredible; is your index cache large enough to acccomodate all
your indices ?


On Tue, Sep 28, 2010 at 5:02 PM, John Stoffel wrote:

>
> Hi,
>
> I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with
> 2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of
> 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
> the storage of the mysql DBs.
>
> My problem child is doing this simple query:
>
> mysql> select distinct Call_No from Newspaper_Issues
> mysql> WHERE BIB_ID = 464;
> +----------+
> | Call_No |
> +----------+
> | News |
> | NewsD CT |
> +----------+
> 2 rows in set (2.98 sec)
>
> The Newspaper_Issues table has 1.3 million rows, and has a structure
> like this:
>
> mysql> describe Newspaper_Issues;
> +----------------+-------------+------+-----+---------+----- -----------+
> | Field | Type | Null | Key | Default | Extra |
> +----------------+-------------+------+-----+---------+----- -----------+
> | Record_No | int(11) | NO | PRI | NULL | auto_increment |
> | BIB_ID | varchar(38) | NO | MUL | NULL | |
> | Issue_Date | date | NO | MUL | NULL | |
> | Type_Code | char(1) | NO | | r | |
> | Condition_Code | char(1) | NO | | o | |
> | Format_Code | char(1) | NO | | p | |
> | Edition_Code | char(1) | NO | | n | |
> | Date_Type_Code | char(1) | NO | | n | |
> | Ed_Type | tinyint(1) | NO | | 1 | |
> | RS_Code | char(1) | NO | | c | |
> | Call_No | varchar(36) | YES | MUL | NULL | |
> | Printed_Date | varchar(10) | YES | | NULL | |
> | Update_Date | date | NO | | NULL | |
> +----------------+-------------+------+-----+---------+----- -----------+
> 13 rows in set (0.00 sec)
>
>
> I've tried adding various indexes, and reading up on howto optimize
> DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current
> indexes are:
>
> mysql> show index from Newspaper_Issues;
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> | Table | Non_unique | Key_name | Seq_in_index |
> Column_name | Collation | Cardinality | Sub_part | Packed | Null |
> Index_type | Comment |
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> | Newspaper_Issues | 0 | PRIMARY | 1 |
> Record_No | A | 1333298 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | Call_No | 1 |
> Call_No | A | 927 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | Issue_Date | 1 |
> Issue_Date | A | 49381 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_Issue_Date | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_Issue_Date | 2 |
> Issue_Date | A | 1333298 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_short | 1 |
> Call_No | A | 30 | 6 | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_call_no_short | 1 |
> BIB_ID | A | 14980 | NULL | NULL | | BTREE
> | |
> | Newspaper_Issues | 1 | BIB_ID_call_no_short | 2 |
> Call_No | A | 15503 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_bib_id | 1 |
> Call_No | A | 927 | NULL | NULL | YES | BTREE
> | |
> | Newspaper_Issues | 1 | call_no_bib_id | 2 |
> BIB_ID | A | 15503 | NULL | NULL | | BTREE
> | |
>
> +------------------+------------+----------------------+---- ----------+-------------+-----------+-------------+--------- -+--------+------+------------+---------+
> 11 rows in set (0.00 sec)
>
>
> So now when I do an explain on my query I get:
>
> mysql> explain select distinct(Call_No) from Newspaper_Issues WHERE
> BIB_ID = 464;
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra
> |
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> | 1 | SIMPLE | Newspaper_Issues | index |
> BIB_ID,BIB_ID_Issue_Date,BIB_ID_call_no_short | call_no_bib_id | 227 |
> NULL | 1333298 | Using where; Using index |
>
> +----+-------------+------------------+-------+------------- ----------------------------------+----------------+-------- -+------+---------+--------------------------+
> 1 row in set (0.00 sec)
>
> Interestingly enough, I get much better performance if I just drop the
> WHERE clause, but that doesn't help me get what I want though. *grin*
>
> mysql> explain select distinct(Call_No) from Newspaper_Issues;
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> | id | select_type | table | type | possible_keys | key
> | key_len | ref | rows | Extra |
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> | 1 | SIMPLE | Newspaper_Issues | range | NULL | Call_No
> | 111 | NULL | 928 | Using index for group-by |
>
> +----+-------------+------------------+-------+------------- --+---------+---------+------+------+----------------------- ---+
> 1 row in set (0.00 sec)
>
>
>
> Would it make sense to split the Call_No data off into it's own table, and
> put in a proper numeric ID, instead of the current VARCHAR(36) it uses? So
> in Newspaper_Issues I'd just have a Call_No_ID and a new Call_No table with:
>
> Call_No_ID INT
> Call_No Char(36)
>
> which would hopefully index better? I only have 928 distinct Call_No
> strings, so I'm not afraid of wasting space or anything.
>
> Thanks,
> John
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=vegivamp@tuxera.be
>
>


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0022152d622d2bcd5d049153cfae--

Re: Howto optimize Distinct query over 1.3mil rows?

am 28.09.2010 19:06:23 von John Stoffel

Johnny> BIB_ID is VARCHAR, you should probably try
Johnny> WHERE BIB_ID='464' so MySQL treats the value as a string

Wow! What a difference that makes! Time to A) update my queries, or
B) fix the DB schema to NOT use varchar there.

mysql> select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues
mysql> WHERE BIB_ID = 464;
+----------+
| Call_No |
+----------+
| News |
| NewsD CT |
+----------+
2 rows in set (3.06 sec)

mysql> select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues
mysql> WHERE BIB_ID = '464';
+----------+
| Call_No |
+----------+
| News |
| NewsD CT |
+----------+
2 rows in set (0.02 sec)

Thanks a ton for your help, I would have never figured this out, esp
since I was looking down all the wrong rat holes.

Thanks,
John

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org