Query Optimization

on 30.08.2011 18:45:53 by Brandon Phelps

Hello,

I have the following query I'd like to optimize a bit:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
ORDER BY rcvd DESC
LIMIT 0 , 10

Currently I have an index on the rcvd column which seems to be working
based on the output of EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sc index open_dt ndx_rcvd 4 NULL 10 Using where
1 SIMPLE spm eq_ref PRIMARY PRIMARY 2 syslog.sc.src_port 1
1 SIMPLE dpm eq_ref PRIMARY PRIMARY 2 syslog.sc.dst_port 1

However the query is still fairly slow for some reason, any ideas how I
could speed it up with additional indexes, etc?

The values I am using in the WHERE clause are variable and change each time.

The table has around 23 million records right now but is expected to
continue to grow up to a potential 150 million.

Here is the table schema:
CREATE TABLE IF NOT EXISTS `firewall_connections` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`open_dt` datetime NOT NULL,
`close_dt` datetime NOT NULL,
`protocol` smallint(6) NOT NULL,
`src_address` int(10) unsigned NOT NULL,
`src_port` smallint(5) unsigned NOT NULL,
`dst_address` int(10) unsigned NOT NULL,
`dst_port` smallint(5) unsigned NOT NULL,
`sent` int(10) unsigned NOT NULL,
`rcvd` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `ndx_rcvd` (`rcvd`),
KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!

--
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Query Optimization

on 30.08.2011 19:23:54 by Michael Dykman


There are a few things going on, but mainly it is the ORs that are killing
you.

Since your query requires OR to examine two distinct columns, both of equal
relevance to the query, MySQL is left with no choice but to do a full table
scan on what might be (at a guess) a very large table. No amount of indexing
will fix this for the query as presented.

You would be better off writing it as two distinct queries, each concerned
with conditions on a single column (open_dt or close_dt), and then UNIONing
the results. In this form, the indexes have a chance of being engaged.

Once the indexes are engaged, you probably want to convert your DATE/DATETIME
strings into actual DATEs or DATETIMEs, thus:

...
(open_dt >= DATE('2011-08-30 00:00:00'))
...

In its current state, the DATE fields are being converted to strings
implicitly for every row tested, which further frustrates index usage: the
index is against the quasi-numeric DATE, not the string representation that
your current implementation appears to expect. This query would also suddenly
begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin
or a future release of MySQL. The explicit casting I have suggested will
protect you against that.
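As an illustration only (not part of the original mail), the two-query UNION rewrite described above might look roughly like this. The column list is abbreviated for space, and each branch follows the description of constraining a single column rather than reproducing the original OR logic exactly:

```sql
-- Sketch of the UNION rewrite: each branch filters on one indexed column,
-- so each branch can use its own index; UNION (not UNION ALL) removes
-- rows matched by both branches. The final ORDER BY / LIMIT apply to the
-- combined result.
SELECT sc.open_dt, sc.close_dt, sc.protocol, sc.sent, sc.rcvd  -- ...remaining columns as in the original query
FROM firewall_connections AS sc
WHERE sc.open_dt BETWEEN '2011-08-30 00:00:00' AND '2011-08-30 12:36:53'
UNION
SELECT sc.open_dt, sc.close_dt, sc.protocol, sc.sent, sc.rcvd
FROM firewall_connections AS sc
WHERE sc.close_dt BETWEEN '2011-08-30 00:00:00' AND '2011-08-30 12:36:53'
ORDER BY rcvd DESC
LIMIT 0, 10;
```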

- michael dykman
On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps wrote:

> Hello,
>
> I have the following query I'd like to optimize a bit:
>
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA( sc.src_address ) AS src_address,
> sc.src_port,
> INET_NTOA( sc.dst_address ) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM firewall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE
> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
> ORDER BY rcvd DESC
> LIMIT 0 , 10
>
> Currently I have an index on the rcvd column which seems to be working
> based on the output of EXPLAIN:
>
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE sc index open_dt ndx_rcvd 4 NULL 10 Using where
> 1 SIMPLE spm eq_ref PRIMARY PRIMARY 2 syslog.sc.src_port 1
> 1 SIMPLE dpm eq_ref PRIMARY PRIMARY 2 syslog.sc.dst_port 1
>
> However the query is still fairly slow for some reason, any ideas how I
> could speed it up with additional indexes, etc?
>
> The values I am using in the WHERE clause are variable and change each
> time.
>
> The table has around 23 million records right now but is expected to
> continue to grow up to a potential 150 million.
>
> Here is the table schema:
> CREATE TABLE IF NOT EXISTS `firewall_connections` (
> `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
> `open_dt` datetime NOT NULL,
> `close_dt` datetime NOT NULL,
> `protocol` smallint(6) NOT NULL,
> `src_address` int(10) unsigned NOT NULL,
> `src_port` smallint(5) unsigned NOT NULL,
> `dst_address` int(10) unsigned NOT NULL,
> `dst_port` smallint(5) unsigned NOT NULL,
> `sent` int(10) unsigned NOT NULL,
> `rcvd` int(10) unsigned NOT NULL,
> PRIMARY KEY (`id`),
> KEY `ndx_rcvd` (`rcvd`),
> KEY `ndx_sent` (`sent`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>
> Thanks in advance!
>
> --
> Brandon
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=mdykman@gmail.com
>
>


--
- michael dykman
- mdykman@gmail.com

May the Source be with you.


Re: Query Optimization

on 01.09.2011 10:59:40 by Jochem van Dieten


On Aug 30, 2011 6:46 PM, "Brandon Phelps" wrote:
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA( sc.src_address ) AS src_address,
> sc.src_port,
> INET_NTOA( sc.dst_address ) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a
connection was opened and is always going to be less than close_dt. Right?

> WHERE
> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.
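For reference, a composite index over the two datetime columns could be created like so (a sketch; the index name is just illustrative):

```sql
-- Composite BTREE index covering both range columns, so the optimizer
-- can seek on open_dt and filter close_dt from the same index entries.
ALTER TABLE firewall_connections
    ADD INDEX ndx_open_close_dt (open_dt, close_dt);
```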

Jochem


Re: Query Optimization

on 01.09.2011 15:42:52 by Brandon Phelps

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
> > SELECT
> > sc.open_dt,
> > sc.close_dt,
> > sc.protocol,
> > INET_NTOA( sc.src_address ) AS src_address,
> > sc.src_port,
> > INET_NTOA( sc.dst_address ) AS dst_address,
> > sc.dst_port,
> > sc.sent,
> > sc.rcvd,
> > spm.desc AS src_port_desc,
> > dpm.desc AS dst_port_desc
> > FROM firewall_connections AS sc
>
> If this is a firewall connection log I presume open_dt is the time a
> connection was opened and is always going to be less than close_dt.
> Right?

> > WHERE
> > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
> > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

> In that case your logic here simplifies to:
> WHERE
> open_dt >= '2011-08-30 00:00:00'
> AND
> close_dt <= '2011-08-30 12:36:53'

> Now add an index over open_dt and close_dt and see what happens.

> Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days, so given
your logic, if the connection was started 2 days ago and I want to pull 1
day's worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, i.e. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?

Thanks again,
Brandon


Re: Query Optimization

on 01.09.2011 18:47:21 by shawn.l.green

On 9/1/2011 09:42, Brandon Phelps wrote:
> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
> > > ...
>
> > > WHERE
> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>
> > In that case your logic here simplifies to:
> > WHERE
> > open_dt >= '2011-08-30 00:00:00'
> > AND
> > close_dt <= '2011-08-30 12:36:53'
>
> > Now add an index over open_dt and close_dt and see what happens.
>
> > Jochem
>
> Jochem,
>
> I can't really use your WHERE logic because I also need to retrieve
> results where the open_dt time is out of the range specified. For
> example, a very large file download might span multiple days so given
> your logic if the connection was started 2 days ago and I want to pull 1
> days worth of connections, I would miss that entry. Basically I want to
> SELECT all of the records that were opened OR closed during the
> specified time period, ie. if any activity happened between my start and
> end dates, I need to see that record.
>
> Any other ideas?
>

I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will set up a query
window with two markers (s) and (e). Events will be marked by |----|
markers showing their durations.

a) (s) (e)
b) |---|
c) |---|
d) |---|
e) |--------------------|
f) |---|
g) |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time
and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be
part of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window -
include this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) and end >= (starting time)

Try that and let us know the results.
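Mapped onto the column names from this thread, that overlap test would read as follows (a sketch; the timestamps are the sample values from earlier in the thread):

```sql
-- A connection overlaps the window exactly when it opened before the
-- window ends AND closed after the window starts.
WHERE open_dt  <= '2011-08-30 12:36:53'  -- (ending time)
  AND close_dt >= '2011-08-30 00:00:00'  -- (starting time)
```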
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN


Re: Query Optimization

on 01.09.2011 19:32:09 by Brandon Phelps

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
> On 9/1/2011 09:42, Brandon Phelps wrote:
>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>> > > ...
>>
>> > > WHERE
>> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>
>> > In that case your logic here simplifies to:
>> > WHERE
>> > open_dt >= '2011-08-30 00:00:00'
>> > AND
>> > close_dt <= '2011-08-30 12:36:53'
>>
>> > Now add an index over open_dt and close_dt and see what happens.
>>
>> > Jochem
>>
>> Jochem,
>>
>> I can't really use your WHERE logic because I also need to retrieve
>> results where the open_dt time is out of the range specified. For
>> example, a very large file download might span multiple days so given
>> your logic if the connection was started 2 days ago and I want to pull 1
>> days worth of connections, I would miss that entry. Basically I want to
>> SELECT all of the records that were opened OR closed during the
>> specified time period, ie. if any activity happened between my start and
>> end dates, I need to see that record.
>>
>> Any other ideas?
>>
>
> I believe Jochem was on the right track but he got his dates reversed.
>
> Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |----| markers showing their durations.
>
> a) (s) (e)
> b) |---|
> c) |---|
> d) |---|
> e) |--------------------|
> f) |---|
> g) |---|
>
> To describe these situations:
> a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in.
> b) the event starts and stops before your window exists. It won't be part of your results.
> c) the event starts before the window but ends within the window - include this
> d) the event starts and ends within the window - include this
> e) the event starts before the window and ends after the window - include this
> f) the event starts inside the window but ends beyond the window - include this.
> g) the event starts and ends beyond the window - exclude this.
>
> In order to get every event in the range of c-f, here is what you need for a WHERE clause
>
> WHERE start <= (ending time) and end >= (starting time)
>
> Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

SHOW INDEX:
+----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
| firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
+----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+


The queries do seem to be executing much faster now, although I'm not quite sure why. I'm also not sure why the new ndx_open_close_dt index isn't being used.

-Brandon


Re: Query Optimization

on 08.09.2011 20:03:57 by Brandon Phelps

On 09/01/2011 01:32 PM, Brandon Phelps wrote:
> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>> > > ...
>>>
>>> > > WHERE
>>> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>
>>> > In that case your logic here simplifies to:
>>> > WHERE
>>> > open_dt >= '2011-08-30 00:00:00'
>>> > AND
>>> > close_dt <= '2011-08-30 12:36:53'
>>>
>>> > Now add an index over open_dt and close_dt and see what happens.
>>>
>>> > Jochem
>>>
>>> Jochem,
>>>
>>> I can't really use your WHERE logic because I also need to retrieve
>>> results where the open_dt time is out of the range specified. For
>>> example, a very large file download might span multiple days so given
>>> your logic if the connection was started 2 days ago and I want to pull 1
>>> days worth of connections, I would miss that entry. Basically I want to
>>> SELECT all of the records that were opened OR closed during the
>>> specified time period, ie. if any activity happened between my start and
>>> end dates, I need to see that record.
>>>
>>> Any other ideas?
>>>
>>
>> I believe Jochem was on the right track but he got his dates reversed.
>>
>> Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |----| markers showing their durations.
>>
>> a) (s) (e)
>> b) |---|
>> c) |---|
>> d) |---|
>> e) |--------------------|
>> f) |---|
>> g) |---|
>>
>> To describe these situations:
>> a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in.
>> b) the event starts and stops before your window exists. It won't be part of your results.
>> c) the event starts before the window but ends within the window - include this
>> d) the event starts and ends within the window - include this
>> e) the event starts before the window and ends after the window - include this
>> f) the event starts inside the window but ends beyond the window - include this.
>> g) the event starts and ends beyond the window - exclude this.
>>
>> In order to get every event in the range of c-f, here is what you need for a WHERE clause
>>
>> WHERE start <= (ending time) and end >= (starting time)
>>
>> Try that and let us know the results.
>
> Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>
> Old method:
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM firewall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE
> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
> ORDER BY rcvd DESC
> LIMIT 0, 10;
>
> New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM firewall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE
> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
> ORDER BY rcvd DESC
> LIMIT 0, 10;
>
> EXPLAIN output for old method:
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>
> EXPLAIN output for new method with new index:
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>
> SHOW INDEX:
> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>
>
> The queries do seem to be executing much faster now, although I'm not quite sure why. I'm also not sure why the new ndx_open_close_dt index isn't being used.
>
> -Brandon
>

I am still having a big issue with my query, as seen above. The table is up to around 32 million records at the moment and either of the two SELECT queries above takes a very long time to run. Is there anything at all I can do to speed things up? It seems that changing the format of the WHERE clause did not help at all, as the EXPLAIN output is exactly the same for both versions. I also tried adding an index on (open_dt, close_dt, rcvd) but that index does not get used.

Any other ideas?

Thanks in advance,

Brandon


Re: Query Optimization

on 08.09.2011 20:16:09 by Andrew Moore


Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?
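One illustrative shape such archiving could take; the archive table name and the 90-day retention window are assumptions for the sketch, not anything described in this thread:

```sql
-- Hypothetical periodic job: move connections closed more than 90 days
-- ago into an archive table, keeping the live table small enough that
-- the range scan stays cheap.
INSERT INTO firewall_connections_archive
SELECT * FROM firewall_connections
WHERE close_dt < NOW() - INTERVAL 90 DAY;

DELETE FROM firewall_connections
WHERE close_dt < NOW() - INTERVAL 90 DAY;
```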

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:

> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>
>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>
>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>
>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>> > > ...
>>>>
>>>> > > WHERE
>>>> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>>> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>>
>>>> > In that case your logic here simplifies to:
>>>> > WHERE
>>>> > open_dt >= '2011-08-30 00:00:00'
>>>> > AND
>>>> > close_dt <= '2011-08-30 12:36:53'
>>>>
>>>> > Now add an index over open_dt and close_dt and see what happens.
>>>>
>>>> > Jochem
>>>>
>>>> Jochem,
>>>>
>>>> I can't really use your WHERE logic because I also need to retrieve
>>>> results where the open_dt time is out of the range specified. For
>>>> example, a very large file download might span multiple days so given
>>>> your logic if the connection was started 2 days ago and I want to pull 1
>>>> days worth of connections, I would miss that entry. Basically I want to
>>>> SELECT all of the records that were opened OR closed during the
>>>> specified time period, ie. if any activity happened between my start and
>>>> end dates, I need to see that record.
>>>>
>>>> Any other ideas?
>>>>
>>>>
>>> I believe Jochem was on the right track but he got his dates reversed.
>>>
>>> Let's try a little ASCII art to show the situation. I will setup a query
>>> window with two markers (s) and (e). Events will be marked by |----| markers
>>> showing their durations.
>>>
>>> a) (s) (e)
>>> b) |---|
>>> c) |---|
>>> d) |---|
>>> e) |--------------------|
>>> f) |---|
>>> g) |---|
>>>
>>> To describe these situations:
>>> a) is the window for which you want to query (s) is the starting time and
>>> (e) is the ending time for the date range you are interested in.
>>> b) the event starts and stops before your window exists. It won't be part
>>> of your results.
>>> c) the event starts before the window but ends within the window -
>>> include this
>>> d) the event starts and ends within the window - include this
>>> e) the event starts before the window and ends after the window - include
>>> this
>>> f) the event starts inside the window but ends beyond the window -
>>> include this.
>>> g) the event starts and ends beyond the window - exclude this.
>>>
>>> In order to get every event in the range of c-f, here is what you need
>>> for a WHERE clause
>>>
>>> WHERE start <= (ending time) and end >= (starting time)
>>>
>>> Try that and let us know the results.
>>>
>>
>> Thanks Jochem and Shawn, however the following two queries result in the
>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>>
>> Old method:
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM firewall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE
>> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
>> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
>> ORDER BY rcvd DESC
>> LIMIT 0, 10;
>>
>> New method with BTREE index on open_dt, close_dt (index name is
>> ndx_open_close_dt):
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM firewall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE
>> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
>> ORDER BY rcvd DESC
>> LIMIT 0, 10;
>>
>> EXPLAIN output for old method:
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>
>> EXPLAIN output for new method with new index:
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>
>> SHOW INDEX:
>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>
>>
>> The queries do seem to be executing much faster now, although I'm not
>> quite sure why. I'm also not sure why the new ndx_open_close_dt index
>> isn't being used.
>>
>> -Brandon
>>
>>
> I am still having a big issue with my query as seen above. The table is up
> to around 32 million records at the moment and either of the two SELECT
> queries above takes a very long time to run. Is there anything at all I can
> do to speed things up? It seems that changing the format of the WHERE
> clause did not help at all, as the EXPLAIN output is exactly the same for
> both versions. I also tried adding an index on (open_dt, close_dt, rcvd) but
> that index does not get used.
>
> Any other ideas?
>
> Thanks in advance,
>
>
> Brandon
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=eroomydna@gmail.com
>
>
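The unused (open_dt, close_dt, rcvd) index is the classic leftmost-prefix behavior of composite B-tree indexes: a condition on close_dt alone (as in the OR'd date ranges) cannot seek an index that is sorted by open_dt first. The rule is not MySQL-specific; a minimal SQLite sketch, with table, data, and index names invented for the illustration:

```python
import sqlite3

# Hypothetical table loosely mirroring the thread's schema; the point is
# the leftmost-prefix rule for a composite B-tree index, which SQLite
# and MySQL share.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fw (open_dt INTEGER, close_dt INTEGER, rcvd INTEGER)")
conn.execute("CREATE INDEX ndx_open_close ON fw (open_dt, close_dt)")
conn.executemany("INSERT INTO fw VALUES (?, ?, ?)",
                 [(i, i + 5, i % 100) for i in range(1000)])

# Condition on the index's first column: the engine can seek the index.
seek = conn.execute(
    "EXPLAIN QUERY PLAN SELECT rcvd FROM fw WHERE open_dt = 42").fetchall()
# Condition on the second column alone: the index prefix is unusable,
# so the engine falls back to a full table scan.
scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT rcvd FROM fw WHERE close_dt = 42").fetchall()
print(seek[0][-1])  # a SEARCH ... USING INDEX plan
print(scan[0][-1])  # a plain table SCAN
```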


Re: Query Optimization

am 08.09.2011 20:27:19 von Brandon Phelps

Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive or cache the data in a way that would be any more efficient than querying the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:
> Thinking outside the query, is there any archiving that could happen to make
> your large tables kinder in the range scan?
>
> Andy
>
> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>
>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>
>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>
>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>
>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>> ...
>>>>>
>>>>>>> WHERE
>>>>>>> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>>>>>> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>>>
>>>>>> In that case your logic here simplifies to:
>>>>>> WHERE
>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>> AND
>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>
>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>
>>>>>> Jochem
>>>>>
>>>>> Jochem,
>>>>>
>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>> results where the open_dt time is out of the range specified. For
>>>>> example, a very large file download might span multiple days, so given
>>>>> your logic, if the connection was started 2 days ago and I want to pull
>>>>> one day's worth of connections, I would miss that entry. Basically I
>>>>> want to SELECT all of the records that were opened OR closed during the
>>>>> specified time period, i.e. if any activity happened between my start
>>>>> and end dates, I need to see that record.
>>>>>
>>>>> Any other ideas?
>>>>>
>>>>>
>>>> I believe Jochem was on the right track but he got his dates reversed.
>>>>
>>>> Let's try a little ASCII art to show the situation. I will setup a query
>>>> window with two markers (s) and (e). Events will be marked by |----| markers
>>>> showing their durations.
>>>>
>>>> a)       (s)                 (e)
>>>> b) |---|
>>>> c)     |---|
>>>> d)            |---|
>>>> e)  |------------------------------|
>>>> f)                          |---|
>>>> g)                                |---|
>>>>
>>>> To describe these situations:
>>>> a) is the window for which you want to query (s) is the starting time and
>>>> (e) is the ending time for the date range you are interested in.
>>>> b) the event starts and stops before your window exists. It won't be part
>>>> of your results.
>>>> c) the event starts before the window but ends within the window -
>>>> include this
>>>> d) the event starts and ends within the window - include this
>>>> e) the event starts before the window and ends after the window - include
>>>> this
>>>> f) the event starts inside the window but ends beyond the window -
>>>> include this.
>>>> g) the event starts and ends beyond the window - exclude this.
>>>>
>>>> In order to get every event in the range of c-f, here is what you need
>>>> for a WHERE clause
>>>>
>>>> WHERE start <= (ending time) AND end >= (starting time)
>>>>
>>>> Try that and let us know the results.
>>>>
>>>
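Shawn's rule is the standard interval-overlap test; a quick runnable sanity check of the cases above, with SQLite standing in for MySQL and the table name and integer timestamps invented for the illustration:

```python
import sqlite3

# Check the interval-overlap rule against the event windows b-g from the
# ASCII art.  SQLite stands in for MySQL; the table name and the integer
# timestamps are invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conn_log (name TEXT, open_dt INTEGER, close_dt INTEGER)")
events = [("b", 1, 5),    # ends before the window       -> excluded
          ("c", 8, 12),   # straddles the window start   -> included
          ("d", 13, 17),  # fully inside the window      -> included
          ("e", 5, 25),   # spans the whole window       -> included
          ("f", 18, 23),  # straddles the window end     -> included
          ("g", 24, 30)]  # starts after the window ends -> excluded
conn.executemany("INSERT INTO conn_log VALUES (?, ?, ?)", events)

s, e = 10, 20  # the query window markers (s) and (e)
# Shawn's rule: overlap iff the event opened before the window ends
# AND closed after the window starts.
rows = conn.execute(
    "SELECT name FROM conn_log WHERE open_dt <= ? AND close_dt >= ? "
    "ORDER BY name", (e, s)).fetchall()
print([r[0] for r in rows])  # -> ['c', 'd', 'e', 'f']
```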
>>> Thanks Jochem and Shawn, however the following two queries result in the
>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>>>
>>> Old method:
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM firewall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE
>>> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
>>> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
>>> ORDER BY rcvd DESC
>>> LIMIT 0, 10;
>>>
>>> New method with BTREE index on open_dt, close_dt (index name is
>>> ndx_open_close_dt):
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM firewall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE
>>> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
>>> ORDER BY rcvd DESC
>>> LIMIT 0, 10;
>>>
>>> EXPLAIN output for old method:
>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>


Re: Query Optimization

am 08.09.2011 20:38:07 von Andrew Moore


I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the EXPLAIN look like when you remove the LIMIT 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?




Re: Query Optimization

am 08.09.2011 20:47:03 von Mihail Manolov

If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.
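For reference, Mihail's suggestion might look roughly like the sketch below. The monthly boundaries are illustrative only, and note that MySQL requires every unique key, including this table's id primary key, to contain the partitioning column, so the primary key would first need extending to something like (id, open_dt):

```sql
-- Illustrative sketch only: RANGE partitioning by month on open_dt
-- (MySQL 5.1+).  Assumes the primary key has already been extended to
-- include open_dt; partition names and boundaries are invented.
ALTER TABLE firewall_connections
    PARTITION BY RANGE (TO_DAYS(open_dt)) (
        PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
        PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );
```

With this layout, a WHERE clause that constrains open_dt lets the optimizer prune to the relevant partitions instead of ranging over the whole 32-million-row table.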




Re: Query Optimization

am 08.09.2011 20:50:24 von Brandon Phelps

Andy,

The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu Server 11.04. Unfortunately the machine only has 2GB of RAM, but no other major daemons are running on it. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:
> I don't think I saw any query timings in the emails (maybe I missed them).
>
> What version of MySQL are you currently using?
> What does the EXPLAIN look like when you remove the LIMIT 10?
> Is your server tuned for MyISAM or InnoDB?
> What kind of disk setup is in use?
> How much memory is in your machine?
>
>
> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps wrote:
>
>> Thanks for the reply Andy. Unfortunately the users will be selecting
>> varying date ranges and new data is constantly coming in, so I am not sure
>> how I could archive/cache the necessary data that would be any more
>> efficient than simply using the database directly.
>>
>>
>>
>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>
>>> Thinking outside the query, is there any archiving that could happen to
>>> make
>>> your large tables kinder in the range scan?
>>>
>>> Andy
>>>
>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>>>
>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>
>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>
>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>
>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>
>>>>>>>> ...
>>>>>>>>>
>>>>>>>>
>>>>>>> WHERE
>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>>>>>
>>>>>>>> 00:00:00')
>>>>>>>
>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>>>>>
>>>>>>>> 12:36:53')
>>>>>>>
>>>>>>> In that case your logic here simplifies to:
>>>>>>>> WHERE
>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>> AND
>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>
>>>>>>>
>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>
>>>>>>>
>>>>>>> Jochem
>>>>>>>>
>>>>>>>
>>>>>>> Jochem,
>>>>>>>
>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>> example, a very large file download might span multiple days, so given
>>>>>>> your logic, if the connection was started 2 days ago and I want to pull
>>>>>>> 1 day's worth of connections, I would miss that entry. Basically I want
>>>>>>> to SELECT all of the records that were opened OR closed during the
>>>>>>> specified time period, i.e. if any activity happened between my start
>>>>>>> and end dates, I need to see that record.
>>>>>>>
>>>>>>> Any other ideas?
>>>>>>>
>>>>>>>
>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>> reversed.
>>>>>>
>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>> query
>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>> markers
>>>>>> showing their durations.
>>>>>>
>>>>>> a)      (s)                  (e)
>>>>>> b) |---|
>>>>>> c)     |---|
>>>>>> d)          |---|
>>>>>> e)    |-----------------------|
>>>>>> f)                          |---|
>>>>>> g)                              |---|
>>>>>>
>>>>>> To describe these situations:
>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>> and
>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>> part
>>>>>> of your results.
>>>>>> c) the event starts before the window but ends within the window -
>>>>>> include this
>>>>>> d) the event starts and ends within the window - include this
>>>>>> e) the event starts before the window and ends after the window -
>>>>>> include
>>>>>> this
>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>> include this.
>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>
>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>> for a WHERE clause
>>>>>>
>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>
>>>>>> Try that and let us know the results.
>>>>>>
>>>>>>
>>>>> Thanks Jochem and Shawn; however, the following two queries result in the
>>>>> exact same EXPLAIN output (I hope the tables don't wrap too early for
>>>>> you):
>>>>>
>>>>> Old method:
>>>>> SELECT
>>>>> sc.open_dt,
>>>>> sc.close_dt,
>>>>> sc.protocol,
>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>> sc.src_port,
>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>> sc.dst_port,
>>>>> sc.sent,
>>>>> sc.rcvd,
>>>>> spm.desc AS src_port_desc,
>>>>> dpm.desc AS dst_port_desc
>>>>> FROM firewall_connections AS sc
>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>> WHERE
>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>> 09:53:31')
>>>>> ORDER BY rcvd DESC
>>>>> LIMIT 0, 10;
>>>>>
>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>> ndx_open_close_dt):
>>>>> SELECT
>>>>> sc.open_dt,
>>>>> sc.close_dt,
>>>>> sc.protocol,
>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>> sc.src_port,
>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>> sc.dst_port,
>>>>> sc.sent,
>>>>> sc.rcvd,
>>>>> spm.desc AS src_port_desc,
>>>>> dpm.desc AS dst_port_desc
>>>>> FROM firewall_connections AS sc
>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>> WHERE
>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>> ORDER BY rcvd DESC
>>>>> LIMIT 0, 10;
>>>>>
>>>>> EXPLAIN output for old method:
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>
>>>>> EXPLAIN output for new method with new index:
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>
>>>>> SHOW INDEX:
>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>
>>>>>
>>>>> The queries do seem to be executing much faster right now, although I'm
>>>>> not quite sure why. And I'm not sure why the new ndx_open_close_dt index
>>>>> isn't being used either.
>>>>>
>>>>> -Brandon
>>>>>
>>>>>
>>>> I am still having a big issue with my query as seen above. The table is up
>>>> to around 32 million records at the moment and either of the two SELECT
>>>> queries above takes a very long time to run. Is there anything at all I can
>>>> do to speed things up? It seems that changing the format of the WHERE
>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>> both versions. I also tried adding an index on (open_dt, close_dt, rcvd)
>>>> but that index does not get used.
>>>>
>>>> Any other ideas?
>>>>
>>>> Thanks in advance,
>>>>
>>>>
>>>> Brandon
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=eroomydna@gmail.com
>>>>>
>>>>
>>>>
>>>>
>>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: http://lists.mysql.com/mysql?unsub=eroomydna@gmail.com
>>
>>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Query Optimization

am 08.09.2011 20:55:18 von Derek Downey

Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
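Against the firewall_connections schema posted earlier in the thread, that suggestion would look something like this (a sketch; the index names are illustrative, not from the thread):

```sql
-- Two separate single-column indexes, so the optimizer can union two range scans
ALTER TABLE firewall_connections ADD INDEX ndx_open_dt (open_dt);
ALTER TABLE firewall_connections ADD INDEX ndx_close_dt (close_dt);

-- OR of two closed ranges, one indexed column per arm; if the merge is chosen,
-- EXPLAIN shows type=index_merge with Extra "Using union(ndx_open_dt,ndx_close_dt)"
SELECT sc.open_dt, sc.close_dt, sc.rcvd
FROM firewall_connections AS sc
WHERE (sc.open_dt  BETWEEN '2011-08-30 00:00:00' AND '2011-08-30 12:36:53')
   OR (sc.close_dt BETWEEN '2011-08-30 00:00:00' AND '2011-08-30 12:36:53');
```

BETWEEN is just shorthand for the pair of inequalities; the important part is that each OR arm touches only one indexed column, which is what makes the index_merge union possible.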

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

> Andy,
> 
> The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:
> 
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> 
> When I remove the LIMIT 10 I get:
> 
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> Thanks for all your help thus far.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Query Optimization

am 08.09.2011 21:04:45 von Brandon Phelps

Mihail,

I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee).

I could potentially partition the table based on the day of the week the connection was opened, which might improve performance for a while, since that would take me from a single 32-million-record table down to roughly 4.5 million records per partition (32 / 7). However, we are looking to eventually store up to 2 months' worth of data in the table, and so far those 32 million records cover only 1 month, so I estimate roughly another 32 million before the month is out, bringing me to around 70 million records total. (It would be nice to store even more than 2 months, but given my current performance dilemma I don't expect that to happen.) Also, this does not take into account that the end user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme would need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic), so I don't think this would be all that helpful.

I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know.
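For concreteness, the kind of scheme I imagine would be something like the following untested sketch. (One assumption worth flagging: MySQL 5.1 requires every unique key on a partitioned table to include the partitioning column, so the primary key would need to become (id, open_dt) first.)

```sql
-- Untested sketch: monthly RANGE partitions on open_dt.
-- Step 1: widen the primary key to satisfy the partitioning/unique-key rule.
ALTER TABLE firewall_connections
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, open_dt);

-- Step 2: partition by month. TO_DAYS() is one of the functions
-- MySQL's optimizer can use for partition pruning on date ranges.
ALTER TABLE firewall_connections
    PARTITION BY RANGE (TO_DAYS(open_dt)) (
        PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
        PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );
```

With monthly partitions, dropping the oldest month later becomes a cheap ALTER TABLE ... DROP PARTITION instead of a huge DELETE.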

Thanks,
Brandon


On 09/08/2011 02:47 PM, Mihail Manolov wrote:
> If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.
>
> On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
>
>> Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
>>
>>
>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>> Thinking outside the query, is there any archiving that could happen to make
>>> your large tables kinder in the range scan?
>>>
>>> Andy


Re: Query Optimization

am 08.09.2011 21:25:19 von Brandon Phelps

Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).
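One thing worth noting: SQL has no chained comparisons, so MySQL parses ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') left to right as (('2011-09-07 13:18:58' <= open_dt) <= '2011-09-08 13:18:58'), comparing a 0/1 boolean against a date string. That predicate is effectively always true, which would explain the full table scan above. Spelling the ranges out with BETWEEN gives the optimizer something it can actually match against the two single-column indexes (a sketch, reusing the same bounds):

```sql
-- Same query, with the range test written out explicitly.
-- Bounds are the thread's example dates, not verified values.
SELECT
    sc.open_dt,
    sc.close_dt,
    sc.protocol,
    INET_NTOA(sc.src_address) AS src_address,
    sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address,
    sc.dst_port,
    sc.sent,
    sc.rcvd,
    spm.desc AS src_port_desc,
    dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE sc.open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
   OR sc.close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58';
```

With separate indexes on open_dt and close_dt, EXPLAIN may then show an index_merge (union) plan instead of type ALL; whether the optimizer picks it depends on the estimated selectivity of the two ranges.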



On 09/08/2011 02:55 PM, Derek Downey wrote:
> Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times.
>
> If that's correct, you might be able to get an index_merge by doing a query like:
>
> WHERE ((starting time)<=open_dt<= (ending time)) OR ((starting time)<=close_dt<=(ending time))
>
> and creating two indexes (one on 'open_dt' and the other on 'close_dt')
>
> http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
>
> Regards,
> Derek
>
> On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
>
>> Andy,
>>
>> The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:
>>
>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>> | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>
>> When I remove the LIMIT 10 I get:
>>
>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>> | id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>>
>> Thanks for all your help thus far.
>>
>> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>>> I don't think I saw any query timings in the emails (maybe I missed them).
>>>
>>> What version of MySQL are you currently using?
>>> What does the explain look like when you remove the limit 10?
>>> Is your server tuned for MyISAM or InnoDB?
>>> What kind of disk setup is in use?
>>> How much memory is in your machine?
>>>
>>>
>>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps wrote:
>>>
>>>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>>> varying date ranges and new data is constantly coming in, so I am not sure
>>>> how I could archive/cache the necessary data that would be any more
>>>> efficient than simply using the database directly.
>>>>
>>>>
>>>>
>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>
>>>>> Thinking outside the query, is there any archiving that could happen to
>>>>> make
>>>>> your large tables kinder in the range scan?
>>>>>
>>>>> Andy
>>>>>
>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>>>>>
>>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>
>>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>
>>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>
>>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> WHERE
>>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 00:00:00')
>>>>>>>>>
>>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 12:36:53')
>>>>>>>>>
>>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>>> WHERE
>>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>>> AND
>>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jochem
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jochem,
>>>>>>>>>
>>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>>> example, a very large file download might span multiple days so given
>>>>>>>>> your logic if the connection was started 2 days ago and I want to pull
>>>>>>>>> 1
>>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>>> to
>>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>>> and
>>>>>>>>> end dates, I need to see that record.
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>>>> reversed.
>>>>>>>>
>>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>>> query
>>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>>> markers
>>>>>>>> showing their durations.
>>>>>>>>
>>>>>>>> a) (s) (e)
>>>>>>>> b) |---|
>>>>>>>> c) |---|
>>>>>>>> d) |---|
>>>>>>>> e) |--------------------|
>>>>>>>> f) |---|
>>>>>>>> g) |---|
>>>>>>>>
>>>>>>>> To describe these situations:
>>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>>> and
>>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>>> part
>>>>>>>> of your results.
>>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>>> include this
>>>>>>>> d) the event starts and ends within the window - include this
>>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>>> include
>>>>>>>> this
>>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>>> include this.
>>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>>
>>>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>>>> for a WHERE clause
>>>>>>>>
>>>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>>>
>>>>>>>> Try that and let us know the results.
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>>> you)
>>>>>>>
>>>>>>> Old method:
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>>>> 09:53:31')
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>>> ndx_open_close_dt):
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> EXPLAIN output for old method:
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>
>>>>>>> EXPLAIN output for new method with new index:
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>
>>>>>>> SHOW INDEX:
>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 |     NULL | NULL   |      | BTREE      |         |
>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 |     NULL | NULL   |      | BTREE      |         |
>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>
>>>>>>>
>>>>>>> Right now the queries do seem to be executing much faster,
>>>>>>> though I'm not quite sure why. And I'm not sure why the new
>>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>>
>>>>>>> -Brandon
>>>>>>>
>>>>>>>
>>>>>>> I am still having a big issue with my query as seen above. The table
>>>>>> is up
>>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>>> queries above takes a very long time to run. Is there anything at all I
>>>>>> can
>>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>>>> both versions. I also tried adding an index on (open_dt, close_dt, rcvd)
>>>>>> but
>>>>>> that index does not get used.
>>>>>>
>>>>>> Any other ideas?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>>
>>>>>> Brandon
>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>


Re: Query Optimization

am 08.09.2011 21:26:38 von Andrew Moore


Partitioning isn't a bad idea for this, however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into the relevant buffers/memory, so you're going to be on disk more than you
want. You're probably creating temporary tables like crazy and I would bet
that a good load of them are heading to disk too. With your IO performance
limited to a small number of disks as you describe, you're not going to get
much more from these queries. Although it's a dedicated DB server, are there
other databases being accessed on it? Looking at the scope of your data, are
you capturing more than you need? How often and how far back are the users
querying? How many users are concurrently performing queries on the 32m
record table?
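On the partitioning idea itself, a daily RANGE layout on open_dt is the usual shape for this kind of log table. This is only a sketch: MySQL 5.1 requires the partitioning column to be part of every unique key, so the primary key would have to be widened first, and the boundary dates below are illustrative:

```sql
-- 5.1 requires the partitioning column in every unique key,
-- so the surrogate PK must be widened to include open_dt.
ALTER TABLE firewall_connections
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, open_dt);

ALTER TABLE firewall_connections
    PARTITION BY RANGE (TO_DAYS(open_dt)) (
        PARTITION p20110907 VALUES LESS THAN (TO_DAYS('2011-09-08')),
        PARTITION p20110908 VALUES LESS THAN (TO_DAYS('2011-09-09')),
        -- ... one partition per day ...
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );
```

Pruning only helps queries that filter on open_dt; a predicate on close_dt alone would still touch every partition. The retention problem does get cheap, though: dropping a day with ALTER TABLE ... DROP PARTITION is near-instant compared to a DELETE over millions of rows.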

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps wrote:

> Mihail,
>
> I have considered this but have not yet determined how best to go about
> partitioning the table. I don't think partitioning by dst_address or
> src_address would help because most of the queries do not filter on IP
> address (except very specific queries where the end-user is searching the
> table for history on a particular employee).
>
> I could potentially partition the table based on the day of week the
> connection was opened on which may improve performance for a while since
> this would take me from a single 32million record table down to roughly 4.5
> million records per partition (32 / 7) however we are looking to eventually
> store up to 2 months worth of data in the table, and so far that 32 million
> records is only for 1 month, so I estimate another 32 million-ish before the
> month is out, bringing me to roughly 70 million records total (it would be
> nice if I could store even more than 2 months, but given my current
> performance dilemma I don't expect that to happen). Also this does not take
> into account that the end-user will often be pulling data for multiple days
> at a time, meaning that multiple partitions in this scheme will need to be
> accessed anyway.
>
> The only other logical partitioning scheme I can think of would be to
> partition based on dst_port (the port the log relates to) but the majority
> of records are all to port 80 (standard web traffic) so I don't think this
> would be all that helpful.
>
> I have never messed with partitioning so it is totally possible that I am
> not thinking of something, so if you have any ideas on a decent partitioning
> scheme based on my criteria and queries below, please let me know.
>
> Thanks,
> Brandon
>
>
> On 09/08/2011 02:47 PM, Mihail Manolov wrote:
>
>> If you're running version 5.1+ you may wanna take a look at table
>> partitioning options you may have.
>>
>> On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
>>
>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>> varying date ranges and new data is constantly coming in, so I am not sure
>>> how I could archive/cache the necessary data that would be any more
>>> efficient than simply using the database directly.
>>>
>>>
>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>
>>>> Thinking outside the query, is there any archiving that could happen to
>>>> make
>>>> your large tables kinder in the range scan?
>>>>
>>>> Andy
>>>>
>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps
>>>> wrote:
>>>>
>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>
>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>
>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>
>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> WHERE
>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>>>>>>
>>>>>>>>> 00:00:00')
>>>>>>>>
>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>>>>>>
>>>>>>>>> 12:36:53')
>>>>>>>>
>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>> WHERE
>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>> AND
>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>>
>>>>>>>>
>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Jochem
>>>>>>>>>
>>>>>>>>
>>>>>>>> Jochem,
>>>>>>>>
>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>> example, a very large file download might span multiple days so
>>>>>>>> given
>>>>>>>> your logic if the connection was started 2 days ago and I want to
>>>>>>>> pull 1
>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>> to
>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>> and
>>>>>>>> end dates, I need to see that record.
>>>>>>>>
>>>>>>>> Any other ideas?
>>>>>>>>
>>>>>>>>
>>>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>>> reversed.
>>>>>>>
>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>> query
>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>> markers
>>>>>>> showing their durations.
>>>>>>>
>>>>>>> a) (s) (e)
>>>>>>> b) |---|
>>>>>>> c) |---|
>>>>>>> d) |---|
>>>>>>> e) |--------------------|
>>>>>>> f) |---|
>>>>>>> g) |---|
>>>>>>>
>>>>>>> To describe these situations:
>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>> and
>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>> part
>>>>>>> of your results.
>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>> include this
>>>>>>> d) the event starts and ends within the window - include this
>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>> include
>>>>>>> this
>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>> include this.
>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>
>>>>>>> In order to get every event in the range of c-f, here is what you
>>>>>>> need
>>>>>>> for a WHERE clause
>>>>>>>
>>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>>
>>>>>>> Try that and let us know the results.
>>>>>>>
>>>>>>>
>>>>>> Thanks Jochem and Shawn, however the following two queries result in
>>>>>> the
>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>> you)
>>>>>>
>>>>>> Old method:
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>>> 09:53:31')
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>>
>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>> ndx_open_close_dt):
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>>
>>>>>> EXPLAIN output for old method:
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>
>>>>>> EXPLAIN output for new method with new index:
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>
>>>>>> SHOW INDEX:
>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 |     NULL | NULL   |      | BTREE      |         |
>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 |     NULL | NULL   |      | BTREE      |         |
>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>
>>>>>>
>>>>>> Right now the queries do seem to be executing much faster,
>>>>>> though I'm not quite sure why. And I'm not sure why the new
>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>
>>>>>> -Brandon
>>>>>>
>>>>>>
>>>>>> I am still having a big issue with my query as seen above. The table
>>>>> is up
>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>> queries above takes a very long time to run. Is there anything at all I
>>>>> can
>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>> clause did not help at all, as the EXPLAIN output is exactly the same
>>>>> for
>>>>> both versions. I also tried adding an index on (open_dt, close_dt,
>>>>> rcvd) but
>>>>> that index does not get used.
>>>>>
>>>>> Any other ideas?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>>
>>>>> Brandon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>


Re: Query Optimization

am 08.09.2011 21:45:36 von Mihail Manolov

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
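One detail if the UNION route is taken: the original query ended with ORDER BY rcvd DESC LIMIT 0, 10, which a plain UNION drops. Each half can keep its own LIMIT and the combined result can be re-sorted (a sketch, using the same placeholder dates as above):

```sql
-- Each branch pre-limits its top 10 by rcvd; the outer ORDER BY/LIMIT
-- then picks the overall top 10 from the (deduplicated) union.
(SELECT sc.open_dt, sc.close_dt, sc.protocol,
        INET_NTOA(sc.src_address) AS src_address, sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
        sc.sent, sc.rcvd,
        spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE sc.open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 ORDER BY sc.rcvd DESC LIMIT 10)
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol,
        INET_NTOA(sc.src_address) AS src_address, sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
        sc.sent, sc.rcvd,
        spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE sc.close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 ORDER BY sc.rcvd DESC LIMIT 10)
ORDER BY rcvd DESC
LIMIT 10;
```

Each branch can then use its own single-column index, and at most 20 rows reach the final sort instead of the whole result set.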



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

> Thanks for the idea Derek, however given the following query my EXPLAIN o=
utput is identical:
>=20
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM sonicwall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port =3D sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port =3D sc.dst_port
> WHERE ('2011-09-07 13:18:58' <=3D open_dt <=3D '2011-09-08 13:18:58') OR =
('2011-09-07 13:18:58' <=3D close_dt <=3D '2011-09-08 13:18:58');
>=20
>=20
> +----+-------------+-------+--------+---------------+------- --+---------+=
--------------------+----------+-------------+
> | id | select_type | table | type | possible_keys | key | key_len |=
ref | rows | Extra |
> +----+-------------+-------+--------+---------------+------- --+---------+=
--------------------+----------+-------------+
> | 1 | SIMPLE | sc | ALL | NULL | NULL | NULL |=
NULL | 32393330 | Using where |
> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |=
syslog.sc.src_port | 1 | |
> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |=
syslog.sc.dst_port | 1 | |
> +----+-------------+-------+--------+---------------+------- --+---------+=
--------------------+----------+-------------+
>=20
> I did create indexes on open_dt and close_dt (2 separate indexes).
>=20
>=20
>=20
> On 09/08/2011 02:55 PM, Derek Downey wrote:
>> Correct me if I'm wrong. You're wanting to get all records that have an =
open_date or a close_date between two times.
>>=20
>> If that's correct, you might be able to get an index_merge by doing a qu=
ery like:
>>=20
>> WHERE ((starting time)<=3Dopen_dt<=3D (ending time)) OR ((starting time)=
<=3Dclose_dt<=3D(ending time))
>>=20
>> and creating two indexes (one on 'open_dt' and the other on 'close_dt')
>>=20
>> http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimizat ion.html
>>=20
>> Regards,
>> Derek
>>=20
>> On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
>>=20
>>> Andy,
>>>=20
>>> The queries take minutes to run. MySQL is 5.1.54 and it's running on U=
buntu server 11.04. Unfortunately the machine only has 2GB of RAM but no o=
ther major daemons are running on the machine. We are running RAID 1 (mirr=
oring) with 1TB drives. The tables in question here are all MyISAM. When =
running with the LIMIT 10 my EXPLAIN is:
>>>=20
>>> +----+-------------+-------+--------+---------------------+- -----------=
---------+---------+--------------------+----------+-------- ---------------=
------+
>>> | id | select_type | table | type | possible_keys | key =
| key_len | ref | rows | Extra =
|
>>> +----+-------------+-------+--------+---------------------+- -----------=
---------+---------+--------------------+----------+-------- ---------------=
------+
>>> | 1 | SIMPLE | sc | range | ndx_open_close_rcvd | ndx_open_cl=
ose_rcvd | 8 | NULL | 32393316 | Using where; Using fil=
esort |
>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY =
| 2 | syslog.sc.src_port | 1 | =
|
>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY =
| 2 | syslog.sc.dst_port | 1 | =
|
>>> +----+-------------+-------+--------+---------------------+- -----------=
---------+---------+--------------------+----------+-------- ---------------=
------+
>>>=20
>>> When I remove the LIMIT 10 I get:
>>>=20
>>> ----+-------------+-------+--------+---------------------+-- -------+---=
------+--------------------+----------+--------------------- --------+
>>> | id | select_type | table | type | possible_keys | key | k=
ey_len | ref | rows | Extra |
>>> +----+-------------+-------+--------+---------------------+- --------+--=
-------+--------------------+----------+-------------------- ---------+
>>> | 1 | SIMPLE | sc | ALL | ndx_open_close_rcvd | NULL | N=
ULL | NULL | 32393330 | Using where; Using filesort |
>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2=
| syslog.sc.src_port | 1 | |
>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2=
| syslog.sc.dst_port | 1 | |
>>> +----+-------------+-------+--------+---------------------+- --------+--=
-------+--------------------+----------+-------------------- ---------+
>>>=20
>>> Thanks for all your help thus far.
>>>=20
>>> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>>>> I don't think I saw any query timings in the emails (maybe I missed them).
>>>>
>>>> What version of MySQL are you currently using?
>>>> What does the explain look like when you remove the limit 10?
>>>> Is your server tuned for MyISAM or InnoDB?
>>>> What kind of disk setup is in use?
>>>> How much memory is in your machine?
>>>>
>>>>
>>>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps wrote:
>>>>
>>>>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>>>> varying date ranges and new data is constantly coming in, so I am not sure
>>>>> how I could archive/cache the necessary data that would be any more
>>>>> efficient than simply using the database directly.
>>>>>
>>>>>
>>>>>
>>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>>
>>>>>> Thinking outside the query, is there any archiving that could happen to
>>>>>> make
>>>>>> your large tables kinder in the range scan?
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>>>>>>
>>>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>>
>>>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>>
>>>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>>
>>>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> WHERE
>>>>>>>>>>> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>>>>>>>>>> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>>>>>>>>
>>>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>>> WHERE
>>>>>>>>>> open_dt >= '2011-08-30 00:00:00'
>>>>>>>>>> AND
>>>>>>>>>> close_dt <= '2011-08-30 12:36:53'
>>>>>>>>>>
>>>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>>
>>>>>>>>>> Jochem
>>>>>>>>>>
>>>>>>>>> Jochem,
>>>>>>>>>
>>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>>> example, a very large file download might span multiple days, so given
>>>>>>>>> your logic if the connection was started 2 days ago and I want to pull 1
>>>>>>>>> day's worth of connections, I would miss that entry. Basically I want to
>>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>>> specified time period, i.e. if any activity happened between my start
>>>>>>>>> and end dates, I need to see that record.
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>> I believe Jochem was on the right track but he got his dates reversed.
>>>>>>>>
>>>>>>>> Let's try a little ASCII art to show the situation. I will set up a query
>>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>>> markers showing their durations.
>>>>>>>>
>>>>>>>> a)       (s)         (e)
>>>>>>>> b) |---|
>>>>>>>> c)     |---|
>>>>>>>> d)            |---|
>>>>>>>> e)   |--------------------|
>>>>>>>> f)                  |---|
>>>>>>>> g)                         |---|
>>>>>>>>
>>>>>>>> To describe these situations:
>>>>>>>> a) is the window for which you want to query: (s) is the starting time
>>>>>>>> and (e) is the ending time for the date range you are interested in.
>>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>>> part of your results.
>>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>>> include this
>>>>>>>> d) the event starts and ends within the window - include this
>>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>>> include this
>>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>>> include this.
>>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>>
>>>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>>>> for a WHERE clause:
>>>>>>>>
>>>>>>>> WHERE start <= (ending time) AND end >= (starting time)
>>>>>>>>
>>>>>>>> Try that and let us know the results.
>>>>>>>>
>>>>>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>>>> you)
>>>>>>>>
>>>>>>>> Old method:
>>>>>>>> SELECT
>>>>>>>> sc.open_dt,
>>>>>>>> sc.close_dt,
>>>>>>>> sc.protocol,
>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>> sc.src_port,
>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>> sc.dst_port,
>>>>>>>> sc.sent,
>>>>>>>> sc.rcvd,
>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>> FROM firewall_connections AS sc
>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>> WHERE
>>>>>>>> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
>>>>>>>> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
>>>>>>>> ORDER BY rcvd DESC
>>>>>>>> LIMIT 0, 10;
>>>>>>>>
>>>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>>>> ndx_open_close_dt):
>>>>>>>> SELECT
>>>>>>>> sc.open_dt,
>>>>>>>> sc.close_dt,
>>>>>>>> sc.protocol,
>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>> sc.src_port,
>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>> sc.dst_port,
>>>>>>>> sc.sent,
>>>>>>>> sc.rcvd,
>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>> FROM firewall_connections AS sc
>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>> WHERE
>>>>>>>> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
>>>>>>>> ORDER BY rcvd DESC
>>>>>>>> LIMIT 0, 10;
>>>>>>>>
>>>>>>>> EXPLAIN output for old method:
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>
>>>>>>>> EXPLAIN output for new method with new index:
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>
>>>>>>>> SHOW INDEX:
>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>> Although right now the queries do seem to be executing much faster,
>>>>>>>> although I'm not quite sure why. And I'm not sure why the new
>>>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>>>
>>>>>>>> -Brandon
>>>>>>>>
>>>>>>>>
>>>>>>> I am still having a big issue with my query as seen above. The table is up
>>>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>>>> queries above take a very long time to run. Is there anything at all I can
>>>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>>>>> both versions. I also tried adding an index on (open_dt, close_dt, rcvd)
>>>>>>> but that index does not get used.
>>>>>>>
>>>>>>> Any other ideas?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>>
>>>>>>> Brandon


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Query Optimization

am 08.09.2011 22:43:20 von Brandon Phelps

Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time, however the background process that fills the table is constantly doing a fast SELECT (0.00003 seconds) and a subsequent UPDATE. Basically whenever a connection is closed on the firewall, the bg process SELECTS from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest ip, and protocol. So for example whenever the firewall logs a "CONNECTION CLOSED" message, the bg process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATES the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each.
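That lookup is also worth covering with an index shaped exactly like its WHERE and ORDER BY so it stays fast as the table grows. The sketch below is not from the thread: the index name and literal values are made up, and it assumes the sonicwall_connections schema quoted earlier.

```sql
-- Hypothetical composite index: the four equality columns first, then
-- open_dt last so the ORDER BY open_dt LIMIT 1 can be satisfied by
-- reading a single index entry instead of sorting.
ALTER TABLE sonicwall_connections
  ADD INDEX ndx_bg_lookup (src_address, src_port, dst_address, protocol, open_dt);

-- The background process's lookup (example values) then needs no filesort:
SELECT id
FROM sonicwall_connections
WHERE src_address = INET_ATON('192.0.2.10')
  AND src_port = 51234
  AND dst_address = INET_ATON('198.51.100.20')
  AND protocol = 6
ORDER BY open_dt
LIMIT 0, 1;
```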

The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc.

I think what I will try to do is create 60 partitions for the table in question based on month and day. This way each day will go into its own partition, and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table.

My only question with this is how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partitioning scheme every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
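One way to avoid daily partition maintenance entirely is HASH partitioning on TO_DAYS(open_dt): each calendar day maps to partition number TO_DAYS(open_dt) MOD 60, so the 60 slots are reused cyclically and never need to be added or dropped. This is a sketch under two assumptions: the nightly 60-day purge keeps running (so a slot is empty before its day number wraps around), and the primary key is widened to include open_dt, since MySQL partitioning requires the partitioning column in every unique key.

```sql
-- Sketch: 60 cyclic day-based partitions for the firewall_connections
-- schema shown earlier in the thread (primary key changed to include
-- open_dt, which MySQL's partitioning rules require).
CREATE TABLE firewall_connections_p (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  open_dt datetime NOT NULL,
  close_dt datetime NOT NULL,
  protocol smallint(6) NOT NULL,
  src_address int(10) unsigned NOT NULL,
  src_port smallint(5) unsigned NOT NULL,
  dst_address int(10) unsigned NOT NULL,
  dst_port smallint(5) unsigned NOT NULL,
  sent int(10) unsigned NOT NULL,
  rcvd int(10) unsigned NOT NULL,
  PRIMARY KEY (id, open_dt),
  KEY ndx_rcvd (rcvd)
) ENGINE=MyISAM
PARTITION BY HASH (TO_DAYS(open_dt))
PARTITIONS 60;
```

Whether the optimizer actually prunes partitions for a given date-range query should be confirmed with EXPLAIN PARTITIONS before committing to this layout.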

On 09/08/2011 03:26 PM, Andrew Moore wrote:
> Partitioning isn't a bad idea for this however I'm still thinking about your
> dataset size and possible hardware limitations. It's not likely going to fit
> into relevant buffers/memory so you're going to be on disk more than you
> want. You're probably creating temporary tables like crazy and I would bet
> that there are a good load of them heading to disk too. With your IO
> performance limited to a small number of disks as you describe, you're not
> going to be able to get much more from these queries. Although it's a dedicated
> DB server, are there other databases being accessed on the server? When
> looking at the scope of your data, are you capturing more than you need? How
> often and how far back are the users querying? How many users concurrently
> performing queries on the 32m record table?
>
> On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps wrote:
>
>> Mihail,
>>
>> I have considered this but have not yet determined how best to go about
>> partitioning the table. I don't think partitioning by dst_address or
>> src_address would help because most of the queries do not filter on IP
>> address (except very specific queries where the end-user is searching the
>> table for history on a particular employee).
>>
>> I could potentially partition the table based on the day of week the
>> connection was opened on which may improve performance for a while since
>> this would take me from a single 32million record table down to roughly 4.5
>> million records per partition (32 / 7) however we are looking to eventually
>> store up to 2 months worth of data in the table, and so far that 32 million
>> records is only for 1 month, so I estimate another 32 million-ish before the
>> month is out, bringing me to roughly 70 million records total (it would be
>> nice if I could store even more than 2 months, but given my currently
>> performance dilemma I don't expect that to happen). Also this does not take
>> into account that the end-user will often be pulling data for multiple days
>> at a time, meaning that multiple partitions in this scheme will need to be
>> accessed anyway.
>>
>> The only other logical partitioning scheme I can think of would be to
>> partition based on dst_port (the port the log relates to) but the majority
>> of records are all to port 80 (standard web traffic) so I don't think this
>> would be all that helpful.
>>
>> I have never messed with partitioning so it is totally possible that I am
>> not thinking of something, so if you have any ideas on a decent partitioning
>> scheme based on my criteria and queries below, please let me know.
>>
>> Thanks,
>> Brandon
>>
>>
>> On 09/08/2011 02:47 PM, Mihail Manolov wrote:
>>
>>> If you're running version 5.1+ you may wanna take a look at table
>>> partitioning options you may have.
>>>
>>> On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
>>>
>>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>>> varying date ranges and new data is constantly coming in, so I am not sure
>>>> how I could archive/cache the necessary data that would be any more
>>>> efficient than simply using the database directly.
>>>>
>>>>
>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>
>>>>> Thinking outside the query, is there any archiving that could happen to
>>>>> make
>>>>> your large tables kinder in the range scan?
>>>>>
>>>>> Andy
>>>>>
>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps
>>>>> wrote:
>>>>>
>>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>
>>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>
>>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>
>>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> WHERE
>>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 00:00:00')
>>>>>>>>>
>>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 12:36:53')
>>>>>>>>>
>>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>>> WHERE
>>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>>> AND
>>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jochem
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jochem,
>>>>>>>>>
>>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>>> example, a very large file download might span multiple days so
>>>>>>>>> given
>>>>>>>>> your logic if the connection was started 2 days ago and I want to
>>>>>>>>> pull 1
>>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>>> to
>>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>>> and
>>>>>>>>> end dates, I need to see that record.
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>>>> reversed.
>>>>>>>>
>>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>>> query
>>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>>> markers
>>>>>>>> showing their durations.
>>>>>>>>
>>>>>>>> a)       (s)         (e)
>>>>>>>> b) |---|
>>>>>>>> c)     |---|
>>>>>>>> d)            |---|
>>>>>>>> e)   |--------------------|
>>>>>>>> f)                  |---|
>>>>>>>> g)                         |---|
>>>>>>>>
>>>>>>>> To describe these situations:
>>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>>> and
>>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>>> part
>>>>>>>> of your results.
>>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>>> include this
>>>>>>>> d) the event starts and ends within the window - include this
>>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>>> include
>>>>>>>> this
>>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>>> include this.
>>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>>
>>>>>>>> In order to get every event in the range of c-f, here is what you
>>>>>>>> need
>>>>>>>> for a WHERE clause
>>>>>>>>
>>>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>>>
>>>>>>>> Try that and let us know the results.
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks Jochem and Shawn, however the following two queries result in
>>>>>>> the
>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>>> you)
>>>>>>>
>>>>>>> Old method:
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>>>> 09:53:31')
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>>> ndx_open_close_dt):
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> EXPLAIN output for old method:
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>> | id | select_type | table | type | possible_keys | key | key_len |
>>>>>>> ref |
>>>>>>> rows | Extra |
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>>>>> NULL | 10 | Using where |
>>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
>>>>>>> syslog.sc.src_port |
>>>>>>> 1 | |
>>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
>>>>>>> syslog.sc.dst_port |
>>>>>>> 1 | |
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>>
>>>>>>> EXPLAIN output for new method with new index:
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>> | id | select_type | table | type | possible_keys | key | key_len |
>>>>>>> ref |
>>>>>>> rows | Extra |
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>>>>> NULL | 10 | Using where |
>>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
>>>>>>> syslog.sc.src_port |
>>>>>>> 1 | |
>>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
>>>>>>> syslog.sc.dst_port |
>>>>>>> 1 | |
>>>>>>> +----+-------------+-------+--****------+------------------- --**--**
>>>>>>> ----+----------+---------+----****----------------+------+-- --**
>>>>>>> --**-------+
>>>>>>>
>>>>>>> SHOW INDEX:
>>>>>>> +----------------------+------****------+------------------- +-**--**
>>>>>>> -----------+-------------+----****-------+-------------+---- --**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name |
>>>>>>> Collation |
>>>>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>> +----------------------+------****------+------------------- +-**--**
>>>>>>> -----------+-------------+----****-------+-------------+---- --**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A |
>>>>>>> 1342691
>>>>>>> | NULL | NULL | | BTREE | |
>>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>>>>>> 6377783 | NULL | NULL | | BTREE | |
>>>>>>> +----------------------+------****------+------------------- +-**--**
>>>>>>> -----------+-------------+----****-------+-------------+---- --**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>>
>>>>>>>
>>>>>>> Although right now the queries do seem to be executing much faster,
>>>>>>> although I'm not quite sure why. And I'm not sure why the new
>>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>>
>>>>>>> -Brandon
>>>>>>>
>>>>>>>
>>>>>>> I am still having a big issue with my query as seen above. The table
>>>>>> is up
>>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>>> queries above take a very long time to run. Is there anything at all I
>>>>>> can
>>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>>> clause did not help at all, as the EXPLAIN output is exactly the same
>>>>>> for
>>>>>> both versions. I also tried adding an index on (open_dt, close_dt,
>>>>>> rcvd) but
>>>>>> that index does not get used.
>>>>>>
>>>>>> Any other ideas?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>>
>>>>>> Brandon
>>>>>>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Query Optimization

am 08.09.2011 22:50:51 von Brandon Phelps

Mihail,

Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?
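On the duplicate question: plain UNION means UNION DISTINCT, so a row that matches both branches (open_dt and close_dt both in range) is emitted only once; it is UNION ALL that would repeat it. A minimal generic illustration, not tied to this schema:

```sql
SELECT 1 AS n UNION     SELECT 1;   -- one row: the duplicate is removed
SELECT 1 AS n UNION ALL SELECT 1;   -- two rows: no duplicate elimination
```

The trade-off is that the DISTINCT pass materializes a temporary table for deduplication, which UNION ALL skips.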

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;
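A further tweak worth trying on the query above (a sketch against the same schema, with most selected columns elided for brevity): since the outer ORDER BY rcvd DESC LIMIT 10 has to sort everything both branches return, giving each branch its own ORDER BY ... LIMIT 10 caps the final filesort at 20 rows. This is still correct, because any row in the overall top 10 necessarily sits in the top 10 of whichever branch matched it.

```sql
(SELECT sc.open_dt, sc.close_dt, sc.rcvd    -- remaining columns omitted in this sketch
 FROM sonicwall_connections AS sc
 WHERE sc.open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 ORDER BY sc.rcvd DESC LIMIT 10)
UNION
(SELECT sc.open_dt, sc.close_dt, sc.rcvd
 FROM sonicwall_connections AS sc
 WHERE sc.close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
 ORDER BY sc.rcvd DESC LIMIT 10)
ORDER BY rcvd DESC
LIMIT 10;
```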

+----+--------------+------------+--------+----------------- ---------------------------+--------------+---------+------- -------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+--------+----------------- ---------------------------+--------------+---------+------- -------------+------+----------------+
| 1 | PRIMARY | sc | range | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt | 8 | NULL | 1057 | Using where |
| 1 | PRIMARY | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | 1 | |
| 1 | PRIMARY | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | 1 | |
| 2 | UNION | sc | range | ndx_close_dt | ndx_close_dt | 8 | NULL | 1131 | Using where |
| 2 | UNION | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | 1 | |
| 2 | UNION | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | 1 | |
| NULL | UNION RESULT | | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------+------------+--------+----------------- ---------------------------+--------------+---------+------- -------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:
> How about:
>
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM sonicwall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>
> UNION
>
> SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM sonicwall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>
>
>
> On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:
>
>> Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:
>>
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM sonicwall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE ('2011-09-07 13:18:58'<= open_dt<= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58'<= close_dt<= '2011-09-08 13:18:58');
>>
>>
> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
>> | id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
>> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
>> |  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
>> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
>>
>> I did create indexes on open_dt and close_dt (2 separate indexes).
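[Editor's note: the chained comparison in the query above is the trap here. In MySQL, and in SQLite (used below because it ships with Python's stdlib), `a <= x <= b` parses left to right as `(a <= x) <= b`, so the boolean 0/1 result of the first comparison is compared against the second constant and the predicate matches essentially every row, which is why the optimizer sees no usable range and scans all 32 million rows. A minimal sketch; the table and values are invented for illustration:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (open_dt TEXT)")
con.executemany("INSERT INTO t VALUES (?)",
                [("2011-09-01",), ("2011-09-07 14:00:00",), ("2011-09-10",)])

# Chained form: parses as (('2011-09-07 ...' <= open_dt) <= '2011-09-08 ...'),
# i.e. a 0/1 boolean compared against a string -- it matches every row.
chained = con.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE '2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58'"
).fetchone()[0]

# Explicit BETWEEN (or an AND of two comparisons) tests the actual range.
between = con.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'"
).fetchone()[0]

print(chained, between)  # -> 3 1
```

Only the `BETWEEN` form expresses a real range, and only a real range can use an index over `open_dt` for a range scan.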
>>
>>
>>
>> On 09/08/2011 02:55 PM, Derek Downey wrote:
>>> Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times.
>>>
>>> If that's correct, you might be able to get an index_merge by doing a query like:
>>>
>> WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))
>>>
>>> and creating two indexes (one on 'open_dt' and the other on 'close_dt')
>>>
>> http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
>>>
>>> Regards,
>>> Derek
>>>
>>> On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
>>>
>>>> Andy,
>>>>
>>>> The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:
>>>>
>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>> | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
>>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
>>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>>
>>>> When I remove the LIMIT 10 I get:
>>>>
>>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>>>> | id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
>>>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>>>> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
>>>> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>>>>
>>>> Thanks for all your help thus far.
>>>>
>>>> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>>>>> I don't think I saw any query timings in the emails (maybe I missed them).
>>>>>
>>>>> What version of MySQL are you currently using?
>>>> What does the explain look like when you remove the limit 10?
>>>>> Is your server tuned for MyISAM or InnoDB?
>>>>> What kind of disk setup is in use?
>>>>> How much memory is in your machine?
>>>>>
>>>>>
>>>>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps wrote:
>>>>>
>>>>>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>>>>> varying date ranges and new data is constantly coming in, so I am not sure
>>>>>> how I could archive/cache the necessary data that would be any more
>>>>>> efficient than simply using the database directly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>>>
>>>>>>> Thinking outside the query, is there any archiving that could happen to
>>>>>>> make
>>>>>>> your large tables kinder in the range scan?
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>>>>>>>
>>>>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>>>
>>>>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>>>
>>>>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>>>
>>>>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> WHERE
>>>>>>>>>>>> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>>>>>>>>>>
>>>>>>>>>>>> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>>>>>>>>>
>>>>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>>>>> WHERE
>>>>>>>>>>> open_dt >= '2011-08-30 00:00:00'
>>>>>>>>>>>> AND
>>>>>>>>>>>> close_dt <= '2011-08-30 12:36:53'
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Jochem
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Jochem,
>>>>>>>>>>>
>>>>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>>>>> example, a very large file download might span multiple days so given
>>>>>>>>>>> your logic if the connection was started 2 days ago and I want to pull
>>>>>>>>>>> 1
>>>>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>>>>> to
>>>>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>>>>> and
>>>>>>>>>>> end dates, I need to see that record.
>>>>>>>>>>>
>>>>>>>>>>> Any other ideas?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>>>>>> reversed.
>>>>>>>>>>
>>>>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>>>>> query
>>>>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>>>>> markers
>>>>>>>>>> showing their durations.
>>>>>>>>>>
>>>>>>>>>> a) (s) (e)
>>>>>>>>>> b) |---|
>>>>>>>>>> c) |---|
>>>>>>>>>> d) |---|
>>>>>>>>>> e) |--------------------|
>>>>>>>>>> f) |---|
>>>>>>>>>> g) |---|
>>>>>>>>>>
>>>>>>>>>> To describe these situations:
>>>>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>>>>> and
>>>>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>>>>> part
>>>>>>>>>> of your results.
>>>>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>>>>> include this
>>>>>>>>>> d) the event starts and ends within the window - include this
>>>>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>>>>> include
>>>>>>>>>> this
>>>>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>>>>> include this.
>>>>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>>>>
>>>>>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>>>>>> for a WHERE clause
>>>>>>>>>>
>>>>>>>>> WHERE start <= (ending time) AND end >= (starting time)
>>>>>>>>>>
>>>>>>>>>> Try that and let us know the results.
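[Editor's note: Shawn's overlap rule can be sanity-checked mechanically. The sketch below uses SQLite via Python's stdlib, with integer stand-ins for the datetimes and row labels matching the cases b through g in the ASCII diagram; it is an illustration, not the poster's actual schema:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (label TEXT, open_dt INTEGER, close_dt INTEGER)")
# Query window is [10, 20]; rows mirror the diagram's cases b..g.
rows = [
    ("b", 1, 5),    # ends before the window  -> exclude
    ("c", 8, 12),   # ends inside             -> include
    ("d", 12, 15),  # entirely inside         -> include
    ("e", 5, 25),   # spans the whole window  -> include
    ("f", 18, 24),  # starts inside           -> include
    ("g", 22, 30),  # starts after the window -> exclude
]
con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# start <= (ending time) AND end >= (starting time)
matched = [r[0] for r in con.execute(
    "SELECT label FROM events "
    "WHERE open_dt <= 20 AND close_dt >= 10 "
    "ORDER BY label")]
print(matched)  # -> ['c', 'd', 'e', 'f']
```

Exactly cases c through f come back, which is the "any activity inside the window" semantics Brandon asked for.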
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>>>>> you)
>>>>>>>>>
>>>>>>>>> Old method:
>>>>>>>>> SELECT
>>>>>>>>> sc.open_dt,
>>>>>>>>> sc.close_dt,
>>>>>>>>> sc.protocol,
>>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>>> sc.src_port,
>>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>>> sc.dst_port,
>>>>>>>>> sc.sent,
>>>>>>>>> sc.rcvd,
>>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>>> FROM firewall_connections AS sc
>>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>>> WHERE
>>>>>>>> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
>>>>>>>>> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
>>>>>>>>> ORDER BY rcvd DESC
>>>>>>>>> LIMIT 0, 10;
>>>>>>>>>
>>>>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>>>>> ndx_open_close_dt):
>>>>>>>>> SELECT
>>>>>>>>> sc.open_dt,
>>>>>>>>> sc.close_dt,
>>>>>>>>> sc.protocol,
>>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>>> sc.src_port,
>>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>>> sc.dst_port,
>>>>>>>>> sc.sent,
>>>>>>>>> sc.rcvd,
>>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>>> FROM firewall_connections AS sc
>>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>>> WHERE
>>>>>>>> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
>>>>>>>>> ORDER BY rcvd DESC
>>>>>>>>> LIMIT 0, 10;
>>>>>>>>>
>>>>>>>>> EXPLAIN output for old method:
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>
>>>>>>>>> EXPLAIN output for new method with new index:
>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>
>>>>>>>>> SHOW INDEX:
>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 | NULL     | NULL   |      | BTREE      |         |
>>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 | NULL     | NULL   |      | BTREE      |         |
>>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Right now the queries do seem to be executing much faster, although
>>>>>>>>> I'm not quite sure why. And I'm not sure why the new
>>>>>>>>> ndx_open_close_dt index isn't being used either.
>>>>>>>>>
>>>>>>>>> -Brandon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am still having a big issue with my query as seen above. The table
>>>>>>>> is up
>>>>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>>>>> queries above take a very long time to run. Is there anything at all I
>>>>>>>> can
>>>>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>>>>> both versions. I also tried adding an index on (open_dt, close_dt, rcvd)
>>>>>>>> but
>>>>>>>> that index does not get used.
>>>>>>>>
>>>>>>>> Any other ideas?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>>
>>>>>>>> Brandon
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Query Optimization

am 08.09.2011 23:21:32 von Mihail Manolov

From the manual: "The default behavior for UNION is that duplicate rows are removed from the result."
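[Editor's note: the manual's statement is easy to confirm: `UNION` deduplicates the combined rows, while `UNION ALL` keeps duplicates and skips the dedup pass, which is cheaper when the two branches cannot overlap. A minimal sketch using SQLite from Python's stdlib, with an invented row whose open_dt and close_dt both fall inside the window, so both branches return it:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE conn (id INTEGER, open_dt TEXT, close_dt TEXT)")
# This row opens AND closes inside the window, so both SELECTs match it.
con.execute("INSERT INTO conn VALUES (1, '2011-09-07 14:00:00', '2011-09-07 15:00:00')")

q = ("SELECT id FROM conn WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' "
     "{op} "
     "SELECT id FROM conn WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'")

union_rows = con.execute(q.format(op="UNION")).fetchall()          # deduplicated
union_all_rows = con.execute(q.format(op="UNION ALL")).fetchall()  # duplicates kept
print(len(union_rows), len(union_all_rows))  # -> 1 2
```

So Brandon's query as written cannot return the same record twice; `UNION ALL` would.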

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:

> Mihail,
>
> Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?
>
> (SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM sonicwall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
> UNION
> (SELECT
> sc.open_dt,
> sc.close_dt,
> sc.protocol,
> INET_NTOA(sc.src_address) AS src_address,
> sc.src_port,
> INET_NTOA(sc.dst_address) AS dst_address,
> sc.dst_port,
> sc.sent,
> sc.rcvd,
> spm.desc AS src_port_desc,
> dpm.desc AS dst_port_desc
> FROM sonicwall_connections AS sc
> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
> WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
> ORDER BY rcvd DESC LIMIT 10;
>
> +------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
> | id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
> +------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
> |    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
> |    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
> |    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
> |    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
> |    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
> |    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
> | NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
> +------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
>
>
> On 09/08/2011 03:45 PM, Mihail Manolov wrote:
>> How about:
>>
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM sonicwall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>>
>> UNION
>>
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM sonicwall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>=20
>>>>>>> --
>>>>>>> MySQL General Mailing List
>>>>>>> For list archives: http://lists.mysql.com/mysql
>>>>>>> To unsubscribe: http://lists.mysql.com/mysql?**
>>>>>>> unsub=3Deroomydna@gmail.com oomydna@gmail.com>
>>>>>>>=20
>>>>>>>=20
>>>>>>=20
>>>>>=20
>>>>> --
>>>>> MySQL General Mailing List
>>>>> For list archives: http://lists.mysql.com/mysql
>>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dderek@orange-=
pants.com
>>>>>=20
>>>>=20
>>>>=20
>>>=20
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmmanolov@liquid=
ation.com
>>>=20
>>=20
>>=20
>=20
> --=20
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmmanolov@liquidat=
ion.com
>=20


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Query Optimization

am 08.09.2011 23:37:45 von Brandon Phelps

Ah I see. Well thanks for your assistance!

-Brandon
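[Editor's aside, not part of the original thread: the answer Brandon received is that UNION de-duplicates while UNION ALL does not. A minimal sketch of that behavior, using SQLite via Python as a stand-in for MySQL (table and column names follow the thread, but this is illustrative only):]

```python
import sqlite3

# A row whose open_dt AND close_dt both fall inside the queried range
# matches both halves of the UNION. UNION removes the duplicate;
# UNION ALL would return it twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conns (open_dt TEXT, close_dt TEXT, rcvd INTEGER)")
conn.execute(
    "INSERT INTO conns VALUES ('2011-09-07 14:00:00', '2011-09-07 15:00:00', 42)"
)

union = conn.execute("""
    SELECT * FROM conns
    WHERE open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
    UNION
    SELECT * FROM conns
    WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
""").fetchall()

union_all = conn.execute("""
    SELECT * FROM conns
    WHERE open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
    UNION ALL
    SELECT * FROM conns
    WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
""").fetchall()

print(len(union), len(union_all))  # 1 2
```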

On 09/08/2011 05:21 PM, Mihail Manolov wrote:
> From the manual: "The default behavior for UNION is that duplicate rows are removed from the result."
>
> On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:
>
>> Mihail,
>>
>> Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?
>>
>> (SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM sonicwall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
>> UNION
>> (SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM sonicwall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
>> ORDER BY rcvd DESC LIMIT 10;
>>
>> +------+--------------+-------+--------+---------------------------------------------+--------------+---------+--------------------+------+----------------+
>> | id   | select_type  | table | type   | possible_keys                               | key          | key_len | ref                | rows | Extra          |
>> +------+--------------+-------+--------+---------------------------------------------+--------------+---------+--------------------+------+----------------+
>> | 1    | PRIMARY      | sc    | range  | open_dt,ndx_open_close_rcvd,ndx_open_close  | open_dt      | 8       | NULL               | 1057 | Using where    |
>> | 1    | PRIMARY      | spm   | eq_ref | PRIMARY                                     | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
>> | 1    | PRIMARY      | dpm   | eq_ref | PRIMARY                                     | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
>> | 2    | UNION        | sc    | range  | ndx_close_dt                                | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
>> | 2    | UNION        | spm   | eq_ref | PRIMARY                                     | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
>> | 2    | UNION        | dpm   | eq_ref | PRIMARY                                     | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
>> | NULL | UNION RESULT | <union1,2> | ALL | NULL                                   | NULL         | NULL    | NULL               | NULL | Using filesort |
>> +------+--------------+-------+--------+---------------------------------------------+--------------+---------+--------------------+------+----------------+
>>
>>
>>
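[Editor's aside: the src_address/dst_address columns store IPv4 addresses as unsigned 32-bit integers, which the queries render with MySQL's INET_NTOA. For readers wanting the same conversion outside SQL, a minimal Python equivalent (function names here are my own, not from the thread):]

```python
import socket
import struct

def inet_ntoa_int(n: int) -> str:
    """Like MySQL INET_NTOA: unsigned 32-bit int -> dotted-quad string."""
    return socket.inet_ntoa(struct.pack("!I", n))

def inet_aton_int(addr: str) -> int:
    """Like MySQL INET_ATON: dotted-quad string -> unsigned 32-bit int."""
    return struct.unpack("!I", socket.inet_aton(addr))[0]

print(inet_ntoa_int(3232235777))       # 192.168.1.1
print(inet_aton_int("192.168.1.1"))    # 3232235777
```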
>> On 09/08/2011 03:45 PM, Mihail Manolov wrote:
>>> How about:
>>>
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM sonicwall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>>>
>>> UNION
>>>
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM sonicwall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
>>>
>>>
>>>
>>> On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:
>>>
>>>> Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:
>>>>
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM sonicwall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');
>>>>
>>>>
>>>> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
>>>> | id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
>>>> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
>>>> |  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
>>>> +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
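[Editor's aside: one reason the chained form above behaves unexpectedly is that SQL parses `a <= x <= b` left to right as `(a <= x) <= b`, so the 0/1 result of the first comparison is compared to `b` instead of expressing a range. A quick illustration, using SQLite via Python (MySQL parses the chain the same way):]

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 100 is NOT between 10 and 20, yet the chained comparison is "true":
# (10 <= 100) evaluates to 1, and (1 <= 20) evaluates to 1.
chained = conn.execute("SELECT 10 <= 100 <= 20").fetchone()[0]

# The unambiguous range form gives the intended answer.
explicit = conn.execute("SELECT 100 BETWEEN 10 AND 20").fetchone()[0]

print(chained, explicit)  # 1 0
```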
>>>>
>>>> I did create indexes on open_dt and close_dt (2 separate indexes).
>>>>
>>>>
>>>>
>>>> On 09/08/2011 02:55 PM, Derek Downey wrote:
>>>>> Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times.
>>>>>
>>>>> If that's correct, you might be able to get an index_merge by doing a query like:
>>>>>
>>>>> WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))
>>>>>
>>>>> and creating two indexes (one on 'open_dt' and the other on 'close_dt')
>>>>>
>>>>> http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimizat ion.html
>>>>>
>>>>> Regards,
>>>>> Derek
>>>>>
>>>>> On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
>>>>>
>>>>>> Andy,
>>>>>>
>>>>>> The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:
>>>>>>
>>>>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>>>> | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
>>>>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>>>> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
>>>>>> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>>>>>
>>>>>> When I remove the LIMIT 10 I get:
>>>>>>
>>>>>> ----+-------------+-------+--------+---------------------+-- -------+---------+--------------------+----------+---------- -------------------+
>>>>>> | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
>>>>>> +----+-------------+-------+--------+---------------------+- --------+---------+--------------------+----------+--------- --------------------+
>>>>>> | 1 | SIMPLE | sc | ALL | ndx_open_close_rcvd | NULL | NULL | NULL | 32393330 | Using where; Using filesort |
>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | 1 | |
>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | 1 | |
>>>>>> +----+-------------+-------+--------+---------------------+- --------+---------+--------------------+----------+--------- --------------------+
>>>>>>
>>>>>> Thanks for all your help thus far.
>>>>>>
>>>>>> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>>>>>>> I don't think I saw any query timings in the emails (maybe I missed them).
>>>>>>>
>>>>>>> What version of MySQL are you currently using?
>>>>>>> What does the explain look like when your remove the limit 10?
>>>>>>> Is your server tuned for MyISAM or InnoDB?
>>>>>>> What kind of disk setup is in use?
>>>>>>> How much memory is in your machine?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps wrote:
>>>>>>>
>>>>>>>> Thanks for the reply Andy. Unfortunately the users will be selecting
>>>>>>>> varying date ranges and new data is constantly coming in, so I am not sure
>>>>>>>> how I could archive/cache the necessary data that would be any more
>>>>>>>> efficient than simply using the database directly.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>>>>>
>>>>>>>>> Thinking outside the query, is there any archiving that could happen to
>>>>>>>>> make
>>>>>>>>> your large tables kinder in the range scan?
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps wrote:
>>>>>>>>>
>>>>>>>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>>>>>
>>>>>>>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> WHERE
>>>>>>>>>>>>>>> (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
>>>>>>>>>>>>>>> AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In that case your logic here simplifies to:
>>>>>>>>>>>>>> WHERE
>>>>>>>>>>>>>> open_dt >= '2011-08-30 00:00:00'
>>>>>>>>>>>>>> AND
>>>>>>>>>>>>>> close_dt <= '2011-08-30 12:36:53'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jochem
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jochem,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>>>>>>> example, a very large file download might span multiple days so given
>>>>>>>>>>>>> your logic if the connection was started 2 days ago and I want to pull
>>>>>>>>>>>>> 1
>>>>>>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>>>>>>> to
>>>>>>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>>>>>>> and
>>>>>>>>>>>>> end dates, I need to see that record.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any other ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe Jochem was on the right track but he got his dates
>>>>>>>>>>>> reversed.
>>>>>>>>>>>>
>>>>>>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>>>>>>> query
>>>>>>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>>>>>>> markers
>>>>>>>>>>>> showing their durations.
>>>>>>>>>>>>
>>>>>>>>>>>> a)      (s)              (e)
>>>>>>>>>>>> b) |---|
>>>>>>>>>>>> c)     |---|
>>>>>>>>>>>> d)         |---|
>>>>>>>>>>>> e)    |--------------------|
>>>>>>>>>>>> f)                     |---|
>>>>>>>>>>>> g)                         |---|
>>>>>>>>>>>>
>>>>>>>>>>>> To describe these situations:
>>>>>>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>>>>>>> and
>>>>>>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>>>>>>> part
>>>>>>>>>>>> of your results.
>>>>>>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>>>>>>> include this
>>>>>>>>>>>> d) the event starts and ends within the window - include this
>>>>>>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>>>>>>> include
>>>>>>>>>>>> this
>>>>>>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>>>>>>> include this.
>>>>>>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>>>>>>>> for a WHERE clause
>>>>>>>>>>>>
>>>>>>>>>>>> WHERE start <= (ending time) AND end >= (starting time)
>>>>>>>>>>>>
>>>>>>>>>>>> Try that and let us know the results.
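[Editor's aside: Shawn's overlap rule can be checked mechanically against his cases b through g. A small sketch; the helper name and the numeric timeline are my own, chosen only to make the six cases concrete:]

```python
def overlaps(open_dt, close_dt, window_start, window_end):
    """Shawn's rule: a record overlaps the query window iff it starts no
    later than the window ends AND ends no earlier than the window starts."""
    return open_dt <= window_end and close_dt >= window_start

S, E = 10, 20  # the query window (s) .. (e), as integers for simplicity

cases = {
    "b": (1, 5),    # starts and stops before the window -> exclude
    "c": (8, 12),   # starts before, ends inside         -> include
    "d": (12, 18),  # starts and ends inside             -> include
    "e": (5, 25),   # spans the entire window            -> include
    "f": (18, 25),  # starts inside, ends after          -> include
    "g": (22, 30),  # starts and ends after the window   -> exclude
}

results = {name: overlaps(o, c, S, E) for name, (o, c) in cases.items()}
print(results)
```

Exactly c through f come back true, matching the ASCII diagram.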
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>>>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>>>>>>> you)
>>>>>>>>>>>
>>>>>>>>>>> Old method:
>>>>>>>>>>> SELECT
>>>>>>>>>>> sc.open_dt,
>>>>>>>>>>> sc.close_dt,
>>>>>>>>>>> sc.protocol,
>>>>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>>>>> sc.src_port,
>>>>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>>>>> sc.dst_port,
>>>>>>>>>>> sc.sent,
>>>>>>>>>>> sc.rcvd,
>>>>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>>>>> FROM firewall_connections AS sc
>>>>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>>>>> WHERE
>>>>>>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>>>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>>>>>>>> 09:53:31')
>>>>>>>>>>> ORDER BY rcvd DESC
>>>>>>>>>>> LIMIT 0, 10;
>>>>>>>>>>>
>>>>>>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>>>>>>> ndx_open_close_dt):
>>>>>>>>>>> SELECT
>>>>>>>>>>> sc.open_dt,
>>>>>>>>>>> sc.close_dt,
>>>>>>>>>>> sc.protocol,
>>>>>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>>>>>> sc.src_port,
>>>>>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>>>>>> sc.dst_port,
>>>>>>>>>>> sc.sent,
>>>>>>>>>>> sc.rcvd,
>>>>>>>>>>> spm.desc AS src_port_desc,
>>>>>>>>>>> dpm.desc AS dst_port_desc
>>>>>>>>>>> FROM firewall_connections AS sc
>>>>>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>>>>>> WHERE
>>>>>>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>>>>>>>> ORDER BY rcvd DESC
>>>>>>>>>>> LIMIT 0, 10;
>>>>>>>>>>>
>>>>>>>>>>> EXPLAIN output for old method:
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>>
>>>>>>>>>>> EXPLAIN output for new method with new index:
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>> | id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>> |  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
>>>>>>>>>>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
>>>>>>>>>>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
>>>>>>>>>>> +----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
>>>>>>>>>>>
>>>>>>>>>>> SHOW INDEX:
>>>>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>>>> | Table                | Non_unique | Key_name          | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            1 | open_dt     | A         |     1342691 |     NULL | NULL   |      | BTREE      |         |
>>>>>>>>>>> | firewall_connections |          1 | ndx_open_close_dt |            2 | close_dt    | A         |     6377783 |     NULL | NULL   |      | BTREE      |         |
>>>>>>>>>>> +----------------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Right now the queries do seem to be executing much faster,
>>>>>>>>>>> though I'm not quite sure why. And I'm not sure why the new
>>>>>>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>>>>>>
>>>>>>>>>>> -Brandon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I am still having a big issue with my query as seen above. The table
>>>>>>>>>> is up
>>>>>>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>>>>>>> queries above take a very long time to run. Is there anything at all I
>>>>>>>>>> can
>>>>>>>>>> do to speed things up? It seems that changing the format of the WHERE
>>>>>>>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>>>>>>>> both versions. I also tried adding an index on (open_dt, close_dt, rcvd)
>>>>>>>>>> but
>>>>>>>>>> that index does not get used.
>>>>>>>>>>
>>>>>>>>>> Any other ideas?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Brandon
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> MySQL General Mailing List
>>>>>>>>>> For list archives: http://lists.mysql.com/mysql
>>>>>>>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=eroomydna@gmail.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> MySQL General Mailing List
>>>>>>>> For list archives: http://lists.mysql.com/mysql
>>>>>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=eroomydna@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> MySQL General Mailing List
>>>>>> For list archives: http://lists.mysql.com/mysql
>>>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=derek@orange-pants.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe: http://lists.mysql.com/mysql?unsub=mmanolov@liquidation.com
>>>>
>>>
>>>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: http://lists.mysql.com/mysql?unsub=mmanolov@liquidation.com
>>
>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org