how to efficiently query for the next in MySQL Community Edition 5.1.34?

am 20.06.2009 08:31:35 von Mike Spreitzer

--=_alternative 0023D5EA852575DB_=
Content-Type: text/plain; charset="US-ASCII"

Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers. No row appears twice. A given
string appears many times, on average about 100 times. Suppose I have
millions of rows. I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string). I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I < t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
and t12.I < t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T. There has to be
a better way!

Thanks,
Mike Spreitzer

--=_alternative 0023D5EA852575DB_=--

Re: how to efficiently query for the next in MySQL Community Edition5.1.34?

am 20.06.2009 14:56:23 von Peter Brawley

--------------060606090304000508060408
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Mike

>J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

>(U having no row for the last integer of each string).

I do not understand that at all.

PB

Mike Spreitzer wrote:
> Suppose I have a table T with two column, S holding strings (say,
> VARCHAR(200)) and I holding integers. No row appears twice. A given
> string appears many times, on average about 100 times. Suppose I have
> millions of rows. I want to make a table U holding those same columns
> plus one more, J holding the next integer that T has for S (U having no
> row for the last integer of each string). I could index T on (S,I) and
> write this query as
>
> select t1.*, t2.I as J from T as t1, T as t2
> where t1.S=t2.S and t1.I < t2.I
> and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
> and t12.I < t2.I)
>
> but the query planner says this is quite expensive to run: it will
> enumerate all of T as t1, do a nested enumeration of all t2's entries for
> S=t1.S, and inside that do a further nested enumeration of t12's entries
> for S=t1.S --- costing about 10,000 times the size of T. There has to be
> a better way!
>
> Thanks,
> Mike Spreitzer
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09 06:53:00
>
>

--------------060606090304000508060408--

Re: how to efficiently query for the next in MySQL Community Edition

am 20.06.2009 15:08:31 von Johnny Withers

Huh???

On Saturday, June 20, 2009, Peter Brawley wro=
te:
> Mike
>
>>J holding the next integer that T has for S
>
> You mean for each i, the next value of i with that s?
>
>>(U having no row for the last integer of each string).
>
> I do not understand that at all.
>
> PB
>
>
> Mike Spreitzer wrote:
>
> Suppose I have a table T with two column, S holding strings (say, VARCHAR=
(200)) and I holding integers. =A0No row appears twice. =A0A given string a=
ppears many times, on average about 100 times. =A0Suppose I have millions o=
f rows. =A0I want to make a table U holding those same columns plus one mor=
e, J holding the next integer that T has for S (U having no row for the las=
t integer of each string). =A0I could index T on (S,I) and write this query=
as
>
> select t1.*, t2.I as J from T as t1, T as t2
> where t1.S=3Dt2.S and t1.I < t2.I
> and not exists (select * from T as t12 where t12.S=3Dt1.S and t1.I < t12.=
I and t12.I < t2.I)
>
> but the query planner says this is quite expensive to run: it will enumer=
ate all of T as t1, do a nested enumeration of all t2's entries for S=3Dt1.=
S, and inside that do a further nested enumeration of t12's entries for S=
=3Dt1.S --- costing about 10,000 times the size of T. =A0There has to be a =
better way!
>
> Thanks,
> Mike Spreitzer
>
> =A0--------------------------------------------------------- ------------=
---
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com=A0 Version: 8.5.364 / Vi=
rus Database: 270.12.80/2187 - Release Date: 06/19/09 06:53:00
>
>
>
>

--=20
-----------------------------
Johnny Withers
601.209.4985
johnny@pixelated.net

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

am 20.06.2009 17:57:29 von Mike Spreitzer

--=_alternative 0057A57A852575DB_=
Content-Type: text/plain; charset="US-ASCII"

Yes, for each (S, I) pair the goal is to efficiently find the next largest
integer associated with S in T. For the highest integer I associated with
S in T, there is no next larger.

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 08:56 AM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike

>J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

>(U having no row for the last integer of each string).

I do not understand that at all.

PB

Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers. No row appears twice. A given
string appears many times, on average about 100 times. Suppose I have
millions of rows. I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string). I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I < t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
and t12.I < t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T. There has to be
a better way!

Thanks,
Mike Spreitzer

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09
06:53:00

--=_alternative 0057A57A852575DB_=--

Re: how to efficiently query for the next in MySQL Community Edition5.1.34?

am 20.06.2009 18:39:27 von Peter Brawley

--------------020502080507090702000005
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Mike,

>Yes, for each (S, I) pair the goal is to efficiently find the next largest
>integer associated with S in T. For the highest integer I associated with
>S in T, there is no next larger.

Here's a more efficient query for the next i values with matching s values:

SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i
) AS n JOIN t USING (i);

PB

-----

Mike Spreitzer wrote:
> Yes, for each (S, I) pair the goal is to efficiently find the next largest
> integer associated with S in T. For the highest integer I associated with
> S in T, there is no next larger.
>
> Thanks,
> Mike Spreitzer
>
>
>
>
> Peter Brawley
> 06/20/09 08:56 AM
> Please respond to
> peter.brawley@earthlink.net
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc
> mysql@lists.mysql.com
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
> Mike
>
>
>> J holding the next integer that T has for S
>>
>
> You mean for each i, the next value of i with that s?
>
>
>> (U having no row for the last integer of each string).
>>
>
> I do not understand that at all.
>
> PB
>
>
> Mike Spreitzer wrote:
> Suppose I have a table T with two column, S holding strings (say,
> VARCHAR(200)) and I holding integers. No row appears twice. A given
> string appears many times, on average about 100 times. Suppose I have
> millions of rows. I want to make a table U holding those same columns
> plus one more, J holding the next integer that T has for S (U having no
> row for the last integer of each string). I could index T on (S,I) and
> write this query as
>
> select t1.*, t2.I as J from T as t1, T as t2
> where t1.S=t2.S and t1.I < t2.I
> and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
> and t12.I < t2.I)
>
> but the query planner says this is quite expensive to run: it will
> enumerate all of T as t1, do a nested enumeration of all t2's entries for
> S=t1.S, and inside that do a further nested enumeration of t12's entries
> for S=t1.S --- costing about 10,000 times the size of T. There has to be
> a better way!
>
> Thanks,
> Mike Spreitzer
>
>
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09
> 06:53:00
>
>
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 06:15:00
>
>

--------------020502080507090702000005--

Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

am 20.06.2009 21:02:52 von Mike Spreitzer

--=_alternative 00689DBD852575DB_=
Content-Type: text/plain; charset="US-ASCII"

Ah, yes, the MIN should be very helpful. Can I expect that ordering the
storage by (S, I) or having an (S, I) index will make that MIN take O(1)
time, for both MyISAM and InnoDB?

I do not follow why you suggested a join to get the associated S, that can
be done in the original query (and I did NOT say a given integer I is
associated with only one string S):

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 12:39 PM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike,
>Yes, for each (S, I) pair the goal is to efficiently find the next
largest
>integer associated with S in T. For the highest integer I associated
with
>S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s
values:

SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i
) AS n JOIN t USING (i);

PB

-----

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next largest

integer associated with S in T. For the highest integer I associated with

S in T, there is no next larger.

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 08:56 AM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike

J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

I do not understand that at all.

PB

Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers. No row appears twice. A given
string appears many times, on average about 100 times. Suppose I have
millions of rows. I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string). I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I < t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
and t12.I < t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T. There has to be
a better way!

Thanks,
Mike Spreitzer

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09

06:53:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

--=_alternative 00689DBD852575DB_=--

Re: how to efficiently query for the next in MySQL Community Edition5.1.34?

am 20.06.2009 21:59:17 von Peter Brawley

--------------080205040602020103080708
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

>I do not follow why you suggested a join to get the associated S,
>that can be done in the original query (and I did NOT say a given
>integer I is associated with only one string S):

A Group By query returns arbitrary values for a column which (i) does
not Group By, (ii) does not aggregate, and (iii) does not have a 1:1
relationship with the grouping expression.

PB

-----

Mike Spreitzer wrote:
>
> Ah, yes, the MIN should be very helpful. Can I expect that ordering
> the storage by (S, I) or having an (S, I) index will make that MIN
> take O(1) time, for both MyISAM and InnoDB?
>
> I do not follow why you suggested a join to get the associated S, that
> can be done in the original query (and I did NOT say a given integer I
> is associated with only one string S):
>
> SELECT a.s, a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
>
> Thanks,
> Mike Spreitzer
>
>
>
> *Peter Brawley *
>
> 06/20/09 12:39 PM
> Please respond to
> peter.brawley@earthlink.net
>
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc
> mysql@lists.mysql.com
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
>
>
>
> Mike,
> >Yes, for each (S, I) pair the goal is to efficiently find the next
> largest
> >integer associated with S in T. For the highest integer I associated
> with
> >S in T, there is no next larger.
> Here's a more efficient query for the next i values with matching s
> values:
>
> SELECT a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
>
> To fetch the matching s values, join the above to the original table:
>
> SELECT n.i, t.s, n.j
> FROM (
> SELECT a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
> ) AS n JOIN t USING (i);
>
> PB
>
> -----
>
> Mike Spreitzer wrote:
> Yes, for each (S, I) pair the goal is to efficiently find the next
> largest
> integer associated with S in T. For the highest integer I associated
> with
> S in T, there is no next larger.
>
> Thanks,
> Mike Spreitzer
>
>
>
>
> Peter Brawley __
>
> 06/20/09 08:56 AM
> Please respond to
> _peter.brawley@earthlink.net_
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc
> _mysql@lists.mysql.com_
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
> Mike
>
>
> J holding the next integer that T has for S
>
>
> You mean for each i, the next value of i with that s?
>
>
> (U having no row for the last integer of each string).
>
>
> I do not understand that at all.
>
> PB
>
>
> Mike Spreitzer wrote:
> Suppose I have a table T with two column, S holding strings (say,
> VARCHAR(200)) and I holding integers. No row appears twice. A given
> string appears many times, on average about 100 times. Suppose I have
> millions of rows. I want to make a table U holding those same columns
> plus one more, J holding the next integer that T has for S (U having no
> row for the last integer of each string). I could index T on (S,I) and
> write this query as
>
> select t1.*, t2.I as J from T as t1, T as t2
> where t1.S=t2.S and t1.I < t2.I
> and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
> and t12.I < t2.I)
>
> but the query planner says this is quite expensive to run: it will
> enumerate all of T as t1, do a nested enumeration of all t2's entries for
> S=t1.S, and inside that do a further nested enumeration of t12's entries
> for S=t1.S --- costing about 10,000 times the size of T. There has to be
> a better way!
>
> Thanks,
> Mike Spreitzer
>
>
>
>
>
> No virus found in this incoming message.
> Checked by AVG - _www.avg.com_
> Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date:
> 06/19/09
> 06:53:00
>
>
>
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - _www.avg.com_
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date:
> 06/20/09 06:15:00
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 06:15:00
>
>

--------------080205040602020103080708--

Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

am 21.06.2009 00:44:34 von Mike Spreitzer

--=_alternative 007CEA70852575DB_=
Content-Type: text/plain; charset="US-ASCII"

Oops, I did not read your original query closely enough. You actually
meant to group by S, not I, right? I can get S, I, and J with

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.s

Right?

My integers are not unique; a given integer can be paired with several
strings.

Thanks,
Mike Spreitzer
SMTP: mspreitz@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
Office phone: +1-914-784-6424 (IBM T/L 863-)
AOL Instant Messaging: M1k3Sprtzr

Peter Brawley
06/20/09 03:59 PM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

>I do not follow why you suggested a join to get the associated S,
>that can be done in the original query (and I did NOT say a given
>integer I is associated with only one string S):

A Group By query returns arbitrary values for a column which (i) does not
Group By, (ii) does not aggregate, and (iii) does not have a 1:1
relationship with the grouping expression.

PB

-----

Mike Spreitzer wrote:

Ah, yes, the MIN should be very helpful. Can I expect that ordering the
storage by (S, I) or having an (S, I) index will make that MIN take O(1)
time, for both MyISAM and InnoDB?

I do not follow why you suggested a join to get the associated S, that can
be done in the original query (and I did NOT say a given integer I is
associated with only one string S):

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 12:39 PM

Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike,
>Yes, for each (S, I) pair the goal is to efficiently find the next
largest
>integer associated with S in T. For the highest integer I associated
with
>S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s
values:

SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i
) AS n JOIN t USING (i);

PB

-----

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next largest

integer associated with S in T. For the highest integer I associated with

S in T, there is no next larger.

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 08:56 AM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike

J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

I do not understand that at all.

PB

Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers. No row appears twice. A given
string appears many times, on average about 100 times. Suppose I have
millions of rows. I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string). I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I < t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
and t12.I < t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T. There has to be
a better way!

Thanks,
Mike Spreitzer

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09

06:53:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

--=_alternative 007CEA70852575DB_=--

Re: how to efficiently query for the next in MySQL Community Edition5.1.34?

am 21.06.2009 00:56:10 von Peter Brawley

--------------070602020405030505030801
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Mike,

>Oops, I did not read your original query closely enough.
>You actually meant to group by S, not I, right?

No, it's a query for next i values with matching s values, so it groups
by i.

>I can get S, I, and J with

>SELECT a.s, a.i, MIN(b.i) AS j
>FROM t AS a
>JOIN t AS b ON b.i > a.i AND a.s = b.s
>GROUP BY a.s

For this dataset ...

drop table if exists t;
create table t(i int,s char(1));
insert into t
values(1,'a'),(2,'b'),(3,'c'),(4,'a'),(5,'a'),(6,'d'),(7,'e' ),(8,'d');

are these the correct next values of i?

+------+------+
| i | j |
+------+------+
| 1 | 4 |
| 4 | 5 |
| 6 | 8 |
+------+------+

Your query doesn't return that.

PB

-----

Mike Spreitzer wrote:
>
> Oops, I did not read your original query closely enough. You actually
> meant to group by S, not I, right? I can get S, I, and J with
>
>
> SELECT a.s, a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.s
>
> Right?
>
> My integers are not unique; a given integer can be paired with several
> strings.
>
> Thanks,
> Mike Spreitzer
> SMTP: mspreitz@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
> Office phone: +1-914-784-6424 (IBM T/L 863-)
> AOL Instant Messaging: M1k3Sprtzr
>
>
> *Peter Brawley *
>
> 06/20/09 03:59 PM
> Please respond to
> peter.brawley@earthlink.net
>
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc
> mysql@lists.mysql.com
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
>
>
>
> >I do not follow why you suggested a join to get the associated S,
> >that can be done in the original query (and I did NOT say a given
> >integer I is associated with only one string S):
>
> A Group By query returns arbitrary values for a column which (i) does
> not Group By, (ii) does not aggregate, and (iii) does not have a 1:1
> relationship with the grouping expression.
>
> PB
>
> -----
>
> Mike Spreitzer wrote:
>
> Ah, yes, the MIN should be very helpful. Can I expect that ordering
> the storage by (S, I) or having an (S, I) index will make that MIN
> take O(1) time, for both MyISAM and InnoDB?
>
> I do not follow why you suggested a join to get the associated S, that
> can be done in the original query (and I did NOT say a given integer I
> is associated with only one string S):
>
> SELECT a.s, a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
>
> Thanks,
> Mike Spreitzer
>
>
> *Peter Brawley **__*
>
>
> 06/20/09 12:39 PM
> Please respond to_
> __peter.brawley@earthlink.net_
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc
> _mysql@lists.mysql.com_
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
>
>
>
>
>
> Mike,
> >Yes, for each (S, I) pair the goal is to efficiently find the next
> largest
> >integer associated with S in T. For the highest integer I associated
> with
> >S in T, there is no next larger.
> Here's a more efficient query for the next i values with matching s
> values:
>
> SELECT a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
>
> To fetch the matching s values, join the above to the original table:
>
> SELECT n.i, t.s, n.j
> FROM (
> SELECT a.i, MIN(b.i) AS j
> FROM t AS a
> JOIN t AS b ON b.i > a.i AND a.s = b.s
> GROUP BY a.i
> ) AS n JOIN t USING (i);
>
> PB
>
> -----
>
> Mike Spreitzer wrote:
> Yes, for each (S, I) pair the goal is to efficiently find the next
> largest
> integer associated with S in T. For the highest integer I associated
> with
> S in T, there is no next larger.
>
> Thanks,
> Mike Spreitzer
>
>
>
>
> Peter Brawley __
>
> 06/20/09 08:56 AM
> Please respond to_
> __peter.brawley@earthlink.net_
>
>
> To
> Mike Spreitzer/Watson/IBM@IBMUS
> cc_
> __mysql@lists.mysql.com_
> Subject
> Re: how to efficiently query for the next in MySQL Community Edition
> 5.1.34?
>
>
>
>
>
>
> Mike
>
>
> J holding the next integer that T has for S
>
>
> You mean for each i, the next value of i with that s?
>
>
> (U having no row for the last integer of each string).
>
>
> I do not understand that at all.
>
> PB
>
>
> Mike Spreitzer wrote:
> Suppose I have a table T with two column, S holding strings (say,
> VARCHAR(200)) and I holding integers. No row appears twice. A given
> string appears many times, on average about 100 times. Suppose I have
> millions of rows. I want to make a table U holding those same columns
> plus one more, J holding the next integer that T has for S (U having no
> row for the last integer of each string). I could index T on (S,I) and
> write this query as
>
> select t1.*, t2.I as J from T as t1, T as t2
> where t1.S=t2.S and t1.I < t2.I
> and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
> and t12.I < t2.I)
>
> but the query planner says this is quite expensive to run: it will
> enumerate all of T as t1, do a nested enumeration of all t2's entries for
> S=t1.S, and inside that do a further nested enumeration of t12's entries
> for S=t1.S --- costing about 10,000 times the size of T. There has to be
> a better way!
>
> Thanks,
> Mike Spreitzer
>
>
>
>
>
> No virus found in this incoming message.
> Checked by AVG - _www.avg.com_
> Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date:
> 06/19/09
> 06:53:00
>
>
>
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - _www.avg.com_
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date:
> 06/20/09 06:15:00
>
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - _www.avg.com_
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date:
> 06/20/09 06:15:00
>
>
> ------------------------------------------------------------ ------------
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09 06:15:00
>
>

--------------070602020405030505030801--

Re: how to efficiently query for the next in MySQL Community Edition 5.1.34?

am 21.06.2009 05:49:08 von Mike Spreitzer

--=_alternative 00147EEB852575DC_=
Content-Type: text/plain; charset="US-ASCII"

Sorry I have not been careful enough. Following is a very concrete,
worked example -- so I think I have finally gotten the bugs out. After
the example I resume with unanswered questions.

Remember I did not say each integer appears only once, and consider this
dataset:

create table t (s char(1), i int);
insert into t values ('a', 1), ('b', 1), ('b', 3), ('c', 3), ('b', 4),
('a', 5), ('c', 5);
mysql> select * from t;
+------+------+
| s | i |
+------+------+
| a | 1 |
| b | 1 |
| b | 3 |
| c | 3 |
| b | 4 |
| a | 5 |
| c | 5 |
+------+------+
7 rows in set (0.00 sec)

Here is the ineffecient way to compute the desired answer:

mysql> select t1.*, t2.i as j from t as t1, t as t2 where t1.s=t2.s and
t1.i < t2.i and not exists (select * from t as t12 where t12.s=t1.s and
t1.i < t12.i and t12.i < t2.i);
+------+------+------+
| s | i | j |
+------+------+------+
| b | 1 | 3 |
| b | 3 | 4 |
| a | 1 | 5 |
| c | 3 | 5 |
+------+------+------+
4 rows in set (0.01 sec)

Here is the better way, using min() (the order of the rows is unimportant
here):

mysql> select t1.*, min(t2.i) as j from t as t1, t as t2 where t1.s=t2.s
and t1.i +------+------+------+
| s | i | j |
+------+------+------+
| a | 1 | 5 |
| b | 1 | 3 |
| b | 3 | 4 |
| c | 3 | 5 |
+------+------+------+
4 rows in set (0.00 sec)

Code that assumes uniqueness of the integers does not work:

mysql> SELECT a.i, MIN(b.i) AS j
-> FROM t AS a
-> JOIN t AS b ON b.i > a.i AND a.s = b.s
-> GROUP BY a.i;
+------+------+
| i | j |
+------+------+
| 1 | 3 |
| 3 | 4 |
+------+------+
2 rows in set (0.00 sec)

What remains unclear to me is how fast the correct min-based query (the
"better way" above) will run. You and I know that we could walk a BTREE
on (s, i) and compute the answer in linear time. As I read the MySQL
documentation, however, this query does not fit the constraints for fast
range-based indexing into t2 because the "t1.i < t2.i" comparison does not
compare t2's value with something that meets the criteria for "a
constant". It looks like the query planner plans to do a nested
enumeration of the integers associated with s, for each (s, i) row in the
outer enumeration:

mysql> alter table t add primary key (s, i);
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0

mysql> explain select t1.*, min(t2.i) as j from t as t1, t as t2 where
t1.s=t2.s and t1.i +----+-------------+-------+-------+---------------+-------- -+---------+-----------------+------+----------------------- ---+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------- -+---------+-----------------+------+----------------------- ---+
| 1 | SIMPLE | t1 | index | PRIMARY | PRIMARY | 5 |
NULL | 7 | Using index |
| 1 | SIMPLE | t2 | ref | PRIMARY | PRIMARY | 1 |
mjs090605a.t1.s | 3 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------- -+---------+-----------------+------+----------------------- ---+

That is an improvement over my original formulation:

mysql> explain select t1.*, t2.i as j from t as t1, t as t2 where
t1.s=t2.s and t1.i < t2.i and not exists (select * from t as t12 where
t12.s=t1.s and t1.i < t12.i and t12.i < t2.i);
+----+--------------------+-------+-------+---------------+- --------+---------+-----------------+------+---------------- ----------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+- --------+---------+-----------------+------+---------------- ----------+
| 1 | PRIMARY | t1 | index | PRIMARY | PRIMARY | 5 |
NULL | 7 | Using index |
| 1 | PRIMARY | t2 | ref | PRIMARY | PRIMARY | 1 |
mjs090605a.t1.s | 3 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | t12 | ref | PRIMARY | PRIMARY | 1 |
mjs090605a.t1.s | 3 | Using where; Using index |
+----+--------------------+-------+-------+---------------+- --------+---------+-----------------+------+---------------- ----------+

We have reduced the time complexity from O( (size of T) * (avg num
integers per string)^2 ) to O( (size of T) * (avg num integers per
string)^1 ). That's great. We have saved about a factor of 100 in my
real application (a given string is paired with something on the order of
100 different integers). But we could save another factor of 100. How do
I save (even a portion of) that?

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 06:56 PM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike,

>Oops, I did not read your original query closely enough.
>You actually meant to group by S, not I, right?

No, it's a query for next i values with matching s values, so it groups by
i.

>I can get S, I, and J with

>SELECT a.s, a.i, MIN(b.i) AS j
>FROM t AS a
>JOIN t AS b ON b.i > a.i AND a.s = b.s
>GROUP BY a.s

For this dataset ...

drop table if exists t;
create table t(i int,s char(1));
insert into t
values(1,'a'),(2,'b'),(3,'c'),(4,'a'),(5,'a'),(6,'d'),(7,'e' ),(8,'d');

are these the correct next values of i?

+------+------+
| i | j |
+------+------+
| 1 | 4 |
| 4 | 5 |
| 6 | 8 |
+------+------+

Your query doesn't return that.

PB

-----

Mike Spreitzer wrote:

Oops, I did not read your original query closely enough. You actually
meant to group by S, not I, right? I can get S, I, and J with

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.s

Right?

My integers are not unique; a given integer can be paired with several
strings.

Thanks,
Mike Spreitzer
SMTP: mspreitz@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
Office phone: +1-914-784-6424 (IBM T/L 863-)
AOL Instant Messaging: M1k3Sprtzr

Peter Brawley
06/20/09 03:59 PM

Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

>I do not follow why you suggested a join to get the associated S,
>that can be done in the original query (and I did NOT say a given
>integer I is associated with only one string S):

A Group By query returns arbitrary values for a column which (i) does not
Group By, (ii) does not aggregate, and (iii) does not have a 1:1
relationship with the grouping expression.

PB

-----

Mike Spreitzer wrote:

Ah, yes, the MIN should be very helpful. Can I expect that ordering the
storage by (S, I) or having an (S, I) index will make that MIN take O(1)
time, for both MyISAM and InnoDB?

I do not follow why you suggested a join to get the associated S, that can
be done in the original query (and I did NOT say a given integer I is
associated with only one string S):

SELECT a.s, a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 12:39 PM

Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike,
>Yes, for each (S, I) pair the goal is to efficiently find the next
largest
>integer associated with S in T. For the highest integer I associated
with
>S in T, there is no next larger.
Here's a more efficient query for the next i values with matching s
values:

SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i

To fetch the matching s values, join the above to the original table:

SELECT n.i, t.s, n.j
FROM (
SELECT a.i, MIN(b.i) AS j
FROM t AS a
JOIN t AS b ON b.i > a.i AND a.s = b.s
GROUP BY a.i
) AS n JOIN t USING (i);

PB

-----

Mike Spreitzer wrote:
Yes, for each (S, I) pair the goal is to efficiently find the next largest

integer associated with S in T. For the highest integer I associated with

S in T, there is no next larger.

Thanks,
Mike Spreitzer

Peter Brawley
06/20/09 08:56 AM
Please respond to
peter.brawley@earthlink.net

To
Mike Spreitzer/Watson/IBM@IBMUS
cc
mysql@lists.mysql.com
Subject
Re: how to efficiently query for the next in MySQL Community Edition
5.1.34?

Mike

J holding the next integer that T has for S

You mean for each i, the next value of i with that s?

(U having no row for the last integer of each string).

I do not understand that at all.

PB

Mike Spreitzer wrote:
Suppose I have a table T with two column, S holding strings (say,
VARCHAR(200)) and I holding integers. No row appears twice. A given
string appears many times, on average about 100 times. Suppose I have
millions of rows. I want to make a table U holding those same columns
plus one more, J holding the next integer that T has for S (U having no
row for the last integer of each string). I could index T on (S,I) and
write this query as

select t1.*, t2.I as J from T as t1, T as t2
where t1.S=t2.S and t1.I < t2.I
and not exists (select * from T as t12 where t12.S=t1.S and t1.I < t12.I
and t12.I < t2.I)

but the query planner says this is quite expensive to run: it will
enumerate all of T as t1, do a nested enumeration of all t2's entries for
S=t1.S, and inside that do a further nested enumeration of t12's entries
for S=t1.S --- costing about 10,000 times the size of T. There has to be
a better way!

Thanks,
Mike Spreitzer

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.80/2187 - Release Date: 06/19/09

06:53:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.364 / Virus Database: 270.12.81/2189 - Release Date: 06/20/09
06:15:00

--=_alternative 00147EEB852575DC_=--

Anyone using LVM for backing up?

am 22.06.2009 22:41:26 von tlittle

We have a 20 gig db (that includes the MYIs and MYDs and FRMs).

We are wondering how long LVM snapshots take.. in that how long might
the DB be read-locked? Do we have to read-lock it and flush tables?

Are we talking half a second, ten-seconds, 20 minutes?

Currently, when we copy the raw files from one directory to another, it
takes about 20 mins and brings the DB to it's proverbial knees. When we
copy the files with the db server down, it takes 10 minutes or so.

Tim...=20

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Anyone using LVM for backing up?

am 22.06.2009 22:50:13 von mcgonagle

Hi Tim,
We use LVM snapshots all the time. They are essentially
instantaneous with our >90GB innodb database files.

A command to generate the snapshot could be:
sudo /usr/sbin/lvcreate --snapshot --name mysqlsqlbackup --size 15G /
dev/system/data01

Please let me know if you have any questions.
-Tom

On Jun 22, 2009, at 4:41 PM, Little, Timothy wrote:

> We have a 20 gig db (that includes the MYIs and MYDs and FRMs).
>
> We are wondering how long LVM snapshots take.. in that how long might
> the DB be read-locked? Do we have to read-lock it and flush tables?
>
> Are we talking half a second, ten-seconds, 20 minutes?
>
> Currently, when we copy the raw files from one directory to another,
> it
> takes about 20 mins and brings the DB to it's proverbial knees.
> When we
> copy the files with the db server down, it takes 10 minutes or so.
>
> Tim...
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=tmcgonagle@online-buddies .com
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Anyone using LVM for backing up?

am 23.06.2009 00:20:15 von Jim Lyons

--0016e64652904656da046cf74708
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

What we do to start is the following:

) open 2 windows to the server running the mysql instance
) in one window,
) run the mysql cli and issue command 'flush tables with read lock'
) stop the slave, if this is a running slave
) run either "show master status" or "show slave status", whichever is
appropriate, to get log position
) in the other window
) run the "sync" command several times
) create the snapshots
) back in the first window
) unlock that tables
) start the slave, if appropriate
) back in the other window
) mount the snapshots

We do it this way to minimize the time the server we're syncing from is in
read lock.

If anyone sees any flaws in this, please let me know. There's a lot more,
of course, involving rsync and "change master". I just dealt with the
beginning part.

On Mon, Jun 22, 2009 at 3:41 PM, Little, Timothy <
tlittle@thomaspublishing.com> wrote:

> We have a 20 gig db (that includes the MYIs and MYDs and FRMs).
>
> We are wondering how long LVM snapshots take.. in that how long might
> the DB be read-locked? Do we have to read-lock it and flush tables?
>
> Are we talking half a second, ten-seconds, 20 minutes?
>
> Currently, when we copy the raw files from one directory to another, it
> takes about 20 mins and brings the DB to it's proverbial knees. When we
> copy the files with the db server down, it takes 10 minutes or so.
>
> Tim...
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=jlyons4435@gmail.com
>
>

--
Jim Lyons
Web developer / Database administrator
http://www.weblyons.com

--0016e64652904656da046cf74708--

Re: Anyone using LVM for backing up?

am 23.06.2009 01:42:37 von David Sparks

Little, Timothy wrote:
> We have a 20 gig db (that includes the MYIs and MYDs and FRMs).
>
> We are wondering how long LVM snapshots take.. in that how long might
> the DB be read-locked? Do we have to read-lock it and flush tables?

Take a look at mylvmbackup which takes care of flushing tables, creating and
destroying the snapshot, etc:

http://www.lenzg.net/mylvmbackup/

Expect a serious performance hit while the lvm snapshot is active.

ds

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Anyone using LVM for backing up?

am 04.07.2009 23:08:27 von Baron Schwartz

Hi Tim,

On Mon, Jun 22, 2009 at 4:41 PM, Little,
Timothy wrote:
> We have a 20 gig db (that includes the MYIs and MYDs and FRMs).
>
> We are wondering how long LVM snapshots take.. in that how long might
> the DB be read-locked? =A0Do we have to read-lock it and flush tables?
>
> Are we talking half a second, ten-seconds, 20 minutes?

It depends. Long-running queries will block FLUSH TABLES WITH READ
LOCK, if you're using it, which will in turn block other queries. So
FLUSH TABLES WITH READ LOCK itself can take a long time. If you're
using only InnoDB tables you don't need to do that, you can just take
the snapshot and go. Upon recovery on the other server you can find
the binlog position in InnoDB's messages to the error log.

There are a lot of subtleties to all of this, so maybe you can give a
few more details about your setup.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg