UNION, LIMIT and ORDER BY

UNION, LIMIT and ORDER BY

am 25.05.2007 16:23:08 von Israel

I've trying to write a query that seems like it should be simple but
for some reason my attempts are not working. This is really a general
SQL quesion and doesn't pertain to MySQL but I couldn't find a generic
database discussion group except on on advancement and theory and this
is really a basic query construction question.

Just say I have a table with three columns, name, date, score and
these represent the test scores for various people. Each person can
take the test as many times as they like but I only want to get back
the last result.

My first thought didn't seem right but I figured I'd try it anyway:
SELECT name, date, score FROM testresults
GROUP BY name
ORDER BY date DESC
And yes this is wrong and does not return anything useful since the
score returned seems to be based on where the grouping started in
someway.

My second thought was that the list of people is small and known so I
was going to just hardcode their names into the query:
SELECT name, date, score FROM testresults WHERE name = 'bob' ORDER BY
date DESC LIMIT 1
UNION ALL
SELECT name, date, score FROM testresults WHERE name = 'mary' ORDER BY
date DESC LIMIT 1
UNION ALL
SELECT name, date, score FROM testresults WHERE name = 'jim' ORDER BY
date DESC LIMIT 1
UNION ALL
This is syntactically incorrect.

Can anyone help me with this query?

Re: UNION, LIMIT and ORDER BY

am 25.05.2007 17:03:31 von zac.carey

On May 25, 3:23 pm, Israel wrote:
> I've trying to write a query that seems like it should be simple but
> for some reason my attempts are not working. This is really a general
> SQL quesion and doesn't pertain to MySQL but I couldn't find a generic
> database discussion group except on on advancement and theory and this
> is really a basic query construction question.
>
> Just say I have a table with three columns, name, date, score and
> these represent the test scores for various people. Each person can
> take the test as many times as they like but I only want to get back
> the last result.
>
> My first thought didn't seem right but I figured I'd try it anyway:
> SELECT name, date, score FROM testresults
> GROUP BY name
> ORDER BY date DESC
> And yes this is wrong and does not return anything useful since the
> score returned seems to be based on where the grouping started in
> someway.
>
> My second thought was that the list of people is small and known so I
> was going to just hardcode their names into the query:
> SELECT name, date, score FROM testresults WHERE name = 'bob' ORDER BY
> date DESC LIMIT 1
> UNION ALL
> SELECT name, date, score FROM testresults WHERE name = 'mary' ORDER BY
> date DESC LIMIT 1
> UNION ALL
> SELECT name, date, score FROM testresults WHERE name = 'jim' ORDER BY
> date DESC LIMIT 1
> UNION ALL
> This is syntactically incorrect.
>
> Can anyone help me with this query?

SELECT t1.* FROM tests t1
LEFT JOIN tests t2 ON t1.name = t2.name
AND t1.date > t2.date
WHERE t2.date IS NULL;

Re: UNION, LIMIT and ORDER BY

am 28.05.2007 19:35:39 von Good Man

Israel wrote in news:1180102988.273271.53270
@p77g2000hsh.googlegroups.com:

> I've trying to write a query that seems like it should be simple but
> for some reason my attempts are not working. This is really a general
> SQL quesion and doesn't pertain to MySQL but I couldn't find a generic
> database discussion group except on on advancement and theory and this
> is really a basic query construction question.
>
> Just say I have a table with three columns, name, date, score and
> these represent the test scores for various people. Each person can
> take the test as many times as they like but I only want to get back
> the last result.
>
> My first thought didn't seem right but I figured I'd try it anyway:
> SELECT name, date, score FROM testresults
> GROUP BY name
> ORDER BY date DESC
> And yes this is wrong and does not return anything useful since the
> score returned seems to be based on where the grouping started in
> someway.

Ideally, you would have two tables here: one with people names (auto-
increment user id, user name) and one with test results (auto-increment
tableID, ID referencing the person, testdate, testresult). This would
make this query, and others you are likely to create afterwards,
incredibly easier (though this is untested):

SELECT t.date,t.score,u.name
FROM testresults t
JOIN users u ON t.userID=u.userID
GROUP BY t.userID
ORDER BY t.date DESC

The key to realizing there is a better way to design your database is
any replication of data: ie in your table, the name "Bob" is entered as
many times as he took the test. What if he called you up and said "Id
prefer my name be stored as Robert"? You'd have to go trawling through
this table, and any others, changing data when there is really no need
to do so - changing a single entry in the "users" table would do it.

If you don't want to change your table design, I would stick with your
query (minus the GROUP BY) and use a programming language (ie: PHP) to
display the results correctly.

Re: UNION, LIMIT and ORDER BY

am 30.05.2007 16:07:56 von Israel

On May 28, 1:35 pm, Good Man wrote:
> The key to realizing there is a better way to design your database is
> any replication of data: ie in your table, the name "Bob" is entered as
> many times as he took the test. What if he called you up and said "Id
> prefer my name be stored as Robert"? You'd have to go trawling through
> this table, and any others, changing data when there is really no need
> to do so - changing a single entry in the "users" table would do it.

My explanation of the tables was modified for brevity. There's
actually two tables (linked by FK) and the data is actually parameter
data for the state of some hardware components on a machine that get
logged at a regular interval but I wanted to create a query that would
just show me the latest for all of the components at arbitrary times.
This query would only list a handful of parameter types but there's on
the order of 100 parameter types all together and the parameter log
table keeps all of the history for post-analysis and can have on the
order of 100k-200k records so joining a table back onto itself is not
really option considering how slow it is.
Unless I'm missing something about the performance of joining in this
way I'll probably have to resort to multiple independent queries -
hence my original thought of using UNION ALL which I couldn't seem to
get the syntax correct for.
I was trying to avoid having to write any code because we already have
plans in the works for making a front end application that will allow
users to extract all sorts of data so I don't want to waste time
making a collection of hodge podge little apps or php code that
duplicates this work. Until we get the final application completed I
wanted to do everything via queries otherwise it will be like pulling
teeth to get users to stop using all of the "temporary" applications
but in the interim people can still get the data they need.

Re: UNION, LIMIT and ORDER BY

am 30.05.2007 16:35:22 von Israel

On May 28, 1:35 pm, Good Man wrote:
> SELECT t.date,t.score,u.name
> FROM testresults t
> JOIN users u ON t.userID=u.userID
> GROUP BY t.userID
> ORDER BY t.date DESC

With the ORDER BY after the GROUP BY I just end up with the final
groups ordered by the date from some (seemingly random) row from each
group. What I need is for the first row picked for the group to be
the latest entry for that group but you can't switch the ORDER BY and
GROUP BY clauses.

Re: UNION, LIMIT and ORDER BY

am 30.05.2007 18:18:13 von zac.carey

On May 30, 3:35 pm, Israel wrote:
> On May 28, 1:35 pm, Good Man wrote:
>
> > SELECT t.date,t.score,u.name
> > FROM testresults t
> > JOIN users u ON t.userID=u.userID
> > GROUP BY t.userID
> > ORDER BY t.date DESC
>
> With the ORDER BY after the GROUP BY I just end up with the final
> groups ordered by the date from some (seemingly random) row from each
> group. What I need is for the first row picked for the group to be
> the latest entry for that group but you can't switch the ORDER BY and
> GROUP BY clauses.

I can't see what's wrong with the query I posted previously - but if
you provide us with an example dataset and the result you'd expect to
obtain from that, perhaps we can help. Ideally include the CREATE and
INSERT statements too.

Re: UNION, LIMIT and ORDER BY

am 30.05.2007 21:22:32 von Israel

On May 30, 12:18 pm, strawberry wrote:
> I can't see what's wrong with the query I posted previously - but if
> you provide us with an example dataset and the result you'd expect to
> obtain from that, perhaps we can help. Ideally include the CREATE and
> INSERT statements too.

The issue I have is that it performs a join onto itself and with
100k-200k records it is way to slow.

Re: UNION, LIMIT and ORDER BY

am 31.05.2007 11:32:56 von zac.carey

On May 30, 8:22 pm, Israel wrote:
> On May 30, 12:18 pm, strawberry wrote:
>
> > I can't see what's wrong with the query I posted previously - but if
> > you provide us with an example dataset and the result you'd expect to
> > obtain from that, perhaps we can help. Ideally include the CREATE and
> > INSERT statements too.
>
> The issue I have is that it performs a join onto itself and with
> 100k-200k records it is way to slow.

Too slow? With appropriate indexes I'd imagine that it would take
about 5 seconds. Is that too slow? Is 'GROUP BY' quicker!?!.

Re: UNION, LIMIT and ORDER BY

am 31.05.2007 15:25:35 von Israel

On May 31, 5:32 am, strawberry wrote:
> Too slow? With appropriate indexes I'd imagine that it would take
> about 5 seconds. Is that too slow? Is 'GROUP BY' quicker!?!.

Ok, so I must be missing something then. This is my sql to create the
tables:
CREATE TABLE `parameter` (
`Id` int(10) unsigned NOT NULL auto_increment,
`Name` varchar(45) NOT NULL default '',
PRIMARY KEY (`Id`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `parameterlog` (
`ParameterId` int(10) unsigned NOT NULL default '0',
`SampleTimestamp` double NOT NULL default '0',
`Data` varchar(50) NOT NULL default '',
KEY `FK_parameterdata_Parameter` (`ParameterId`),
CONSTRAINT `FK_parameterdata_Parameter` FOREIGN KEY (`ParameterId`)
REFERENCES `parameter` (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The FK_parameterdata_Parameter creates a BTREE index on the
ParameterId field so performing a join with that field with should as
quick as possible - right?

I ran the following query:
SELECT p1.* FROM `parameterlog` p1
LEFT JOIN `parameterlog` p2 ON p1.ParameterId = p2.ParameterId
AND p1.`Sampletimestamp` < p2.`Sampletimestamp`
WHERE p2.`Sampletimestamp` IS NULL;

On a smaller database with about 56,000 rows in this table it took 141
seconds and returned 73 rows. I didn't have the patience to wait for
the query for the larger databases but I can only assume that it has
an exponential growth.

I ran the following query (which doesn't produce the correct results):
SELECT * FROM parameterlog
GROUP BY ParameterId
ORDER BY SampleTimestamp
With the same record set of 56,000 rows this query executed in 0.21
seconds.

I'm using a 1.86GHz Pentium laptop with 1GB RAM and I'm running MySQL
server 4.1.

Re: UNION, LIMIT and ORDER BY

am 31.05.2007 17:04:47 von zac.carey

On May 31, 2:25 pm, Israel wrote:
> On May 31, 5:32 am, strawberry wrote:
>
> > Too slow? With appropriate indexes I'd imagine that it would take
> > about 5 seconds. Is that too slow? Is 'GROUP BY' quicker!?!.
>
> Ok, so I must be missing something then. This is my sql to create the
> tables:
> CREATE TABLE `parameter` (
> `Id` int(10) unsigned NOT NULL auto_increment,
> `Name` varchar(45) NOT NULL default '',
> PRIMARY KEY (`Id`),
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `parameterlog` (
> `ParameterId` int(10) unsigned NOT NULL default '0',
> `SampleTimestamp` double NOT NULL default '0',
> `Data` varchar(50) NOT NULL default '',
> KEY `FK_parameterdata_Parameter` (`ParameterId`),
> CONSTRAINT `FK_parameterdata_Parameter` FOREIGN KEY (`ParameterId`)
> REFERENCES `parameter` (`Id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> The FK_parameterdata_Parameter creates a BTREE index on the
> ParameterId field so performing a join with that field with should as
> quick as possible - right?
>
> I ran the following query:
> SELECT p1.* FROM `parameterlog` p1
> LEFT JOIN `parameterlog` p2 ON p1.ParameterId = p2.ParameterId
> AND p1.`Sampletimestamp` < p2.`Sampletimestamp`
> WHERE p2.`Sampletimestamp` IS NULL;
>
> On a smaller database with about 56,000 rows in this table it took 141
> seconds and returned 73 rows. I didn't have the patience to wait for
> the query for the larger databases but I can only assume that it has
> an exponential growth.
>
> I ran the following query (which doesn't produce the correct results):
> SELECT * FROM parameterlog
> GROUP BY ParameterId
> ORDER BY SampleTimestamp
> With the same record set of 56,000 rows this query executed in 0.21
> seconds.
>
> I'm using a 1.86GHz Pentium laptop with 1GB RAM and I'm running MySQL
> server 4.1.

I'm not really familiar with this approach - but what happens if you
put an index on sampletimestamp?

Re: UNION, LIMIT and ORDER BY

am 31.05.2007 22:55:01 von Israel

On May 31, 11:04 am, strawberry wrote:
> I'm not really familiar with this approach - but what happens if you
> put an index on sampletimestamp?

Interestingly I tried that a few times and averaged 147 seconds and
then after I removed it the time averaged 154 seconds. The load on my
system is roughly the same as it was the first time I ran the test so
I have no clue why it changed but running the query does peg out my
CPU at 100% for a while.
Without more exhaustive testing I can't determine whether or not
putting an index on sampletimestamp had any appreciable effect.

Re: UNION, LIMIT and ORDER BY

am 01.06.2007 16:47:06 von Good Man

Israel wrote in news:1180617935.216117.284910
@q75g2000hsh.googlegroups.com:


> The FK_parameterdata_Parameter creates a BTREE index on the
> ParameterId field so performing a join with that field with should as
> quick as possible - right?

yeah, but according to your query, you're also performing a join on
'Sampletimestamp'... so those should be indexed as well

>
> I ran the following query:
> SELECT p1.* FROM `parameterlog` p1
> LEFT JOIN `parameterlog` p2 ON p1.ParameterId = p2.ParameterId
> AND p1.`Sampletimestamp` < p2.`Sampletimestamp`
> WHERE p2.`Sampletimestamp` IS NULL;


> SELECT * FROM parameterlog
> GROUP BY ParameterId
> ORDER BY SampleTimestamp
>
> With the same record set of 56,000 rows this query executed in 0.21
> seconds.

Of course it did. The query just says 'get everything' then group and order
by'. There's not much processing power/time in getting 'everything'.

Perhaps if you add an index on 'Sampletimestamp' things will happen much
quicker... have you looked at the EXPLAIN result of your query?