i suck, but would like to do something complicated with 4.1

am 15.01.2006 19:56:14 von Matthew Crouch

i suck so much that i don't even know if this is a JOIN or a subquery or
who-knows what. Here's the idea:

I want to select two things at the same time (from one table)
average for columnX
and
average for columnX where columnY=Z

so i started of course with
select avg(columnX) as avg1, avg(columnX) as avg2 from table where columnY=Z

which of course gives me the same thing twice.
i want
select avg(columnX) as avg1, avg(columnX) as avg2 from table where columnY=Z

so i head in this direction:
select avg(m1.columnX) as avg1, avg(m2.columnX) as avg2 from table where
m1.columnX<>'' and m2.columnY=Z

does this sound right? I just ran it and i got two different numbers, but
-it took forever (it's a big table)
and
-i haven't checked the math by hand yet.

Any tips for speed?

some clarity

am 15.01.2006 20:14:02 von Matthew Crouch

what I actually went to was

select avg(m1.columnX) as avg1, avg(m2.columnX) as avg2 from table m1,
table m2 where
m1.columnX<>'' and m2.columnY=Z

and i checked it and it looks right... i need to maximize speed though,
because i'll actually be calculating averages on several fields in this way.
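
A note on why this comma-join form gets slow: with no condition linking m1 to m2, every m1 row is paired with every m2 row that passes the columnY filter, so the server averages over N x M rows. The averages still come out right because each value is repeated the same number of times. A minimal sketch of that, using Python's sqlite3 as a stand-in for MySQL (an assumption; the table name and data are made up, and the columnX<>'' filter is left out for brevity):

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); myTable and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (columnX, columnY) VALUES (?, ?)",
    [(10.0, "Z"), (20.0, "other"), (30.0, "Z"), (40.0, "other")],
)

# Comma join with no link between m1 and m2: 4 rows x 2 matching rows = 8 rows.
avg1, avg2, joined_rows = conn.execute(
    """
    SELECT AVG(m1.columnX), AVG(m2.columnX), COUNT(*)
    FROM myTable m1, myTable m2
    WHERE m2.columnY = ?
    """,
    ("Z",),
).fetchone()

print(avg1, avg2, joined_rows)
```

On a 20,000-row table the same pattern would pair each of the 20,000 rows with every matching row, which is consistent with the query taking "forever".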

How much would it slow down the inserts to build an index on EVERY field
in the table?

Re: i suck, but would like to do something complicated with 4.1

am 15.01.2006 20:58:11 von Bill Karwin

"Matthew Crouch" wrote in message
news:iVwyf.9523$Di.3957@trnddc06...
> I want to select two things at the same time (from one table)
> average for columnX
> and
> average for columnX where columnY=Z

Here's how I'd do it:

SELECT AVG(m1.columnX) AS avg1, AVG(m2.columnX) AS avg2
FROM myTable AS m1 LEFT OUTER JOIN myTable AS m2
ON m1.primaryKey = m2.primaryKey AND m2.columnY = Z

This works because AVG() ignores rows where the field has a NULL state, and
the columnY condition in the ON clause makes the join leave out some rows of
the right-hand-side of the join, replacing them with NULL fields.
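
To see the NULL behaviour on a small case, here is a sketch that runs the same join through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption), and the four sample rows are made up; Z is passed in as the string 'Z'.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (columnX, columnY) VALUES (?, ?)",
    [(10.0, "Z"), (20.0, "other"), (30.0, "Z"), (40.0, "other")],
)

# m2 columns are NULL for rows that fail the ON condition, and AVG() skips NULLs,
# so avg2 only covers the rows where columnY matches.
avg1, avg2 = conn.execute(
    """
    SELECT AVG(m1.columnX) AS avg1, AVG(m2.columnX) AS avg2
    FROM myTable AS m1 LEFT OUTER JOIN myTable AS m2
      ON m1.primaryKey = m2.primaryKey AND m2.columnY = ?
    """,
    ("Z",),
).fetchone()

print(avg1, avg2)
```

Because the join is on the primary key, each m1 row matches at most one m2 row, so the overall average is not skewed by duplicated rows.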

Note that if you have more computations to make, you'd make an additional
left outer join for each one. MySQL has a practical limit to the number of
joins you can do in a single query, usually 31.

> Any tips for speed?

Do the separate computations in separate SQL queries. I don't know why so
many people on newsgroups insist on doing all their computations in a single
SQL query. It makes one's code a lot more complicated. Whoever takes over
the project after you're gone will curse your name.

So this is _really_ how I'd do it:

SELECT AVG(m1.columnX) AS avg1
FROM myTable AS m1

SELECT AVG(m2.columnX) AS avg2
FROM myTable AS m2
WHERE m2.columnY = Z

See how much simpler? And it's easier to add new computations.
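
As a rough illustration of the two-query approach from application code, here is a sketch using Python's sqlite3 module in place of a MySQL client (an assumption, as are the table contents):

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (columnX, columnY) VALUES (?, ?)",
    [(10.0, "Z"), (20.0, "other"), (30.0, "Z"), (40.0, "other")],
)

# One simple query per computation; no self-join needed.
avg1 = conn.execute("SELECT AVG(columnX) FROM myTable").fetchone()[0]
avg2 = conn.execute(
    "SELECT AVG(columnX) FROM myTable WHERE columnY = ?", ("Z",)
).fetchone()[0]

print(avg1, avg2)
```

Adding another computation is then just one more short query, with no change to the existing ones.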

> How much would it slow down the inserts to build an index on EVERY field
> in the table?

That might not matter. If you need the query to be faster than it can be
without those fields indexed, then you've got to pay the cost of maintaining
the indexes (if you need both inserts and computations of averages to be at
top speed, then consider pre-calculating the averages and storing them in
another table).
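
One possible shape for the pre-calculated approach, sketched with Python's sqlite3 as a stand-in for MySQL. Everything here is hypothetical: the myTableStats table, its columns, and the insert_row helper are names made up for illustration. The idea is to keep a running count and total per columnY value, updated on every insert, so that reading an average never scans the big table.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
# Summary table: one row per columnY value, holding a running count and total.
conn.execute("CREATE TABLE myTableStats (columnY TEXT PRIMARY KEY, n INTEGER, total REAL)")


def insert_row(x, y):
    """Insert one row and update the running totals in the same transaction."""
    with conn:
        conn.execute("INSERT INTO myTable (columnX, columnY) VALUES (?, ?)", (x, y))
        cur = conn.execute(
            "UPDATE myTableStats SET n = n + 1, total = total + ? WHERE columnY = ?",
            (x, y),
        )
        if cur.rowcount == 0:  # first row seen for this columnY value
            conn.execute(
                "INSERT INTO myTableStats (columnY, n, total) VALUES (?, 1, ?)", (y, x)
            )


for x, y in [(10.0, "Z"), (20.0, "other"), (30.0, "Z"), (40.0, "other")]:
    insert_row(x, y)

# Averages now come from the small summary table, not the full table.
avg_z = conn.execute(
    "SELECT total / n FROM myTableStats WHERE columnY = ?", ("Z",)
).fetchone()[0]
avg_all = conn.execute("SELECT SUM(total) / SUM(n) FROM myTableStats").fetchone()[0]

print(avg_all, avg_z)
```

The trade-off is the one described above: each insert does a little more work, and updates or deletes on the main table would need the same bookkeeping.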

You should also know that MySQL can use only one index per table in any
given query. So if it's using the index on the primary key field (e.g. for
the join), then you might as well not index columnX or columnY.

Try the queries with and without the indexes created, and use the EXPLAIN
statement to help you understand if the indexes will help or not.
http://dev.mysql.com/doc/refman/5.0/en/explain.html
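
The with/without comparison can be sketched like this. Note the stand-in: this uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, whose output looks different from MySQL's EXPLAIN, so treat it only as an illustration of the workflow. The index name idx_columnY and the sample data are made up.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); its plan output differs from MySQL's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (columnX, columnY) VALUES (?, ?)",
    [(float(i), "Z" if i % 10 == 0 else "other") for i in range(1000)],
)

query = "SELECT AVG(columnX) FROM myTable WHERE columnY = 'Z'"


def plan(sql):
    """Return the human-readable step descriptions of the query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without = plan(query)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_columnY ON myTable (columnY)")
plan_with = plan(query)  # the plan should now mention the new index

print(plan_without)
print(plan_with)
```

In MySQL the equivalent step is to prefix the SELECT with EXPLAIN before and after creating the index and compare the reported key and row estimates.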

Regards,
Bill K.

Thanks ... um, I think it's working TOO GOOD?

am 15.01.2006 22:28:25 von Matthew Crouch

Bill Karwin wrote:
> "Matthew Crouch" wrote in message
> news:iVwyf.9523$Di.3957@trnddc06...
>
>>I want to select two things at the same time (from one table)
>>average for columnX
>>and
>>average for columnX where columnY=Z
>
>
> Here's how I'd do it:
>
> SELECT AVG(m1.columnX) AS avg1, AVG(m2.columnX) AS avg2
> FROM myTable AS m1 LEFT OUTER JOIN myTable AS m2
> ON m1.primaryKey = m2.primaryKey AND m2.columnY = Z
>

Well, I was trying this out on one field, but i actually need
averages/sums for almost all of the fields in the table, so i took your
advice

this calculated 2 different averages on 8 different fields and came back
in less than a second -- 20,000 records.

This table is only likely to grow to 50,000 records over the life of the
project -- so if this checks out, i won't need any weird
optimizing/indexes at all. This is the largest table I've personally
worked with, but MySQL doesn't seem to be batting an eye...

> Note that if you have more computations to make, you'd make an additional
> left outer join for each one.

i didn't do this 'cause i wasn't sure what you meant, but it worked
anyway. my query was like

SELECT
avg(m1.scale) as zscale, avg(m2.scale) as pscale,
avg(m1.bf_percent) as zbf_percent, avg(m2.bf_percent) as pbf_percent,
avg(m1.scc) as zscc, avg(m2.scc) as pscc,
avg(m1.spc_agency) as zspc_agency, avg(m2.spc_agency) as pspc_agency,
avg(m1.lpc_agency) as zlpc_agency, avg(m2.lpc_agency) as plpc_agency,
avg(m1.sediment_agency) as zsediment_agency, avg(m2.sediment_agency) as psediment_agency,
avg(m1.temp) as ztemp, avg(m2.temp) as ptemp

FROM table AS m1
LEFT OUTER JOIN table AS m2
ON m1.pkey=m2.pkey
AND m2.producer_id='34567'

>>How much would it slow down the inserts to build an index on EVERY field
>>in the table?
>
> That might not matter. If you need the query to be faster than it can be
> without those fields indexed, then you've got to pay the cost of maintaining
> the indexes (if you need both inserts and computations of averages to be at
> top speed, then consider pre-calculating the averages and storing them in
> another table).
>
> You should also know that MySQL can use only one index per table in any
> given query. So if it's using the index on the primary key field (e.g. for
> the join), then you might as well not index columnX or columnY.
>
> Try the queries with and without the indexes created, and use the EXPLAIN
> statement to help you understand if the indexes will help or not.
> http://dev.mysql.com/doc/refman/5.0/en/explain.html
>
> Regards,
> Bill K.
>
>

Re: Thanks ... um, I think it's working TOO GOOD?

am 16.01.2006 03:37:12 von Bill Karwin

"Matthew Crouch" wrote in message
news:Z7zyf.8565$US3.4817@trnddc04...
> > Note that if you have more computations to make, you'd make an
> > additional left outer join for each one.
>
> i didn't do this 'cause i wasn't sure what you meant, but it worked
> anyway.

Ah -- never mind me, I misunderstood when you said you wanted to do this on
several fields. I thought you meant that you want to calculate averages on
columnX, based on different subsets of rows. So different averages for
columnY = Z1, for columnY = Z2, for columnZ >= 1234, etc. But that's not
what you meant. :-)
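
For the case described here (averages of one column over several different subsets of rows), "an additional left outer join for each one" could look roughly like the sketch below. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), and the values 'Z1' and 'Z2' and the sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (primaryKey INTEGER PRIMARY KEY, columnX REAL, columnY TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (columnX, columnY) VALUES (?, ?)",
    [(10.0, "Z1"), (20.0, "Z2"), (30.0, "Z1"), (60.0, "Z2")],
)

# One extra LEFT OUTER JOIN per subset; each joined alias is NULL outside its subset.
avg_all, avg_z1, avg_z2 = conn.execute(
    """
    SELECT AVG(m1.columnX) AS avg_all,
           AVG(m2.columnX) AS avg_z1,
           AVG(m3.columnX) AS avg_z2
    FROM myTable AS m1
    LEFT OUTER JOIN myTable AS m2
      ON m1.primaryKey = m2.primaryKey AND m2.columnY = 'Z1'
    LEFT OUTER JOIN myTable AS m3
      ON m1.primaryKey = m3.primaryKey AND m3.columnY = 'Z2'
    """
).fetchone()

print(avg_all, avg_z1, avg_z2)
```

Averaging several columns over the same subset, as in the producer_id query, needs only the one join, which is why that query worked without any additional joins.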

Regards
Bill K.