STD of multiple columns

on 14.04.2006 09:34:59 by Stefaan Lhermitte

Dear Mysql-ians,

I want to calculate the standard deviation of data that are in multiple
columns. I know how to calculate the STD of 1 column (e.g. X1 of
table_X) using:

SELECT STD(X1) FROM table_X;

but I now want to calculate the STD of the union of the data in several
columns (e.g. X1, X2, ..., X100 of table_X).

Does anyone have any suggestions on how to do that? I hoped something like
SELECT STD(X1,X2,...,X100) FROM table_X existed, but apparently it does
not.

Thanks for your suggestions in advance!

Kind regards,
Stef

Re: STD of multiple columns

on 14.04.2006 14:35:37 by Stefaan Lhermitte

Thanks for your help!

If I look at my design, it looks like:
(table) id, obs_time1, obs_time2, ..., obs_time100
where:
obs_timeX = observation at "time X"

I have over 2 million records (each with a unique id) in this table,
and I want to compute the STD of all observations of one id through
time.

If I understand your suggestion correctly, you propose restructuring it as:
(table_1) studentid + other info
(table_2) studentid obs_time obs_value
where:
obs_time = "time X" of obs_timeX
obs_value = value of obs_timeX with corresponding "time X"

If I am correct, I will get a very long table_2, since I create 2
million (ids) * 100 records (one for every obs_time). Doesn't that create
much more redundant information than my current layout, which has only one
row of obs_timeX values per unique id?

I hope I made myself clear.

Thanks again for your help!

Regards
Stef

Re: STD of multiple columns

on 14.04.2006 14:54:49 by Jerry Stuckle

stefaan.lhermitte@agr.kuleuven.ac.be wrote:
> Dear Mysql-ians,
>
> I want to calculate the standard deviation of data that are in multiple
> columns. I know how to calculate the STD of 1 column (e.g. X1 of
> table_X) using:
>
> SELECT STD(X1) FROM table_X;
>
> but I want to calculate now the STD of the union of data of columns
> (e.g. X1, X2, ..., X100 of table_X).
>
> Does anyone has any suggestion on how to do that? I hoped something as
> SELECT STD(X1,X2,...,X100) FROM table_X existed, but apparently it does
> not.
>
> Thanks for your suggestions in advance!
>
> Kind regards,
> Stef
>

Stef,

I don't *think* it's possible from your current design.

Perhaps a redesign is in order. Having 100 columns containing basically the
same information is not a good design.

For instance, in the case of student test scores - you could do something like:

(table) studentid name test1score test2score test3score test4score

(Of course there would be more info)

A better design would be:

(table 1) studentid name

(table 2) studentid testid score

Such a design is more versatile - and cures your problem along the way.
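
With the normalized layout above, the per-student standard deviation becomes an ordinary grouped aggregate. A minimal sketch, assuming the second table is called scores (a hypothetical name; the post only calls it "table 2") and using the column names from the post:

```sql
-- `scores` is a hypothetical name for "table 2" (studentid, testid, score).
-- STD() is MySQL's population standard deviation aggregate.
SELECT studentid, STD(score) AS score_std
FROM scores
GROUP BY studentid;
```

The alias score_std is only illustrative. Adding or dropping tests changes rows, not columns, so the query stays the same.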


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: STD of multiple columns

on 14.04.2006 16:03:07 by Jerry Stuckle

stefaan.lhermitte@agr.kuleuven.ac.be wrote:
> Thanks for your help!
>
> If I look at my design, it looks like:
> (table) id, obs_time1, obs_time2, ..., obs_time100
> where:
> obs_timeX = observation at "time X"
>
> I have over 2 million records (with different unique id's) for this
> table, and I want to create the STD of all observations of 1 id through
> time.
>
> If I understand your design well, you suggest to reform it towards:
> (table_1) studentid + other info
> (table_2) studentid obs_time obs_value
> where:
> obs_time = "time X" of obs_timeX
> obs_value = value of obs_timeX with corresponding "time X"
>
> If I am correct, I will get a very long table_2 since I create 2
> millions (ids) *100 records (for every obs_time). Don't I create much
> more redundant information then (having only 1 row of obs_timeX per id
> and having unique id's)?
>
> I hope i made myself clear?
>
> Thanks again for your help!
>
> Regards
> Stef
>

Stef,

Yep, that's exactly what I'm suggesting. Do some reading up on "Database
Normalization" - it can help you understand why this is potentially a better
solution.

And yes, the new table will be quite long. But your existing table is quite
wide! 200M rows (the max you could have) isn't that different from what you
have now - 2M rows with more than 100 columns in each row.

Also, once you normalize your tables, you may end up storing more data if the
majority of the fields are filled. But you may also store less, if only a small
number are filled. And normalizing your tables makes things more flexible.




Re: STD of multiple columns

on 14.04.2006 17:32:16 by Stefaan Lhermitte

Thanks for your suggestion! I will try to reorganize my data.

I was thinking of making a query to reorganize my data.
E.g.:
(SELECT id, "name(obs_time1)" AS obs_time, obs_time1 AS obs_value FROM
table_1)
UNION
(SELECT id, "name(obs_time2)" AS obs_time, obs_time2 AS obs_value FROM
table_1)
UNION
.......
UNION
(SELECT id, "name(obs_time100)" AS obs_time, obs_time100 AS obs_value
FROM table_1)
ORDER BY id

with:
"name(obs_timeX)" = the name of the column I currently use to extract
obs_timeX

Hopefully that will work!
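
One detail worth checking in a query of this shape: plain UNION removes duplicate rows, which means extra work on a result this large. Since every branch carries a different obs_time label and the ids are unique, no duplicates should arise, so UNION ALL ought to return the same rows. A rough sketch for two of the 100 branches, assuming a numeric time label in place of "name(obs_timeX)":

```sql
-- Two of the 100 branches; the labels 1 and 2 are an assumption
-- standing in for "name(obs_time1)" and "name(obs_time2)".
(SELECT id, 1 AS obs_time, obs_time1 AS obs_value FROM table_1)
UNION ALL
(SELECT id, 2 AS obs_time, obs_time2 AS obs_value FROM table_1)
ORDER BY id;
```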

Regards,
Stef

Re: STD of multiple columns

on 14.04.2006 22:08:07 by tommaso.gastaldi

Stef,

do you really want to run a union of 100 selects on a table with 2
million records ?!?

Actually, I don't see where the problem is. Why don't you just apply the
function std to each single column: select std(v1), std(v2), ... ? Why
do you feel you need a multivariate function?
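
For reference, STD() applied to one column measures the spread across ids at a single time, whereas the earlier post asks for the spread across times within one id. That row-wise figure can be written on the wide table with the identity "variance = mean of the squares minus the square of the mean". A rough sketch for three columns only, assuming none of them is NULL (obs_std is an illustrative alias):

```sql
-- Population standard deviation across columns, computed per row.
-- Shown for 3 of the 100 columns; replace 3 by the number of columns used.
-- GREATEST(0, ...) guards against a tiny negative value from rounding.
SELECT id,
       SQRT(GREATEST(0,
            (obs_time1*obs_time1 + obs_time2*obs_time2 + obs_time3*obs_time3) / 3
            - POW((obs_time1 + obs_time2 + obs_time3) / 3, 2)
       )) AS obs_std
FROM table_1;
```

If any of the listed columns is NULL in a row, the whole expression evaluates to NULL for that row, so this only suits rows that are completely filled.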

-tom

stefaan.lhermitte@agr.kuleuven.ac.be wrote:

> Thanks for your suggestion! I will try to reorganize my data.
>
> I was thinking of making a query to reorganize my data.
> E.g.:
> (SELECT id, "name(obs_time1)" AS obs_time, obs_time1 AS obs_value FROM
> table_1)
> UNION
> (SELECT id, "name(obs_time2)" AS obs_time, obs_time2 AS obs_value FROM
> table_1)
> UNION
> ......
> UNION
> (SELECT id, "name(obs_time100)" AS obs_time, obs_time100 AS obs_value
> FROM table_1)
> ORDER BY id
>
> with:
> "name(obs_timeX)"= the name of my columns I now use to extract
> obs_timeX
>
> Hopefully that will work!
>
> Regards,
> Stef

Re: STD of multiple columns

on 15.04.2006 04:39:13 by Jerry Stuckle

stefaan.lhermitte@agr.kuleuven.ac.be wrote:
> Thanks for your suggestion! I will try to reorganize my data.
>
> I was thinking of making a query to reorganize my data.
> E.g.:
> (SELECT id, "name(obs_time1)" AS obs_time, obs_time1 AS obs_value FROM
> table_1)
> UNION
> (SELECT id, "name(obs_time2)" AS obs_time, obs_time2 AS obs_value FROM
> table_1)
> UNION
> ......
> UNION
> (SELECT id, "name(obs_time100)" AS obs_time, obs_time100 AS obs_value
> FROM table_1)
> ORDER BY id
>
> with:
> "name(obs_timeX)"= the name of my columns I now use to extract
> obs_timeX
>
> Hopefully that will work!
>
> Regards,
> Stef
>

Stef,

Actually, I think I'd do it in PHP or some other language and let it loop, i.e.

(Assuming you're using a version which can insert from a select statement)

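// Copy each obs_valueN column of oldtable into rows of newtable.
for ($i = 1; $i <= 100; $i++) {
    // INSERT ... SELECT takes the SELECT directly, without a VALUES keyword.
    $query = "INSERT INTO newtable (studentid, obs_time, obs_value) " .
             "SELECT studentid, $i, obs_value" . $i . " FROM oldtable";
    $result = mysql_query($query);
    if (!$result) {
        echo "MySQL Error: " . mysql_error();
        break;
    }
}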

Also, if not all of the times have values and you don't need to insert the
empty ones, you can do this in two queries - select the value; if it's null
(or blank) then you don't need to insert it into the new table.
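
As a possible variation on skipping empty values, the filter can be pushed into the INSERT ... SELECT itself with a WHERE clause. A sketch of what one generated statement might look like for time 1, reusing the table and column names from the loop and assuming empty observations are stored as NULL:

```sql
-- One of the 100 generated statements (time 1).
-- Rows whose observation is NULL are never copied into newtable.
INSERT INTO newtable (studentid, obs_time, obs_value)
SELECT studentid, 1, obs_value1
FROM oldtable
WHERE obs_value1 IS NOT NULL;
```

If blanks are stored as empty strings instead of NULL, the condition would need to test for that as well.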



Re: STD of multiple columns

on 25.04.2006 17:08:05 by Stefaan Lhermitte

Thanx Jerry. I followed your advice and it worked wonderfully!