More math than perl...
on 05.10.2007 23:12:18 by Bill H
Background:
I have a routine I am writing in perl that will give me the median for
a 0 to 5 rating. The ratings are stored in a file and I load the
values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
When a person rates a page I increment one of the RATE variables
based on what they selected (0 - 5) and increment TOTAL so I have a
running count (which is really just the sum of RATE0 - RATE5).
The problem I have (and I hope I am explaining this right) is that, to
calculate a median, I have to make an array that contains all the
values, sorted from low to high, and then look at the value of the
element in the middle to get the median. As an example, if I have the
following (not real code, just an example of the logic):
$RATE[0] = 3;
$RATE[1] = 1;
$RATE[2] = 0;
$RATE[3] = 4;
$RATE[4] = 1;
$RATE[5] = 2;
Then my array would be:
@ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
And the median would be $ARRAY[5] or 3. With an even number of
elements in @ARRAY I have to add the value below the middle and the
value above the middle, then divide by 2 to get the median.
For a small sample this is no problem, but when the number of people
who have rated it gets into the 1000s this array is going to be too
cumbersome. Does anyone know of a simpler way to do it in Perl without
adding in modules or using a lot of memory?
Any / all ideas are welcomed, but please remember that the example I
gave is just typed to give you an idea and is not any real code I am
using.
Bill H
Re: More math than perl...
on 05.10.2007 23:44:43 by Charlton Wilbur
>>>>> "BH" == Bill H writes:
BH> The problem I have (and I hope I am explaining this right), to
BH> calculate a median, I have to make an array that contains all
BH> the values, sorted from low to high, and then look at the
BH> value of the element in the middle to get the median.
So, let me see if I understand this right.
You have six variables, $RATE0 through $RATE5, and each contains a
count of how many people rated that page that number?
You could probably do something slick, but the brute-force method
looks like this:
my @array = ((0) x $RATE0, (1) x $RATE1, (2) x $RATE2,
(3) x $RATE3, (4) x $RATE4, (5) x $RATE5);
my $median;
if (@array % 2)
{
$median = ($array[(@array-1)/2] + $array[(@array+1)/2])/2;
}
else
{
$median = $array[@array/2];
}
Alternately, if you did the sensible thing and kept $RATE0 through
$RATE5 in an array, you could say, much more elegantly,
my @array = map { ($_) x $RATE[$_] } (0..5);
Charlton
--
Charlton Wilbur
cwilbur@chromatico.net
Re: More math than perl...
on 05.10.2007 23:49:31 by paduille.4061.mumia.w+nospam
On 10/05/2007 04:12 PM, Bill H wrote:
> [...]
> And the median would be $ARRAY[5] or 3. With an even number of
> elements in @ARRAY I have to add the value below the middle and the
> value above the middle, divide by 2 to get the median.
>
> For a small sample this is no problem, but when the number of people
> who have rated it get in to the 1000's this array is going to be too
> cumbersome. Does anyone know of a simpler way to do it in perl without
> adding in modules or using alot of memory?
> [...]
I would just build the array in memory. On any reasonably modern system,
you'll have to have millions of values before you run out of memory.
I know the mean can be calculated "on the fly"--without storing all of
the values to be examined, but I can't see how this is to be done with
the median; I don't think it's possible.
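For the mean, a minimal sketch of the on-the-fly update might look like the following. The helper name update_mean is made up for illustration; it keeps only the current mean and the count, never the individual values.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): fold one new value into a
# running mean without keeping the earlier values around.
sub update_mean {
    my ($mean, $count, $value) = @_;
    $count++;
    $mean += ($value - $mean) / $count;
    return ($mean, $count);
}

my ($mean, $count) = (0, 0);
($mean, $count) = update_mean($mean, $count, $_)
    for (0, 0, 0, 1, 3, 3, 3, 3, 4, 5, 5);
printf "mean of %d ratings = %.3f\n", $count, $mean;
```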
PS.
I would have given this post a more descriptive subject line like:
calculating median without using too much memory.
Re: More math than perl...
on 05.10.2007 23:51:50 by Dummy
Bill H wrote:
> Background:
>
> I have a routine I am writing in perl that will give me the median for
> a 0 to 5 rating. The ratings are stored in a file and I load the
> values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
> When a person rates a page I increment one of the RATE variables
> based on what they selected (0 - 5) and increment TOTAL so I have a
> running count (which is really just the sum of RATE0 - RATE5).
>
> The problem I have (and I hope I am explaining this right), to
> calculate a median, I have to make an array that contains all the
> values, sorted from low to high, and then look at the value of the
> element in the middle to get the median. As an example if I have the
> following (not real code, just an example of the logic):
>
> $RATE[0] = 3;
> $RATE[1] = 1;
> $RATE[2] = 0;
> $RATE[3] = 4;
> $RATE[4] = 1;
> $RATE[5] = 2;
>
> Then my array would be:
>
> @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
>
> And the median would be $ARRAY[5] or 3. With an even number of
> elements in @ARRAY I have to add the value below the middle and the
> value above the middle, divide by 2 to get the median.
>
> For a small sample this is no problem, but when the number of people
> who have rated it get in to the 1000's this array is going to be too
> cumbersome. Does anyone know of a simpler way to do it in perl without
> adding in modules or using alot of memory?
>
> Any / all ideas are welcomed, but please remember that the example I
> gave is just typed to give you an idea and is not any real code I am
> using.
Perhaps this is close to what you require:
$ perl -le'
my @RATES = ( 3, 1, 0, 4, 1, 2 );
my $TOTAL = 11;
my $half = int( $TOTAL / 2 );
for my $i ( 0 .. $#RATES ) {
if ( ( $half -= $RATES[ $i ] ) < 0 ) {
print "Median = $i";
last;
}
}
'
Median = 3
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Re: More math than perl...
on 05.10.2007 23:53:25 by Jim Gibson
In article <1191618738.361537.183720@o3g2000hsb.googlegroups.com>, Bill
H wrote:
> Background:
>
> I have a routine I am writing in perl that will give me the median for
> a 0 to 5 rating. The ratings are stored in a file and I load the
> values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
> When a person rates a page I increment one of the RATE variables
> based on what they selected (0 - 5) and increment TOTAL so I have a
> running count (which is really just the sum of RATE0 - RATE5).
>
> The problem I have (and I hope I am explaining this right), to
> calculate a median, I have to make an array that contains all the
> values, sorted from low to high, and then look at the value of the
> element in the middle to get the median. As an example if I have the
> following (not real code, just an example of the logic):
>
> $RATE[0] = 3;
> $RATE[1] = 1;
> $RATE[2] = 0;
> $RATE[3] = 4;
> $RATE[4] = 1;
> $RATE[5] = 2;
>
> Then my array would be:
>
> @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
>
> And the median would be $ARRAY[5] or 3. With an even number of
> elements in @ARRAY I have to add the value below the middle and the
> value above the middle, divide by 2 to get the median.
>
> For a small sample this is no problem, but when the number of people
> who have rated it get in to the 1000's this array is going to be too
> cumbersome. Does anyone know of a simpler way to do it in perl without
> adding in modules or using alot of memory?
If you have the counts in an array, you needn't generate an array with
the actual scores. Start adding the counts from 0 to 5. Find the count
element that, when added to the subtotal, makes the subtotal exceed
half of the total count. The score for that array element is the
median.
#!/usr/local/bin/perl
use warnings;
use strict;
my @rate = qw( 3 1 0 4 1 2 );
my $total =0;
$total += $_ for @rate;
my $median = ($total/2);
print "median score is $median of $total\n";
my $subtotal = 0;
for my $r ( 0 .. $#rate ) {
$subtotal += $rate[$r];
if( $subtotal >= $median ) {
print "Median score of (@rate) is $r\n";
last;
}
}
__OUTPUT__
median score is 5.5 of 11
Median score of (3 1 0 4 1 2) is 3
There are some edge cases, such as $subtotal == $median, that are left
as an exercise. :)
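One possible way to handle the tie case is sketched below. The sub name median_from_counts is invented for this example, and it assumes the counts are non-negative integers indexed by score. Comparing twice the subtotal against the total avoids the fractional half.

```perl
use strict;
use warnings;

# Hypothetical helper: median of scores 0..$#count given per-score counts.
sub median_from_counts {
    my @count = @_;
    my $total = 0;
    $total += $_ for @count;
    return unless $total;                # no ratings yet

    my $subtotal = 0;
    for my $r (0 .. $#count) {
        $subtotal += $count[$r];
        # More than half of all scores are <= $r: $r is the median.
        return $r if 2 * $subtotal > $total;
        if (2 * $subtotal == $total) {
            # Exactly half are <= $r, so the total is even and the two
            # middle elements straddle $r and the next score with votes.
            my $next = $r + 1;
            $next++ until $count[$next];
            return ($r + $next) / 2;
        }
    }
    return;                              # not reached for a nonzero total
}

print median_from_counts(3, 1, 0, 4, 1, 2), "\n";   # odd total
print median_from_counts(2, 0, 0, 2), "\n";         # even total with a tie
```

In the tie branch the remaining counts always add up to half the total, so the search for the next nonzero count cannot run off the end of the array.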
--
Jim Gibson
Re: More math than perl...
on 05.10.2007 23:56:39 by Bill H
On Oct 5, 5:44 pm, Charlton Wilbur wrote:
> >>>>> "BH" == Bill H writes:
>
> BH> The problem I have (and I hope I am explaining this right), to
> BH> calculate a median, I have to make an array that contains all
> BH> the values, sorted from low to high, and then look at the
> BH> value of the element in the middle to get the median.
>
> So, let me see if I understand this right.
>
> You have five variables, $RATE0 through $RATE5, and each contains a
> count of how many people rated that page that number?
>
> You could probably do something slick, but the brute-force method
> looks like this:
>
> my @array = ((0) x $RATE0, (1) x $RATE1, (2) x $RATE2,
> (3) x $RATE3, (4) x $RATE4, (5) x $RATE5);
>
> my $median;
>
> if (@array % 2)
> {
> $median = ($array[(@array-1)/2] + $array[(@array+1)/2])/2;}
>
> else
> {
> $median = $array[@array/2];
>
> }
>
> Alternately, if you did the sensible thing and kept $RATE0 through
> $RATE5 in an array, you could say, much more elegantly,
>
> my @array = map { ($_) x $RATE[$_] } (0..5);
>
> Charlton
>
> --
> Charlton Wilbur
> cwil...@chromatico.net
Thanks Charlton, but would this not still make a large array if the
total number of people is high (unless I am missing something in it)?
Bill H
Re: More math than perl...
on 06.10.2007 00:37:00 by xhoster
Bill H wrote:
> Background:
>
> I have a routine I am writing in perl that will give me the median for
> a 0 to 5 rating. The ratings are stored in a file and I load the
> values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
> When a person rates a page I increment one of the RATE variables
> based on what they selected (0 - 5) and increment TOTAL so I have a
> running count (which is really just the sum of RATE0 - RATE5).
>
> The problem I have (and I hope I am explaining this right), to
> calculate a median, I have to make an array that contains all the
> values, sorted from low to high, and then look at the value of the
> element in the middle to get the median. As an example if I have the
> following (not real code, just an example of the logic):
>
> $RATE[0] = 3;
> $RATE[1] = 1;
> $RATE[2] = 0;
> $RATE[3] = 4;
> $RATE[4] = 1;
> $RATE[5] = 2;
Compute the median directly from the structure you already have.
use List::Util qw(sum);
sub median_from_bins {
    my ($bins,$total)=@_;
    $total=sum @$bins unless defined $total;
    my $sofar=0;
    for (my $x=0; $x<=5; $x++) {
        $sofar+=$bins->[$x];
        # more than half of the ratings are at or below $x
        return $x if $sofar>$total/2;
        # exactly half: average $x with the next rating that has votes
        if ($sofar == $total/2) {
            my $y=$x+1;
            $y++ until $bins->[$y];
            return ($x+$y)/2;
        }
    }
    die "Should never get here $sofar $total @$bins";
}
my $median = median_from_bins(\@RATE,$TOTAL);
Xho
--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
Re: More math than perl...
on 06.10.2007 02:34:26 by Tad McClellan
Bill H wrote:
> On Oct 5, 5:44 pm, Charlton Wilbur wrote:
>> >>>>> "BH" == Bill H writes:
>>
>> BH> The problem I have (and I hope I am explaining this right), to
>> BH> calculate a median, I have to make an array that contains all
>> BH> the values, sorted from low to high, and then look at the
>> BH> value of the element in the middle to get the median.
>> Alternately, if you did the sensible thing and kept $RATE0 through
>> $RATE5 in an array, you could say, much more elegantly,
>>
>> my @array = map { ($_) x $RATE[$_] } (0..5);
>> --
>> Charlton Wilbur
>> cwil...@chromatico.net
[ it is bad 'net manners to quote .sigs ...]
> Thanks Charlton, but would this not still make a large array if the
> total number of people is high (unles I am missing something in it).
How many hundreds of thousands of people do you expect
will take your survey?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Re: More math than perl...
on 06.10.2007 03:40:27 by Peter Jamieson
"Bill H" wrote in message
news:1191618738.361537.183720@o3g2000hsb.googlegroups.com...
> Background:
>
> I have a routine I am writing in perl that will give me the median for
> a 0 to 5 rating. The ratings are stored in a file and I load the
> values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
> When a person rates a page I increment one of the RATE variables
> based on what they selected (0 - 5) and increment TOTAL so I have a
> running count (which is really just the sum of RATE0 - RATE5).
>
> The problem I have (and I hope I am explaining this right), to
> calculate a median, I have to make an array that contains all the
> values, sorted from low to high, and then look at the value of the
> element in the middle to get the median. As an example if I have the
> following (not real code, just an example of the logic):
>
> $RATE[0] = 3;
> $RATE[1] = 1;
> $RATE[2] = 0;
> $RATE[3] = 4;
> $RATE[4] = 1;
> $RATE[5] = 2;
>
> Then my array would be:
>
> @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
>
> And the median would be $ARRAY[5] or 3. With an even number of
> elements in @ARRAY I have to add the value below the middle and the
> value above the middle, divide by 2 to get the median.
>
> For a small sample this is no problem, but when the number of people
> who have rated it get in to the 1000's this array is going to be too
> cumbersome. Does anyone know of a simpler way to do it in perl without
> adding in modules or using alot of memory?
>
> Any / all ideas are welcomed, but please remember that the example I
> gave is just typed to give you an idea and is not any real code I am
> using.
>
> Bill H
>
Bill, if keeping memory use low is a priority and you do need to summarize
thousands of ratings, then for your data you can probably use the mean
instead. The two are not guaranteed to agree, but for a roughly symmetric
spread of ratings the mean and median tend to be close.
int($mean) would give you a whole number if needed.
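Whether the two agree depends on how the votes are distributed, so it may be worth checking on real counts before swapping one for the other. A rough sketch, with a made-up helper name (mean_and_median) and counts indexed by rating:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (mean, median) for per-rating counts,
# where index = rating. For an even total this takes the upper of the
# two middle values rather than averaging them, to keep the sketch short.
sub mean_and_median {
    my @count = @_;
    my ($total, $sum) = (0, 0);
    for my $r (0 .. $#count) {
        $total += $count[$r];
        $sum   += $r * $count[$r];
    }
    return unless $total;

    my $left = int($total / 2);     # zero-based position of the middle element
    my $median;
    for my $r (0 .. $#count) {
        if (($left -= $count[$r]) < 0) { $median = $r; last }
    }
    return ($sum / $total, $median);
}

my ($mean, $median) = mean_and_median(3, 1, 0, 4, 1, 2);
printf "mean = %.2f, int(mean) = %d, median = %d\n",
    $mean, int($mean), $median;
```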
Cheers, Peter
Re: More math than perl...
on 06.10.2007 04:55:17 by veatchla
Bill H wrote:
> Background:
>
> I have a routine I am writing in perl that will give me the median for
> a 0 to 5 rating. The ratings are stored in a file and I load the
> values into 7 different variables, RATE0 - RATE5 and one called TOTAL.
> When a person rates a page I increment one of the RATE variables
> based on what they selected (0 - 5) and increment TOTAL so I have a
> running count (which is really just the sum of RATE0 - RATE5).
>
> The problem I have (and I hope I am explaining this right), to
> calculate a median, I have to make an array that contains all the
> values, sorted from low to high, and then look at the value of the
> element in the middle to get the median. As an example if I have the
> following (not real code, just an example of the logic):
>
> $RATE[0] = 3;
> $RATE[1] = 1;
> $RATE[2] = 0;
> $RATE[3] = 4;
> $RATE[4] = 1;
> $RATE[5] = 2;
>
> Then my array would be:
>
> @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
>
> And the median would be $ARRAY[5] or 3. With an even number of
> elements in @ARRAY I have to add the value below the middle and the
> value above the middle, divide by 2 to get the median.
>
> For a small sample this is no problem, but when the number of people
> who have rated it get in to the 1000's this array is going to be too
> cumbersome. Does anyone know of a simpler way to do it in perl without
> adding in modules or using alot of memory?
>
> Any / all ideas are welcomed, but please remember that the example I
> gave is just typed to give you an idea and is not any real code I am
> using.
>
> Bill H
>
use strict;
use warnings;
my @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
# using the same array since you are concerned about memory.
# need to load the array to handle sorting of 2 digit numbers.
@ARRAY = sort map {sprintf "%05d", $_} @ARRAY;
my $midPoint = $#ARRAY / 2;
# "+ 0" turns the zero-padded string back into a plain number
my $median = $ARRAY[int $midPoint] + 0;
if ($midPoint != int $midPoint) {
    my $upperPoint = $midPoint +1;
    $median = ($median + $ARRAY[int $upperPoint]) / 2;
}
print "median = $median\n";
But this is why I use the Statistics::Descriptive::Discrete module to
calculate medians.
--
Len
Re: More math than perl...
on 06.10.2007 07:56:07 by paduille.4061.mumia.w+nospam
On 10/05/2007 09:55 PM, l v wrote:
> Bill H wrote:
>> [ problem calculating the median without using too much memory ]
>> Bill H
>>
>
>
> use strict;
> use warnings;
> @ARRAY = (0,0,0,1,3,3,3,3,4,5,5);
This kind of array is what Bill wanted to avoid creating.
>
> # using the same array since you are concerned about memory.
> # need to load the array to handle sorting of 2 digit numbers.
> @ARRAY = sort map {sprintf "%05d", $_} @ARRAY;
How is that simpler than this?
@ARRAY = sort { $a <=> $b } @ARRAY;
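For anyone wondering why the comparison block matters, here is a small illustration with made-up values. Without a block, sort compares its arguments as strings, which is what the zero-padding was working around.

```perl
use strict;
use warnings;

my @values = (10, 9, 2, 33, 1);

my @as_strings = sort @values;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @values;   # numeric comparison

print "string sort:  @as_strings\n";   # 1 10 2 33 9
print "numeric sort: @as_numbers\n";   # 1 2 9 10 33
```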
> $midPoint = $#ARRAY / 2;
> $median = $ARRAY[int $midPoint];
>
> if ($midPoint != int $midPoint) {
> $upperPoint = $midPoint +1;
> $median = ($median + $ARRAY[int $upperPoint]) / 2;
> }
>
> print "median = $median\n";
>
use POSIX 'ceil';
print "median = ", $ARRAY[ceil($#ARRAY/2)], "\n";
>
> But this is why I use the Statistics::Descriptive::Discrete module to
> calculate medians.
>
Bill said he didn't want to use any modules.
Re: More math than perl...
on 06.10.2007 12:23:05 by Michele Dondi
On 05 Oct 2007 17:44:43 -0400, Charlton Wilbur
wrote:
>if (@array % 2)
>{
> $median = ($array[(@array-1)/2] + $array[(@array+1)/2])/2;
>}
>else
>{
> $median = $array[@array/2];
>}
Actually, AIUI the index of the latter should be (@array-1)/2 (or
$#array/2) and the two calculations should be swapped. Of
course, this is IMHO a good place to use the ternary conditional
operator.[*]
[*] On second thought, do "of course" and "IMHO" clash?
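A sketch of how the swapped version could read with the ternary operator, using the example counts from the start of the thread. It assumes at least one rating has been recorded, and the @RATE data here is only illustrative.

```perl
use strict;
use warnings;

# Counts per rating, index = rating (the example data from the thread).
my @RATE  = (3, 1, 0, 4, 1, 2);
my @array = map { ($_) x $RATE[$_] } 0 .. $#RATE;

# Odd count: the single middle element.
# Even count: the mean of the two elements around the middle.
my $median = @array % 2
    ? $array[ $#array / 2 ]
    : ( $array[ @array / 2 - 1 ] + $array[ @array / 2 ] ) / 2;

print "median = $median\n";
```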
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
Re: More math than perl...
on 06.10.2007 13:01:12 by Bill H
Thanks for the help guys. I ended up using a combination of the
examples given:
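sub getMedian
{
    my @values = @_;
    my @median = map { ($_) x $values[$_] } (0..5);
    my $m = int(@median / 2);
    if ($m != @median / 2)
    {
        # odd count: $m is the index of the single middle element
        $m = $median[$m];
    }
    else
    {
        # even count: average the two elements around the middle
        $m = int(($median[$m - 1] + $median[$m]) / 2);
    }
    return ($m);
}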
where I call it with:
$median = getMedian(@RATE);
I do end up creating the array, but I think it will be ok.
Bill H
Re: More math than perl...
on 06.10.2007 15:06:21 by Bill H
After playing with it for a while I wonder if median is what I really
need. Logically, if you have 60 people rate the page at 0 and 30
people rate it at 5 then the page rating should be somewhere between 1
and 2, but using a median it would still be ranked at a 0 (middle
element in the array would be a 0). I know it ain't strictly Perl, but
any thoughts?
Bill H
Re: More math than perl...
on 06.10.2007 15:31:24 by QoS
Bill H wrote in message-id: <1191675981.873467.299980@w3g2000hsg.googlegroups.com>
> After playing with it for awhile I wonder if median is what I really
> need. Logically, if you have 60 people rate the page at 0 and 30
> people rate it at 5 then the page rating should be somewhere between 1
> and 2, but using a median it would still be ranked at a 0 (middle
> element in the array would be a 0). I know it aint strictly perl, but
> any thoughts?
>
> Bill H
Perhaps then you need the average (mean): multiply each possible rating by
the number of people who chose it, add these products together, then
divide by the total number of ratings.
Here is a simple example which needs some hardening but may show the
concept fairly clearly.
#!/usr/bin/perl
use strict;
use warnings;
# number of people who chose each rating 0..5
my @values = (2, 4, 3, 9, 4, 16);
my $sum = 0;
my $count = 0;
foreach my $i (0..5) {
    $sum   += $i * ($values[$i] || 0);
    $count += $values[$i] || 0;
}
my $average = $count ? $sum / $count : 0;
print "The average response is: [$average]\n";
Re: More math than perl...
on 06.10.2007 16:05:52 by spambait
In article <1191675981.873467.299980@w3g2000hsg.googlegroups.com>, Bill H wrote:
>After playing with it for awhile I wonder if median is what I really
>need.
Probably not.
>Logically, if you have 60 people rate the page at 0 and 30
>people rate it at 5 then the page rating should be somewhere between 1
>and 2,
((60 * 0) + (30 * 5)) / (60 + 30) = 150/90 = 1.667
>but using a median it would still be ranked at a 0 (middle
>element in the array would be a 0). I know it aint strictly perl, but
>any thoughts?
Use the mean instead. Or you could display a full statistical report: mean,
median, mode, and standard deviation. :-)
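If the full report sounds tempting, all four figures can be derived from the six counts alone. The following is only a sketch: rating_report is a made-up name, the standard deviation is the population form, ties for the mode go to the lowest rating, and an even total takes the upper middle value for the median.

```perl
use strict;
use warnings;

# Hypothetical helper: summary statistics from per-rating counts
# (index = rating). Assumes at least one rating has been recorded.
sub rating_report {
    my @count = @_;
    my ($n, $sum, $sumsq, $mode) = (0, 0, 0, 0);
    for my $r (0 .. $#count) {
        $n     += $count[$r];
        $sum   += $r * $count[$r];
        $sumsq += $r * $r * $count[$r];
        $mode   = $r if $count[$r] > $count[$mode];   # first most common rating
    }
    my $mean = $sum / $n;
    # Population variance: E[x^2] - mean^2, clamped against rounding noise.
    my $var  = $sumsq / $n - $mean ** 2;
    my $sd   = sqrt($var > 0 ? $var : 0);

    # Median: walk the cumulative counts to the middle element.
    my ($left, $median) = (int($n / 2), 0);
    for my $r (0 .. $#count) {
        if (($left -= $count[$r]) < 0) { $median = $r; last }
    }
    return (mean => $mean, median => $median, mode => $mode, sd => $sd);
}

my %stats = rating_report(60, 0, 0, 0, 0, 30);
printf "mean %.3f  median %d  mode %d  sd %.3f\n",
    @stats{qw(mean median mode sd)};
```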
--
Regards,
Doug Miller (alphageek at milmac dot com)
It's time to throw all their damned tea in the harbor again.
Re: More math than perl...
on 06.10.2007 17:59:16 by Michele Dondi
On Fri, 05 Oct 2007 16:49:31 -0500, "Mumia W."
wrote:
>I know the mean can be calculated "on the fly"--without storing all of
>the values to be examined, but I can't see how this is to be done with
>the median; I don't think it's possible.
Sure it is possible:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'sum';
use constant TESTS => 20;
use Test::More tests => TESTS;
sub naive {
    my @arr = map +($_) x $_[$_], 0..$#_;
    @arr % 2 ?
        $arr[(@arr-1)/2] :
        ($arr[@arr/2 - 1] + $arr[@arr/2])/2;
}
sub findidx {
    my $i=shift;
    ($i -= $_[$_])<0 and return $_ for 0..$#_;
}
sub smart {
    my $t=sum @_;
    $t%2 ?
        findidx +($t-1)/2, @_ :
        (findidx($t/2-1, @_) + findidx($t/2, @_))/2;
}
for (1..TESTS) {
my @a=map int rand 10, 0..5;
is smart(@a), naive(@a), "Test @a";
}
__END__
Note: smart() is not very smart, because I
feel the calculations performed by findidx($t/2-1, @_) and
findidx($t/2, @_) are very much the same, but in my first attempt
with no helper sub I always got some failing test, so this one,
however bloated, at least shows as a proof of concept that it is not
necessary to go brute force.
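For what it is worth, the two lookups can probably be folded into a single pass over the counts. This is an untested-in-anger sketch rather than a drop-in replacement: one_pass is a hypothetical name, and it assumes non-negative integer counts with a nonzero total.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant: the arguments are counts, index = value.
sub one_pass {
    my $t = 0;
    $t += $_ for @_;
    my $lo_idx = int(($t - 1) / 2);   # zero-based position of the lower middle
    my $hi_idx = int($t / 2);         # ... and of the upper middle
    my ($seen, $lo) = (0);            # $lo stays undef until found
    for my $v (0 .. $#_) {
        $seen += $_[$v];
        $lo = $v if !defined $lo && $seen > $lo_idx;
        return ($lo + $v) / 2 if $seen > $hi_idx;
    }
    return;                           # only reached when the total is zero
}

print one_pass(3, 1, 0, 4, 1, 2), "\n";
print one_pass(2, 0, 0, 2), "\n";
```

For an odd total both positions coincide, so the same value is averaged with itself; for an even total the lower middle is remembered until the upper middle turns up.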
>PS.
>I would have given this post a more descriptive subject line like:
>calculating median without using too much memory.
Seconded.
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,