This is a multipart message in MIME format.
--===============1980735802==
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0A61_01C8359F.852C6310"
Content-Language: en-ca
This is a multipart message in MIME format.
------=_NextPart_000_0A61_01C8359F.852C6310
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
You are misreading the documentation. n + N(n-1)/2k is the formula for expected comparisons for a random hash where each bucket is
filled to the same level. The total number of comparisons for the actual hash is equal to the sum of the squares of the number of
entries in each bucket.
>From looking at the code, I wonder if the numbers are correct though, as the ratio seems to be reversed and therefore should be less
than 100%. I don't have time to investigate this though. The actual code lives in dump.c in the core Perl distribution:
if (HvARRAY(sv) && HvKEYS(sv)) {
/* Show distribution of HEs in the ARRAY */
int freq[200];
#define FREQ_MAX ((int)(sizeof freq / sizeof freq[0] - 1))
int i;
int max = 0;
U32 pow2 = 2, keys = HvKEYS(sv);
NV theoret, sum = 0;
PerlIO_printf(file, " (");
Zero(freq, FREQ_MAX + 1, int);
for (i = 0; (STRLEN)i <= HvMAX(sv); i++) {
HE* h;
int count = 0;
for (h = HvARRAY(sv)[i]; h; h = HeNEXT(h))
count++;
if (count > FREQ_MAX)
count = FREQ_MAX;
freq[count]++;
if (max < count)
max = count;
}
for (i = 0; i <= max; i++) {
if (freq[i]) {
PerlIO_printf(file, "%d%s:%d", i,
(i == FREQ_MAX) ? "+" : "",
freq[i]);
if (i != max)
PerlIO_printf(file, ", ");
}
}
PerlIO_putc(file, ')');
/* The "quality" of a hash is defined as the total number of
comparisons needed to access every element once, relative
to the expected number needed for a random hash.
The total number of comparisons is equal to the sum of
the squares of the number of entries in each bucket.
For a random hash of n keys into k buckets, the expected
value is
n + n(n-1)/2k
*/
for (i = max; i > 0; i--) { /* Precision: count down. */
sum += freq[i] * i * i;
}
while ((keys = keys >> 1))
pow2 = pow2 << 1;
theoret = HvKEYS(sv);
theoret += theoret * (theoret-1)/pow2;
PerlIO_putc(file, '\n');
Perl_dump_indent(aTHX_ level, file, " hash quality = %.1"NVff"%%", theoret/sum*100);
}
Cheers,
-Jan
From: activeperl-bounces@listserv.ActiveState.com [mailto:activeperl-bounces@listserv.ActiveState.com] On Behalf Of
Deane.Rothenmaier@walgreens.com
Sent: December 3, 2007 10:28 AM
To: activeperl@listserv.ActiveState.com
Subject: "Quality Percentage" of hashes as reported by Devel::Peek
Gurus,
I tried directing this query to the module's author, but the e-mail address as listed in the module's documentation is no longer
valid--not surprising, after nearly a decade, I guess. Anyhoo, I thought I'd ask if any of you folks might know the answer...
In the documentation for this module's Dump subroutine, a hash's "quality" is described as a ratio of "total" comparisons versus
"expected" comparisons. There is a formula given for total comparisons, n + n(n-1)/2k, but no formula is given to determine expected
comparisons. The question is then, how can a user determine mathematically what an expected quality percentage might be?
I know that Dump outputs the percentage, I'd just like to be able to calculate it predictively, instead of fiddling keys(%hash) = X
assignments and TIAS-ing for differing values of 'X.'
Is there a formula for expected comparisons available?
Thanks,
Deane Rothenmaier
Programmer/Analyst
Walgreens Corp.
847-914-5150
A word to the wise ain't necessary, it's the stupid ones who need the advice. -- Bill Cosby
------=_NextPart_000_0A61_01C8359F.852C6310
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>You are misreading the documentation. n + N(n-1)/2k =
is the
formula for expected comparisons for a random hash where each bucket is =
filled
to the same level. The total number of comparisons for the actual =
hash is
equal to the sum of the squares of the number of entries in each =
bucket.
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>
style=3D'font-size:11.0pt;
font-family:"Calibri","sans-serif";color:#1F497D'>From looking at the =
code, I
wonder if the numbers are correct though, as the ratio seems to be =
reversed and
therefore should be less than 100%. I don’t have time to =
investigate this
though. The actual code lives in dump.c in the core Perl =
distribution:
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> if =
(HvARRAY(sv) && HvKEYS(sv)) {
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; /* Show distribution of HEs in the ARRAY =
*/
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; int freq[200];
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'>#define FREQ_MAX ((int)(sizeof freq / sizeof freq[0] - =
1))
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; int i;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; int max =3D 0;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; U32 pow2 =3D 2, keys =3D HvKEYS(sv);
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; NV theoret, sum =3D 0;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'>
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; PerlIO_printf(file, " =
(");
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; Zero(freq, FREQ_MAX + 1, int);
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> =
for (i =3D 0; (STRLEN)i <=3D =
HvMAX(sv); i++) {
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; HE* h;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; int count =3D 0;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; for (h =3D HvARRAY(sv)[i]; h; h =3D =
HeNEXT(h))
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; =
count++;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; if (count > =
FREQ_MAX)
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; count =3D =
FREQ_MAX;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; freq[count]++;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; if (max < =
count)
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; max =3D =
count;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; }
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; for (i =3D 0; i <=3D max; i++) {
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; if (freq[i]) {
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; =
PerlIO_printf(file, "%d%s:%d", i,
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p;  =
; (i =
== FREQ_MAX) ?
"+" : "",
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p;  =
; =
freq[i]);
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; if (i !=3D =
max)
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p;  =
; PerlIO_printf(file, ", ");
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; }
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; }
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; PerlIO_putc(file, ')');
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; /* The "quality" of a hash is defined as
the total number of
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; comparisons needed to access every element =
once,
relative
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; to the expected number needed for a random =
hash.
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'>
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; The total number of comparisons is equal to =
the
sum of
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; the squares of the number of entries in each
bucket.
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; For a random hash of n keys into k buckets, =
the
expected
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; value is
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p;  =
; n + =
n(n-1)/2k
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; */
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'>
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; for (i =3D max; i > 0; i--) { /* Precision: count
down. */
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; sum +=3D freq[i] * i * =
i;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; }
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; while ((keys =3D keys >> 1))
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; pow2 =3D pow2 << =
1;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; theoret =3D HvKEYS(sv);
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; theoret +=3D theoret * (theoret-1)/pow2;
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; PerlIO_putc(file, '\n');
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> &nbs=
p; Perl_dump_indent(aTHX_ level, file, " hash
quality =3D %.1"NVff"%%", =
theoret/sum*100);
style=3D'font-size:11.0pt;font-family:"Courier New";
color:#1F497D'> =
}
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>Cheers,
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>-Jan
style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif" ;
color:#1F497D'>
0cm 4.0pt'>
0cm 0cm 0cm'>
style=3D'font-size:10.0pt;font-family:"Tahoma","sans-serif"' >From:=
style=3D'font-size:10.0pt;font-family:"Tahoma","sans-serif"' >
activeperl-bounces@listserv.ActiveState.com
[mailto:activeperl-bounces@listserv.ActiveState.com] On Behalf Of =
Deane.Rothenmaier@walgreens.com
Sent: December 3, 2007 10:28 AM
To: activeperl@listserv.ActiveState.com
Subject: "Quality Percentage" of hashes as reported by
Devel::Peek