Hash performance v. hash size setting

Hash performance v. hash size setting

am 02.09.2010 17:16:57 von Deane.Rothenmaier

This is a multipart message in MIME format.
--===============1434675904==
Content-Type: multipart/alternative;
boundary="=_alternative 0053F38486257792_="

This is a multipart message in MIME format.
--=_alternative 0053F38486257792_=
Content-Type: text/plain; charset="US-ASCII"

Gurus,

This may be merely a philosophical debate, but I have a
performance-related question regarding pre-allocation to hashes. Consider:

##################### Sample code #####################
use strict;
use warnings;
use Devel::Peek;

use constant HKEYS => 128; # Experimentally optimized, quality percentage
= 89.4 for 92 elements

my %hash; keys(%hash) = HKEYS;

# code to load %hash with values...

Dump( \%hash, 1); # To get performance data
##################### Sample code #####################

Now, I know %hash will have 92 elements, and that the hash quality
percentage (HQP) will be 89.4 (these are actual numbers, derived using a
real data set). The question is, can I achieve an increase in actual
performance (reduced runtime) by reducing the value of HKEYS to 64? I've
done some experimenting, and found that the HQP will be the same, and that
increasing HKEYS will actually decrease the quality (increases the HQP).

I'm aware of the way keys(%hash) operates--allocating the next higher
power of two elements (e.g.: for n = 1-16, keys(%hash) = n will allocate
16, n = 17 will get 32).

My established practice has been to keep the value(s) of HKEYS equal to or
greater than the known number of elements for each hash--it's this
practice that I've begun to question. So the question is this: Once I've
determined optimal values (minimum HQP) for HKEYS for my hashes, do I gain
anything in runtime performance if I decrease the pre-allocation below
those values (assuming that I don't get a percentage increase as a
result)?

I'm working with seven small hash data sets, ranging from 14 through 139
elements. I know, it's a small group of small data sets, I'm asking this
mainly for future reference, if/when larger sets come down the line.

Thanks!

Deane Rothenmaier
Programmer/Analyst -- IT-StdCfg
Walgreens Corp.
2 Overlook Point #N5102D
MS 6515
Lincolnshire, IL 60069
224-542-5150

People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger
--=_alternative 0053F38486257792_=
Content-Type: text/html; charset="US-ASCII"



Gurus,



This may be merely a philosophical debate,
but I have a performance-related question regarding pre-allocation to hashes.
Consider:




##################### Sample code #####################

use strict;

use warnings;

use Devel::Peek;



use constant HKEYS => 128;  #
Experimentally optimized, quality percentage = 89.4 for 92 elements




my %hash;  keys(%hash) = HKEYS;



# code to load %hash with values...



Dump( \%hash, 1); # To get performance
data


##################### Sample code #####################



Now, I know %hash will have 92 elements,
and that the hash quality  percentage (HQP) will be 89.4 (these are
actual numbers, derived using a real data set). The question is, can I
achieve an increase in actual performance (reduced runtime) by reducing
the value of HKEYS to 64? I've done some experimenting, and found that
the HQP will be the same, and that increasing HKEYS will actually decrease
the quality (increases the HQP).




I'm aware of the way keys(%hash) operates--allocating
the next higher power of two elements (e.g.: for n = 1-16, keys(%hash)
= n will allocate 16, n = 17 will get 32).




My established practice has been to
keep the value(s) of HKEYS equal to or greater than the known number of
elements for each hash--it's this practice that I've begun to question.
So the question is this: Once I've determined optimal values (minimum HQP)
for HKEYS for my hashes, do I gain anything in runtime performance if I
decrease the pre-allocation below those values (assuming that I don't get
a percentage increase as a result)?




I'm working with seven small hash data
sets, ranging from 14 through 139 elements. I know, it's a small group
of small data sets, I'm asking this mainly for future reference, if/when
larger sets come down the line.




Thanks!



Deane Rothenmaier

Programmer/Analyst -- IT-StdCfg

Walgreens Corp.

2 Overlook Point #N5102D

MS 6515

Lincolnshire, IL 60069

224-542-5150



People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger

--=_alternative 0053F38486257792_=--


--===============1434675904==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
--===============1434675904==--

Re: Hash performance v. hash size setting

am 03.09.2010 12:30:01 von Jenda Krynicky

"Premature optimization is the root of all evil!"

Are you sure you do need to fiddle with these things? Does it make a
measurable difference?

From: Deane.Rothenmaier@walgreens.com
> This may be merely a philosophical debate, but I have a
> performance-related question regarding pre-allocation to hashes.

That's the point. It's preallocation! That is it allows you to
specify the number of keys before you insert them and thus prevent
perl from having to rehash the keys several times as you add them one
at a time. That's where it makes a difference.

#!perl
keys %hh = 64;
for (aa..zz) {$c++; $h{$_}=1; $hh{$_}=1; print "$c : ".%h." - ".%hh."\n";exit if $cnt > 64};
__END__

As you can see the number of used buckets is the same for both hashes
as soon as the number of keys is big enough to force perl to allocate
the same number for the %h hash as you preallocated for %hh.

Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Hash performance v. hash size setting

am 03.09.2010 13:35:29 von Gabor Szabo

On Fri, Sep 3, 2010 at 1:30 PM, Jenda Krynicky wrote:
> "Premature optimization is the root of all evil!"
>
> Are you sure you do need to fiddle with these things? Does it make a
> measurable difference?

I am glad you wrote this.
I wanted to ask back if there is any real performance issue in the code
and if Deane has used any profiler on it to see which part of the code
takes the longest to execute.

>> This may be merely a philosophical debate

yes. I think my philosophy is that within reasonable boundaries first
write maintainable, well tested code that does what it needs to do
and only then check if there is any performance issue.
If there is, then run http://search.cpan.org/dist/Devel-NYTProf/
and see where is the problem.

regards
Gabor

-- =

Gabor Szabo=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0    http://szabgab.com/
Twitter: http://twitter.com/szabgab
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Hash performance v. hash size setting

am 08.09.2010 15:52:32 von Deane.Rothenmaier

This is a multipart message in MIME format.
--===============1679170587==
Content-Type: multipart/alternative;
boundary="=_alternative 004C383F86257798_="

This is a multipart message in MIME format.
--=_alternative 004C383F86257798_=
Content-Type: text/plain; charset="US-ASCII"

>I think my philosophy is that within reasonable boundaries first
>write maintainable, well tested code that does what it needs to do
>and only then check if there is any performance issue.
Agreed, Gabor. The code does "[do] what it needs to do..." which is
maintaining an HTML Web page that reports workstation configuration data,
a rather time-consuming process. Some of the hashes I use change
frequently as new data for table rows comes in (e.g.: new software, new
trusted sites).

The hashes are used to track progress through the file, and to catch new
additions as well as any disappearances. It's because of this that I'm
frequently adjusting the preallocation numbers.

Because the actual process is rather slow--and I'm not sure there's much
to be done to speed things up, I try to pick up what snippets of
performance I can in the setup code, hence my query.

Thanks for your time and considered advice.

Deane Rothenmaier
Programmer/Analyst -- IT-StdCfg
Walgreens Corp.
2 Overlook Point #N5102D
MS 6515
Lincolnshire, IL 60069
224-542-5150

People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger
--=_alternative 004C383F86257798_=
Content-Type: text/html; charset="US-ASCII"



>I think
my philosophy is that within reasonable boundaries first

>write maintainable, well tested code that does what it needs to do

>and only then check if there is any performance issue.


Agreed, Gabor. The code does "[do]
what it needs to do..." which is maintaining an HTML Web page that
reports workstation configuration data, a rather time-consuming process.
Some of the hashes I use change frequently as new data for table rows comes
in (e.g.: new software, new trusted sites).




The hashes are used to track progress
through the file, and to catch new additions as well as any disappearances.
It's because of this that I'm frequently adjusting the preallocation numbers.




Because the actual process is rather
slow--and I'm not sure there's much to be done to speed things up, I try
to pick up what snippets of performance I can in the setup code, hence
my query.




Thanks for your time and considered
advice.




Deane Rothenmaier

Programmer/Analyst -- IT-StdCfg

Walgreens Corp.

2 Overlook Point #N5102D

MS 6515

Lincolnshire, IL 60069

224-542-5150



People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger

--=_alternative 004C383F86257798_=--


--===============1679170587==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
--===============1679170587==--

RE: Hash performance v. hash size setting

am 08.09.2010 16:17:40 von Michael.Ludwig

> Because the actual process is rather slow--and I'm not sure there's
> much to be done to speed things up, I try to pick up what snippets of
> performance I can in the setup code, hence my query.

I'd look for other things first:

* execution model: persistent Perl process, or started anew?
* network access
* disk access
* inter-process communication
* locking

Maybe one of those has a big speed-up potential?!

--
Michael Ludwig
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Hash performance v. hash size setting

am 08.09.2010 16:44:35 von Brian Raven

From: activeperl-bounces@listserv.ActiveState.com
[mailto:activeperl-bounces@listserv.ActiveState.com] On Behalf Of
Deane.Rothenmaier@walgreens.com
Sent: 08 September 2010 14:53
To: Gabor Szabo
Cc: Jenda Krynicky; activeperl-bounces@listserv.ActiveState.com;
activeperl@listserv.activestate.com
Subject: Re: Hash performance v. hash size setting

> Because the actual process is rather slow--and I'm not sure there's
much to be done to speed things up, I try to pick up what snippets of
> performance I can in the setup code, hence my query.

Unfortunately, programmers are notoriously bad at guessing where
performance bottlenecks are. That is why the general recommendation is
to measure (profile and/or benchmark) before attempting to make things
go faster.

while (!performance_acceptable()) {
profile();
my $x = locate_worst_bottleneck();
attempt_to_improve($x);
}

You might want to add 'and worth_the_effort()' to the loop condition.

Apologies, of course, if you have already been through all that.

HTH

--
Brian Raven

Please consider the environment before printing this e-mail.

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please advise the sender immediately by reply e-mail and delete this message and any attachments without retaining a copy.

Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Hash performance v. hash size setting

am 08.09.2010 17:07:54 von Deane.Rothenmaier

This is a multipart message in MIME format.
--===============0116275227==
Content-Type: multipart/alternative;
boundary="=_alternative 00531F8686257798_="

This is a multipart message in MIME format.
--=_alternative 00531F8686257798_=
Content-Type: text/plain; charset="US-ASCII"

Brian Raven coded:
while ((!performance_acceptable()) && worth_the_effort()) {
profile();
my $x = locate_worst_bottleneck();
attempt_to_improve($x);
}

TOO COOL. I'm going to print that out and stick it up on my cube wall.
(And follow it, too!)

Thanks!

Deane Rothenmaier
Programmer/Analyst -- IT-StdCfg
Walgreens Corp.
2 Overlook Point #N5102D
MS 6515
Lincolnshire, IL 60069
224-542-5150

People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger
--=_alternative 00531F8686257798_=
Content-Type: text/html; charset="US-ASCII"



Brian Raven coded:

while ((!performance_acceptable()) && worth_the_effort())
{

   profile();

   my $x = locate_worst_bottleneck();

   attempt_to_improve($x);

}



TOO COOL. I'm going to print that out
and stick it up on my cube wall. (And follow it, too!)




Thanks!



Deane Rothenmaier

Programmer/Analyst -- IT-StdCfg

Walgreens Corp.

2 Overlook Point #N5102D

MS 6515

Lincolnshire, IL 60069

224-542-5150



People using flintlock pistols shouldn't fire warning shots. -- Andrew
Borntreger

--=_alternative 00531F8686257798_=--


--===============0116275227==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
--===============0116275227==--