Re: Sorted Hash<a1ddaad2-89c6-4b01-bd39-2e27e36ffe39@b40g2000prf.googlegroups.com><Xns99F7C

am 01.12.2007 22:29:30 von hjp-usenet2

On 2007-11-30 01:13, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
> "palexvs@gmail.com" wrote in news:a1ddaad2-89c6-4b01-
> bd39-2e27e36ffe39@b40g2000prf.googlegroups.com:
>> I filled hash and then printed it (sorted by key):
>> my %hs = (
>> key10 => 5,
>> key5 => b,
>> aey9 => 7,
>> )
>> foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
>>
>> key - string ([0-9A-F]{72}), 50K records.
>> How do it more effective?
>
> Well, depends on what you mean by 'more effective'.
>
> If you are talking about speed, note that the only place where you can
> really get any meaningful improvement is IO.

I don't see anything in your post supporting that claim. So lets try it:

print "$_ $hash{$_}\n" for sort keys %hash; 0.217 s
print "$_ $hash{$_}\n" for keys %hash; 0.112 s

(on a 1.86 GHz Core 2 running Linux, writing to a disk file. Writing to
/dev/null is about 0.02 seconds faster)

The unsorted version is almost twice as fast as the sorted version. So
eliminating or improving the sort has about the same potential as
improving I/O. In fact, the second number includes the time for walking
through the hash, which is unavoidable, so the I/O takes less than 50 %
of the total time, making the sort the prime candidate for optimization.

So how could the sort be sped up or eliminated?

Tie::IxHash keeps the hash sorted, but has quite a lot of overhead. On
my machine printing the hash takes about 0.465 seconds, which is worse
than sorting the keys on demand.

Another way would be to use a faster sort method, as Salvador suggested.

Finally, if you either need to print the hash several times or fill it
in sorted order you could just keep the sorted keys around instead of
sorting every time.

hp

--
_ | Peter J. Holzer | It took a genius to create [TeX],
|_|_) | Sysadmin WSR | and it takes a genius to maintain it.
| | | hjp@hjp.at | That's not engineering, that's art.
__/ | http://www.hjp.at/ | -- David Kastrup in comp.text.tex

Re: Sorted Hash<a1ddaad2-89c6-4b01-bd39-2e27e36ffe39@b40g2000prf.googlegroups.com><Xns99F7C

am 01.12.2007 23:49:14 von Ben Morrow

Quoth "Peter J. Holzer" :
>

>
> So how could the sort be sped up or eliminated?
>
> Tie::IxHash keeps the hash sorted, but has quite a lot of overhead. On
> my machine printing the hash takes about 0.465 seconds, which is worse
> than sorting the keys on demand.
>
> Another way would be to use a faster sort method, as Salvador suggested.
>
> Finally, if you either need to print the hash several times or fill it
> in sorted order you could just keep the sorted keys around instead of
> sorting every time.

....which is what Tie::IxHash does, but you'd be avoiding the
inefficencies of tie. Yet another way would be to rewrite Tie::IxHash in
C, or use DB_File/DB_BTREE which effectively does that for you. I don't
know how much you win by having the tie methods in C: that depends on
the efficiency of perl's method cache. It is possible, though probably
not often a good idea, to use tie magic with a different vtable, and
avoid the method lookup altogether. See threads::shared for a (somewhat
incomprehensible) example.

Ben

Re: Sorted Hash <a1ddaad2-89c6-4b01-bd39-2e27e36ffe39@b40g2000prf.googlegroups.com> <Xns99F

am 02.12.2007 06:22:24 von 1usa

"Peter J. Holzer" wrote in
news:slrnfl3khq.k4i.hjp-usenet2@zeno.hjp.at:

> On 2007-11-30 01:13, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>> "palexvs@gmail.com" wrote in
>> news:a1ddaad2-89c6-4b01-
>> bd39-2e27e36ffe39@b40g2000prf.googlegroups.com:
>>> I filled hash and then printed it (sorted by key):
>>> my %hs = (
>>> key10 => 5,
>>> key5 => b,
>>> aey9 => 7,
>>> )
>>> foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
>>>
>>> key - string ([0-9A-F]{72}), 50K records.
>>> How do it more effective?
>>
>> Well, depends on what you mean by 'more effective'.
>>
>> If you are talking about speed, note that the only place where you
>> can really get any meaningful improvement is IO.
>
> I don't see anything in your post supporting that claim. So lets try
> it:
>
> print "$_ $hash{$_}\n" for sort keys %hash; 0.217 s
> print "$_ $hash{$_}\n" for keys %hash; 0.112 s

Given the OP's script, the one criterion over which there can be no
doubt is that he wants the output to be sorted.

> The unsorted version is almost twice as fast as the sorted version. So
> eliminating or improving the sort has about the same potential as
> improving I/O.

Clearly, writing a one's own replacement sort routine or copying and
pasting something from somewhere is out of the question. So, the only
way to improve the sort is to pull in a module or two. My intuition told
me that the overhead of that would make the improvement in the context
of this script very underwhelming.

> So how could the sort be sped up or eliminated?
>
> Tie::IxHash keeps the hash sorted, but has quite a lot of overhead. On
> my machine printing the hash takes about 0.465 seconds, which is worse
> than sorting the keys on demand.

Ditto on my system.

> Another way would be to use a faster sort method, as Salvador
> suggested.

I forgot what that was but I tried it at the time and it was slightly
slower.

> Finally, if you either need to print the hash several times or fill it
> in sorted order you could just keep the sorted keys around instead of
> sorting every time.

I meant my response to be critical of the OP's post: There was barely
enough information on requirements in the OP. Given all the information
in the OP regarding the problem and assuming that the OP wanted improve
speed, there is no better way than the simple one liner the OP posted.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines: