Sort::Maker : anonymous sub is compiled outside of my module

Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 14:25:33 von paduille.4060.mumia.w

I wanted to see how Sort::Maker could be used to sort the "meascodes" in
this message:

http://groups.google.com/groups?selm=1165272390.675288.23467 0@79g2000cws.googlegroups.com

To do this, I wrote this program:

132 use strict;
133 use warnings;
134 use Data::Dumper;
135 use Sort::Maker;
136
137 my $cost_order = 'A2yB';
138 my @sorted;
139
140 my $ms = [
141 '

',
142 '
',
143 '
',
144 '
'
145 ];
146
147 my $sorter = make_sorter(
148 plain => string => sub {
149 if (/meascode='(.)'/) {index($cost_order,$1)}
150 },
151 );
152 if ($@) { print "$@\n"; exit; }
153 @sorted = $sorter->(@$ms);
154 print Dumper(\@sorted);

However, this fails with "Global symbol "$cost_order" requires explicit
package name at (eval 12) line 8."

I was a little surprised, since anonymous subs usually have access to
lexical variables. I decided to make $cost_order a package variable (in
main) and fully qualify it as $::cost_order in the anonymous sub, so I
changed the lines of the program like so:

137 our $cost_order = 'A2yB';
149 if (/meascode='(.)'/) {index($::cost_order,$1)}

However, it fails again with the same error message as before.
Thankfully, it prints out the key extraction code which shows that my
colons have been removed from $::cost_order, and even when I use
$main::cost_order, it gets changed to $cost_order--which causes the failure.

I then decided to change the variable $cost_order to a subroutine:

137 sub COST_ORDER { 'A2yB' }
149 if (/meascode='(.)'/) {index(COST_ORDER(),$1)}

And I get this error message: "Undefined subroutine
&Sort::Maker::COST_ORDER called at (eval 12) line 8."

It looks like my anonymous sub is being compiled in Sort::Maker. No
problem, I'll just put the qualifiers on the COST_ORDER():

149 if (/meascode='(.)'/) {index(::COST_ORDER(),$1)}

But it fails with the same error message. I can only assume that the
colons were stripped from COST_ORDER() before the anonymous sub was
compiled. Finally, I had an insight: Sort::Maker likes stuff to be in
its package, so I try this:

137 sub COST_ORDER { 'A2yB' }
146 *{Sort::Maker::COST_ORDER} = \&COST_ORDER;
149 if (/meascode='(.)'/) {index(COST_ORDER(),$1)}

And it works.

Evidently, there is a bug/undocumented feature that causes an anonymous
sub that the programmer writes in his package to be compiled in
Sort::Maker, and Sort::Maker seems to make some modifications to the
*text* of that subroutine before it's compiled :O

Perl 5.8.4
Sort::Maker 0.05
Debian 3.1 (i686)


--
paduille.4060.mumia.w@earthlink.net

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 17:11:33 von Gunnar Hjalmarsson

Mumia W. (reading news) wrote:
> I wanted to see how Sort::Maker could be used to sort the "meascodes" in
> this message:
>
> http://groups.google.com/groups?selm=1165272390.675288.23467 0@79g2000cws.googlegroups.com

Interesting! I was waiting for Uri to jump in again, but you were first. :)

> To do this, I wrote this program:
>
> 132 use strict;
> 133 use warnings;
> 134 use Data::Dumper;
> 135 use Sort::Maker;
> 136
> 137 my $cost_order = 'A2yB';
> 138 my @sorted;
> 139
> 140 my $ms = [
> 141 '

',
> 142 '
',
> 143 '
',
> 144 '
'
> 145 ];
> 146
> 147 my $sorter = make_sorter(
> 148 plain => string => sub {
> 149 if (/meascode='(.)'/) {index($cost_order,$1)}
> 150 },
> 151 );
> 152 if ($@) { print "$@\n"; exit; }
> 153 @sorted = $sorter->(@$ms);
> 154 print Dumper(\@sorted);
>
> However, this fails with "Global symbol "$cost_order" requires explicit
> package name at (eval 12) line 8."

That error can be fixed by passing the var to the sub:

plain => string => sub {
if ( /meascode='(.)'/ ) { index( $_[0], $1 ) }
}->($cost_order),

But it didn't give the correct output...

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 17:49:37 von unknown

Mumia W. (reading news) wrote:
> I wanted to see how Sort::Maker could be used to sort the "meascodes" in
> this message:
>
> http://groups.google.com/groups?selm=1165272390.675288.23467 0@79g2000cws.googlegroups.com
>
>
> To do this, I wrote this program:
>
> 132 use strict;
> 133 use warnings;
> 134 use Data::Dumper;
> 135 use Sort::Maker;
> 136
> 137 my $cost_order = 'A2yB';
> 138 my @sorted;
> 139
> 140 my $ms = [
> 141 '

',
> 142 '
',
> 143 '
',
> 144 '
'
> 145 ];
> 146
> 147 my $sorter = make_sorter(
> 148 plain => string => sub {
> 149 if (/meascode='(.)'/) {index($cost_order,$1)}
> 150 },
> 151 );
> 152 if ($@) { print "$@\n"; exit; }
> 153 @sorted = $sorter->(@$ms);
> 154 print Dumper(\@sorted);
>
> However, this fails with "Global symbol "$cost_order" requires explicit
> package name at (eval 12) line 8."
>
> I was a little surprised, since anonymous subs usually have access to
> lexical variables. I decided to make $cost_order a package variable (in
> main) and fully qualify it as $::cost_order in the anonymous sub, so I
> changed the lines of the program like so:
>
> 137 our $cost_order = 'A2yB';
> 149 if (/meascode='(.)'/) {index($::cost_order,$1)}
>
> However, it fails again with the same error message as before.
> Thankfully, it prints out the key extraction code which shows that my
> colons have been removed from $::cost_order, and even when I use
> $main::cost_order, it gets changed to $cost_order--which causes the
> failure.
>
> I then decided to change the variable $cost_order to a subroutine:
>
> 137 sub COST_ORDER { 'A2yB' }
> 149 if (/meascode='(.)'/) {index(COST_ORDER(),$1)}
>
> And I get this error message: "Undefined subroutine
> &Sort::Maker::COST_ORDER called at (eval 12) line 8."
>
> It looks like my anonymous sub is being compiled in Sort::Maker. No
> problem, I'll just put the qualifiers on the COST_ORDER():
>
> 149 if (/meascode='(.)'/) {index(::COST_ORDER(),$1)}
>
> But it fails with the same error message. I can only assume that the
> colons were stripped from COST_ORDER() before the anonymous sub was
> compiled. Finally, I had an insight: Sort::Maker likes stuff to be in
> its package, so I try this:
>
> 137 sub COST_ORDER { 'A2yB' }
> 146 *{Sort::Maker::COST_ORDER} = \&COST_ORDER;
> 149 if (/meascode='(.)'/) {index(COST_ORDER(),$1)}
>
> And it works.
>
> Evidently, there is a bug/undocumented feature that causes an anonymous
> sub that the programmer writes in his package to be compiled in
> Sort::Maker, and Sort::Maker seems to make some modifications to the
> *text* of that subroutine before it's compiled :O
>
> Perl 5.8.4
> Sort::Maker 0.05
> Debian 3.1 (i686)
>
>

Well, it's quasi-documented. The third paragraph in "Key Extraction
Code" about 60% of the way down the POD says "If a CODE reference is
used, then the B::Deparse module is used to deparse it back into Perl
source. This source is then used to extract the key in the generated
sorter. As with qr//, the advantage is that the extraction code is
syntax checked at compile time and not runtime. Also the deparsed code
is wrapped in a do{} block so you may use complex code to extract the key."

So it's not, strictly speaking, undocumented, but I admit that I find it
an un-Perlish behaviour. His "the advantage is that the extraction code
is syntax checked at compile time and not runtime" seems a bit
irrelevant; if you have a code reference to pass in the first place, it
seems to me that we already know it compiles.

It would seem to me that the author could have done something like
returning the code reference as-is, and made his sort routine call that.
But then, I'm not writing the code.

Tom Wyant

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 20:38:12 von Uri Guttman

>>>>> "MW(n" == Mumia W (reading news) writes:

MW(n> 137 my $cost_order = 'A2yB';
MW(n> 138 my @sorted;
MW(n> 139
MW(n> 140 my $ms = [
MW(n> 141 '

',
MW(n> 142 '
',
MW(n> 143 '
',
MW(n> 144 '
'
MW(n> 145 ];
MW(n> 146
MW(n> 147 my $sorter = make_sorter(
MW(n> 148 plain => string => sub {
MW(n> 149 if (/meascode='(.)'/) {index($cost_order,$1)}
MW(n> 150 },
MW(n> 151 );
MW(n> 152 if ($@) { print "$@\n"; exit; }
MW(n> 153 @sorted = $sorter->(@$ms);
MW(n> 154 print Dumper(\@sorted);

MW(n> I was a little surprised, since anonymous subs usually have access to
MW(n> lexical variables. I decided to make $cost_order a package variable
MW(n> (in main) and fully qualify it as $::cost_order in the anonymous sub,
MW(n> so I changed the lines of the program like so:

sort::maker builds a sort sub source and evals it. so that is done in
the Sort::Maker namespace as you learned later on. there is no way for
that code to see any lexicals or subs outside its space.

MW(n> 137 sub COST_ORDER { 'A2yB' }
MW(n> 146 *{Sort::Maker::COST_ORDER} = \&COST_ORDER;
MW(n> 149 if (/meascode='(.)'/) {index(COST_ORDER(),$1)}

MW(n> And it works.

MW(n> Evidently, there is a bug/undocumented feature that causes an
MW(n> anonymous sub that the programmer writes in his package to be compiled
MW(n> in Sort::Maker, and Sort::Maker seems to make some modifications to
MW(n> the *text* of that subroutine before it's compiled :O

there is a feature you missed which can help here. the 'init_code'
option allows you to put arbitrary code into the sort sub. this is
useful when you do work in one key extraction and generate multiple
keys. you can declare a temp var in init_code and assign it in one
extraction for use by later keys. in this case it can be used to set the
sort order string. this is a tested example. it prints out the sorter so
you see how the init_code is used.

#!/usr/local/bin/perl

use strict ;
use warnings ;

use Sort::Maker qw( make_sorter sorter_source ) ;

my @unsorted = (

'
',
'
',
'
',
'
',
) ;

my $sorter = make_sorter(
'GRT',
init_code => 'my $cost_order = q{A2yB};',
signed => 1,
string_data => 1,
number => q{ /code="(.)"/ && index($cost_order,$1) },
) ;

$sorter or die $@ ;

print sorter_source( $sorter ) ;

print map "$_\n", $sorter->( @unsorted ) ;


if you want the $cost_order to come from the outside, you can just use a
variable in the init_code and change it to "".

my $cost_order = 'A2yB';

my $sorter = make_sorter(
'GRT',
init_code => "my \$cost_order = '$cost_order' ;",
signed => 1,
string_data => 1,
number => q{ /code="(.)"/ && index($cost_order,$1) },
) ;

both output this:






and that appears to be the desired order.

note that i do a numeric compare on the index() value as a string compare
would fail when you get to more than 1 digit.

also see how i simplified getting the index value in the key extraction.

expressiveness like this is one of the major wins of Sort::Maker over
rolling your own sorts besides the speedup if you use the GRT.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 20:41:04 von Uri Guttman

>>>>> "GH" == Gunnar Hjalmarsson writes:

GH> Mumia W. (reading news) wrote:
>> I wanted to see how Sort::Maker could be used to sort the
>> "meascodes" in this message:
>> http://groups.google.com/groups?selm=1165272390.675288.23467 0@79g2000cws.googlegroups.com

GH> Interesting! I was waiting for Uri to jump in again, but you were first. :)

i did jump in. just took some time.

GH> That error can be fixed by passing the var to the sub:

GH> plain => string => sub {
GH> if ( /meascode='(.)'/ ) { index( $_[0], $1 ) }
GH> }->($cost_order),

GH> But it didn't give the correct output...

that is calling the sub IMMEDIATELY and passing its return value to the
make_sorter sub as the arg of the 'string' option. very bad. when you
are not sure about how Make::Sorter is working, print out the generated
sub with sorter_source. see my other posting for more and the correct
way to do this.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 20:49:11 von Uri Guttman

>>>>> "h[c[n" == harryfmudd [AT] comcast [DOT] net writes:

h[c[n> Well, it's quasi-documented. The third paragraph in "Key Extraction
h[c[n> Code" about 60% of the way down the POD says "If a CODE reference is
h[c[n> used, then the B::Deparse module is used to deparse it back into Perl
h[c[n> source. This source is then used to extract the key in the generated
h[c[n> sorter. As with qr//, the advantage is that the extraction code is
h[c[n> syntax checked at compile time and not runtime. Also the deparsed code
h[c[n> is wrapped in a do{} block so you may use complex code to extract the
h[c[n> key."

you are looking for the wrong feature. see my other posts for the
correct way to use this module here.

h[c[n> So it's not, strictly speaking, undocumented, but I admit that I find
h[c[n> it an un-Perlish behaviour. His "the advantage is that the extraction
h[c[n> code is syntax checked at compile time and not runtime" seems a bit
h[c[n> irrelevant; if you have a code reference to pass in the first place,
h[c[n> it seems to me that we already know it compiles.

you are not interpreting that correctly so it may need a rewrite. the
comparison there is between a a key extraction which is a string of code
vs a code reference (typically an anon sub right there). then the sub
will be compiled at compile time vs much later at runtime when
make_sorter is called. so with an anon sub you will see any syntax
errors immediately vs. in the error output of make_sorter. this is a
nice win and it is due to damian conway asking for it (and showing me
how to deparse a code ref).

h[c[n> It would seem to me that the author could have done something like
h[c[n> returning the code reference as-is, and made his sort routine call
h[c[n> that. But then, I'm not writing the code.

you can't do that with the current module. there would need to be a way
to access the code ref at sorting time and that would need the sorter to
be a closure or use some private hash to track code refs. i may examine
these approaches but they don't offer any speedup or major wins
IMO. using the method i showed in another post solves this problem very
easily.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 20:55:47 von paduille.4060.mumia.w

On 12/05/2006 10:49 AM, harryfmudd [AT] comcast [DOT] net wrote:
> Mumia W. (reading news) wrote:
>> I wanted to see how Sort::Maker could be used to sort the "meascodes"
>> in this message:
>>
>> http://groups.google.com/groups?selm=1165272390.675288.23467 0@79g2000cws.googlegroups.com
>> [...]
>>
>> Evidently, there is a bug/undocumented feature that causes an
>> anonymous sub that the programmer writes in his package to be compiled
>> in Sort::Maker, and Sort::Maker seems to make some modifications to
>> the *text* of that subroutine before it's compiled :O
>> [...]
>
> Well, it's quasi-documented. The third paragraph in "Key Extraction
> Code" about 60% of the way down the POD says "If a CODE reference is
> used, then the B::Deparse module is used to deparse it back into Perl
> source. [...]

Gotcha. With that information, I decided to try a string rather than a
CODE reference:

use strict;
use warnings;
use Data::Dumper;
use Sort::Maker;
no warnings 'once';

our $cost_order = 'A2yB';
my @sorted;

my $ms = [
'

',
'
',
'
',
'
'
];

my $sorter = make_sorter(
plain => string => q{
if (/meascode='(.)'/) {index($::cost_order,$1)}
},
);
if ($@) { print "$@\n"; exit; }
@sorted = $sorter->(@$ms);
print Dumper(\@sorted);

__END__

And it works.

TIMTOWAROUND



--
paduille.4060.mumia.w@earthlink.net

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 21:17:45 von Uri Guttman

>>>>> "MW(n" == Mumia W (reading news) writes:

MW(n> Gotcha. With that information, I decided to try a string rather than a
MW(n> CODE reference:

MW(n> our $cost_order = 'A2yB';
MW(n> my @sorted;


MW(n> my $sorter = make_sorter(
MW(n> plain => string => q{
MW(n> if (/meascode='(.)'/) {index($::cost_order,$1)}
MW(n> },

that is using a global which can be dangerous. see my other post for a
proper solution. also it shows a cleaner bit of key extraction
code. that code uses the last expression of an if block for its value
which is naughty IMO. and if the regex fails what do you do? hell, my
example doesn't even check that too. but it does make that code into a
proper expression with && and not the if block (which is also slower).

MW(n> And it works.

good!

MW(n> TIMTOWAROUND

and some are better than others! :)

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 22:12:40 von paduille.4060.mumia.w

On 12/05/2006 01:38 PM, Uri Guttman wrote:
> [...]
> my $cost_order = 'A2yB';
>
> my $sorter = make_sorter(
> 'GRT',
> init_code => "my \$cost_order = '$cost_order' ;",
> signed => 1,
> string_data => 1,
> number => q{ /code="(.)"/ && index($cost_order,$1) },
> ) ;
> [...]

It's a little more interesting if a CODE reference is used since only a
CODE ref is deparsed:

number => sub { /code="(.)"/ && index($cost_order,$1) },

But it does work the same way.


--
paduille.4060.mumia.w@earthlink.net

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 05.12.2006 23:31:16 von Uri Guttman

>>>>> "MW(n" == Mumia W (reading news) writes:

MW(n> On 12/05/2006 01:38 PM, Uri Guttman wrote:
>> [...]
>> my $cost_order = 'A2yB';
>> my $sorter = make_sorter(
>> 'GRT',
>> init_code => "my \$cost_order = '$cost_order' ;",
>> signed => 1,
>> string_data => 1,
>> number => q{ /code="(.)"/ && index($cost_order,$1) },
>> ) ;
>> [...]

MW(n> It's a little more interesting if a CODE reference is used since only
MW(n> a CODE ref is deparsed:

MW(n> number => sub { /code="(.)"/ && index($cost_order,$1) },

MW(n> But it does work the same way.

did you try that? when i do i get that $cost_order is not declared in
the internal eval. the dumped source code shows that. the reason i
didn't wan't to support just calling the code refs was for speed. the
code ref would need to be called for each sort element and that is a
sizeable extra overhead compared to running the code inline as it does
now. maybe an option can be added to call the coderef instead of
deparsing it so you can use closures for key extractions. shouldn't be
too hard to do. it would be 1 option to parse, pod for it, and a small
conditional and code generation change. wanna tackle it? :)

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Sort::Maker - benchmark

am 06.12.2006 00:07:07 von Gunnar Hjalmarsson

Since this discussion started with this grep() solution:
http://groups.google.com/group/comp.lang.perl.misc/msg/f02de e998cbac6d6

i thought I'd do a benchmark. This is a typical result on my WinXP box
for a 4 elements list:

Rate S-Maker grep()
S-Maker 2620/s -- -83%
grep() 15769/s 502% --

Break even as regards list size (besides the time for compiling
Sort::Maker) seems to be around 20 elements:

Rate grep() S-Maker
grep() 1833/s -- -9%
S-Maker 2014/s 10% --

This is the benchmark code:

use Sort::Maker;
use Benchmark 'cmpthese';

sub build_data {
my @r_order =
map { ('_', 0..9, 'A'..'Z', 'a'..'z')[rand 63] } 1..shift;
my @unsorted =
map qq|

|,
@r_order;
my @c_order;
push @c_order, splice( @r_order, rand @r_order, 1 )
while @r_order;
join( '', @c_order ), @unsorted
}

my ($cost_order, @unsorted) = build_data(20);

cmpthese( -5, {
'S-Maker' => sub {
my $sorter = make_sorter(
'GRT',
init_code => "my \$cost_order = '$cost_order';",
signed => 1,
string_data => 1,
number => q{ /code="(.)"/ && index($cost_order,$1) },
);
my @sorted = $sorter->( @unsorted );
},
'grep()' => sub {
my $co = $cost_order;
my @sorted = ();
while ( my $mc = chop $co ) {
unshift @sorted, grep /code="$mc"/, @unsorted;
}
},
} );

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Sort::Maker - benchmark

am 06.12.2006 00:29:37 von Uri Guttman

>>>>> "GH" == Gunnar Hjalmarsson writes:

GH> Since this discussion started with this grep() solution:
GH> http://groups.google.com/group/comp.lang.perl.misc/msg/f02de e998cbac6d6

GH> i thought I'd do a benchmark. This is a typical result on my WinXP box
GH> for a 4 elements list:

GH> Rate S-Maker grep()
GH> S-Maker 2620/s -- -83%
GH> grep() 15769/s 502% --

GH> Break even as regards list size (besides the time for compiling
GH> Sort::Maker) seems to be around 20 elements:

GH> Rate grep() S-Maker
GH> grep() 1833/s -- -9%
GH> S-Maker 2014/s 10% --

that makes perfect sense. the win with the GRT is with larger input
sets. with only 4 elements the overhead will be slower than the actual
sort time. even with your code's O( N ** 2) growth it still does less
actual work for only 4 elements.


GH> This is the benchmark code:

GH> use Sort::Maker;
GH> use Benchmark 'cmpthese';

GH> sub build_data {
GH> my @r_order =
GH> map { ('_', 0..9, 'A'..'Z', 'a'..'z')[rand 63] } 1..shift;
GH> my @unsorted =
GH> map qq|

|,
GH> @r_order;
GH> my @c_order;
GH> push @c_order, splice( @r_order, rand @r_order, 1 )
GH> while @r_order;
GH> join( '', @c_order ), @unsorted
GH> }

GH> my ($cost_order, @unsorted) = build_data(20);

GH> cmpthese( -5, {
GH> 'S-Maker' => sub {
GH> my $sorter = make_sorter(
GH> 'GRT',
GH> init_code => "my \$cost_order = '$cost_order';",
GH> signed => 1,
GH> string_data => 1,
GH> number => q{ /code="(.)"/ && index($cost_order,$1) },
GH> );
GH> my @sorted = $sorter->( @unsorted );

ahh, you are including the time to build the sort! factor that out (do
it before you call the benchmark) and then in here pass a wrapper sub
that calls the sorter on the data. you can't compare generating and
compiling the sorter sourse to your precompiled code.

this is why benchmarking is a skill unto itself. you have to know what
you are comparing and how to properly isolate the code under comparison.
sort::maker was designed to build a sorter once and reuse it many
times. you can either use the code ref or dump the source and cut/paste
it into your code and use it there. but that will break if i add the
closure feature as you can't paste in the private data from runtime. oh
well.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker - benchmark

am 06.12.2006 00:40:00 von Uri Guttman

>>>>> "UG" == Uri Guttman writes:

UG> ahh, you are including the time to build the sort! factor that out (do
UG> it before you call the benchmark) and then in here pass a wrapper sub
UG> that calls the sorter on the data. you can't compare generating and
UG> compiling the sorter sourse to your precompiled code.

here is the same benchmark with both precompiled sorters and compiling
each time in the benchmark. notice the results.

uri

#!/usr/local/bin/perl

use Sort::Maker;
use Benchmark 'cmpthese';

sub build_data {
my @r_order =
map { ('_', 0..9, 'A'..'Z', 'a'..'z')[rand 63] } 1..shift;
my @unsorted =
map qq|

|,
@r_order;
my @c_order;
push @c_order, splice( @r_order, rand @r_order, 1 )
while @r_order;
join( '', @c_order ), @unsorted
}

my ($cost_order, @unsorted) = build_data(20);

my $sorter = make_sorter(
'GRT',
init_code => "my \$cost_order = '$cost_order';",
signed => 1,
string_data => 1,
number => q{ /code="(.)"/ && index($cost_order,$1) },
);

cmpthese( shift || -5, {
'S-Maker' => sub {
my @sorted = $sorter->( @unsorted );
},

'S-Maker-compiling' => sub {

my $sorter = make_sorter(
'GRT',
init_code => "my \$cost_order = '$cost_order';",
signed => 1,
string_data => 1,
number => q{ /code="(.)"/ && index($cost_order,$1) },
);
my @sorted = $sorter->( @unsorted );
},
'grep()' => sub {
my $co = $cost_order;
my @sorted = ();
while ( my $mc = chop $co ) {
unshift @sorted, grep /code="$mc"/, @unsorted;
}
},
} );


perl clpmodules_bench.pl -1
Rate S-Maker-compiling grep() S-Maker
S-Maker-compiling 232/s -- -14% -61%
grep() 270/s 16% -- -54%
S-Maker 589/s 154% 118% --



--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker - benchmark

am 06.12.2006 01:59:00 von Gunnar Hjalmarsson

Uri Guttman wrote:
>>>>>>"UG" == Uri Guttman writes:
>
> UG> ahh, you are including the time to build the sort! factor that out (do
> UG> it before you call the benchmark) and then in here pass a wrapper sub
> UG> that calls the sorter on the data. you can't compare generating and
> UG> compiling the sorter sourse to your precompiled code.

Yes I can - just did. ;-)

But I see your point. Guess both comparisons might be proper depending
on what it is you want to measure.

> here is the same benchmark with both precompiled sorters and compiling
> each time in the benchmark. notice the results.



> perl clpmodules_bench.pl -1
> Rate S-Maker-compiling grep() S-Maker
> S-Maker-compiling 232/s -- -14% -61%
> grep() 270/s 16% -- -54%
> S-Maker 589/s 154% 118% --

I'm surprised at the speed of the thing. Would have guessed that you'd
need a larger list to benefit from Sort::Maker.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Sort::Maker - benchmark

am 06.12.2006 02:21:47 von Uri Guttman

>>>>> "GH" == Gunnar Hjalmarsson writes:

GH> Uri Guttman wrote:

>> Rate S-Maker-compiling grep() S-Maker
>> S-Maker-compiling 232/s -- -14% -61%
>> grep() 270/s 16% -- -54%
>> S-Maker 589/s 154% 118% --

GH> I'm surprised at the speed of the thing. Would have guessed that you'd
GH> need a larger list to benefit from Sort::Maker.

well, your benchmark has a data size of 20. that means your bubble sort
is doing N ** 2 amount of work (comparisons) which is about 400 (the
index will take about 20/2 loops but the growth is still N ** 2). all of
Sort::Maker's sorts use perl's internal sort which is O( N log N ) so it
will do about 20 * 4.5 (assuming log2 but the log base doesn't matter in
O() analysis) which is 90. so your code is doing 4 times the amount of
work with only 20 elements to sort. note that the real results are not
going to be exactly like the ratios. O() is all about growth rates, not
actual speed. as you noted with a short enough data set your code
wins. when i run the last benchmark with 5 elements i get this:

S-Maker-compiling 289/s -- -80% -87%
grep() 1437/s 398% -- -35%
S-Maker 2195/s 660% 53% --

so you can see the overhead in the generating and compiling the
sorter. and sort::maker used properly still beats out your code! in fact
i can't seem to get yours to win. even at 2 elements it is:

S-Maker-compiling 327/s -- -92% -93%
grep() 3864/s 1082% -- -13%
S-Maker 4443/s 1259% 15% --

notice how yours clobbers the maker which compiles.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 06.12.2006 03:32:18 von paduille.4060.mumia.w

On 12/05/2006 04:31 PM, Uri Guttman wrote:
>>>>>> "MW(n" == Mumia W (reading news) writes:
>
> MW(n> On 12/05/2006 01:38 PM, Uri Guttman wrote:
> >> [...]
> >> my $cost_order = 'A2yB';
> >> my $sorter = make_sorter(
> >> 'GRT',
> >> init_code => "my \$cost_order = '$cost_order' ;",
> >> signed => 1,
> >> string_data => 1,
> >> number => q{ /code="(.)"/ && index($cost_order,$1) },
> >> ) ;
> >> [...]
>
> MW(n> It's a little more interesting if a CODE reference is used since only
> MW(n> a CODE ref is deparsed:
>
> MW(n> number => sub { /code="(.)"/ && index($cost_order,$1) },
>
> MW(n> But it does work the same way.
>
> did you try that?

Yes.

> when i do i get that $cost_order is not declared in
> the internal eval.

It's defined. "My \$cost_order = '$cost_order' ;" defines the variable
in the 'init_code.'

> the dumped source code shows that.

When I run the program, the source dumps itself, and I see where
$cost_order is defined in the internal eval before it's used.

> the reason i
> didn't wan't to support just calling the code refs was for speed. the
> code ref would need to be called for each sort element and that is a
> sizeable extra overhead compared to running the code inline as it does
> now. maybe an option can be added to call the coderef instead of
> deparsing it so you can use closures for key extractions. shouldn't be
> too hard to do. it would be 1 option to parse, pod for it, and a small
> conditional and code generation change. wanna tackle it? :)
>
> uri
>

Maybe I'll look into that if the source for Sort::Maker isn't too
complicated. It sounds easy, but aren't those famous last words? :-)


--
paduille.4060.mumia.w@earthlink.net

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 06.12.2006 05:40:07 von Uri Guttman

>>>>> "MW(n" == Mumia W (reading news) writes:

MW(n> On 12/05/2006 04:31 PM, Uri Guttman wrote:
>>>>>>> "MW(n" == Mumia W (reading news) writes:
>> >> 'GRT',
>> >> init_code => "my \$cost_order = '$cost_order' ;",
>> >> number => q{ /code="(.)"/ && index($cost_order,$1) },
>> >> ) ;
>> >> [...]
>> MW(n> number => sub { /code="(.)"/ && index($cost_order,$1) },

MW(n> Yes.

>> when i do i get that $cost_order is not declared in
>> the internal eval.

MW(n> It's defined. "My \$cost_order = '$cost_order' ;" defines the variable
MW(n> in the 'init_code.'

ah. i didn't see you kept the init_code. i dropped it and tried the pure
closure idea and it failed. i did so a little side test and you can
definitely pass a closure into a string eval so this is doable. you need
to declare a var (or array since more than one can be used) in the eval
text before the sub. those lexicals (could be multiple lexicals with
sequential number names as that will be faster at runtime. good cheat
when generating code).

>> the reason i
>> didn't wan't to support just calling the code refs was for speed. the
>> code ref would need to be called for each sort element and that is a
>> sizeable extra overhead compared to running the code inline as it does
>> now. maybe an option can be added to call the coderef instead of
>> deparsing it so you can use closures for key extractions. shouldn't be
>> too hard to do. it would be 1 option to parse, pod for it, and a small
>> conditional and code generation change. wanna tackle it? :)
>> uri
>>

MW(n> Maybe I'll look into that if the source for Sort::Maker isn't too
MW(n> complicated. It sounds easy, but aren't those famous last words? :-)

the guts of generating the GRT version is the toughest but this is key
extraction which is shared by all four sort styles. i have the design
worked out in my head so if you want to take it on, i can send you some
notes and what needs to be done. it should be fairly simple after
that. you will get full credit for instigating this feature and for
implementing it. of course you need to add test cases as well but the
test setup is table driven so that is easy. email me if you want to take
this on.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: Sort::Maker : anonymous sub is compiled outside of my module

am 06.12.2006 16:23:21 von paduille.4060.mumia.w

On 12/05/2006 10:40 PM, Uri Guttman wrote:
>
> the guts of generating the GRT version is the toughest but this is key
> extraction which is shared by all four sort styles. i have the design
> worked out in my head so if you want to take it on, i can send you some
> notes and what needs to be done. it should be fairly simple after
> that. you will get full credit for instigating this feature and for
> implementing it. of course you need to add test cases as well but the
> test setup is table driven so that is easy. email me if you want to take
> this on.
>
> uri
>

Sure, send me the info. Here is what I've written for the POD for the
proposed feature addition:

> General Options
> [...]
> "keep_code"
>
> This tells Sort::Maker to not deparse the CODE reference you give it
> but instead to use it directly. This probably disables some opportuni-
> ties for optimization of the key extraction code, but it's necessary
> for situtations where your key extraction code must access lexical
> variables.
>

My e-mail address is in my sig and in the headers.

(posted and mailed)
--
paduille.4060.mumia.w@earthlink.net