Accomodate for poor db design using Sort::Maker?

Accomodate for poor db design using Sort::Maker?

am 09.12.2006 06:48:56 von Jake Peavy

Hi all, topical question here regarding Sort::Maker

I have a poorly designed db table over which I have no control. The
data isn't normalized but for each row I'd like to return the column
numbers which have the highest number of counts, and the highest column
number with any counts at all. I refer to these values as mode and max
respectively, but please refer to my code and it's output below for a
better understanding. (I have provided sample input data and mimicked
the fetchrow_hashref function provided by DBI)

Both sorts seem klunky, and after the discussion about the efficiency
gains given by Sort::Maker in another thread I'd like to take advantage
if I can (the sorts must be performed for each row returned by the
database).

I'm not sure how to pass in the information in the $hashref to
make_sorter. Advice would be greatly appreciated.

TIA,
-jp

Code:
#!/usr/bin/perl

use strict;
use warnings;

print "id\tmode\tmax\n";
while ( my $hashref = fetchrow_hashref() ) {
my $id = $hashref->{id};

# mode = col number with greatest count
my $mode = ( map { $_->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];

# max = highest col number with any count
my $max = ( sort { $b <=> $a }
map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];

$max = 1 if $max eq ''; #all zeros

print "$id\t$mode\t$max\n";
}

{ my @columns;

sub fetchrow_hashref {

# silence warning that $line may be equal to 0 below
no warnings 'misc';
my $line = or return;

if ( $. == 1 ) {
@columns = split ' ', substr $line, 1;
return fetchrow_hashref();
}

my %hash;
@hash{ @columns } = split ' ', $line;

return \%hash;
} }

__DATA__
#id col1 col2 col3 col4 col5
1 2 8 8 0 0
2 5 0 0 0 0
3 3 2 8 2 5
4 4 3 8 2 0
5 5 7 7 0 0
6 9 5 6 0 0
7 6 4 5 7 0
8 2 3 0 0 0
9 9 2 3 3 5
10 0 4 1 0 0
11 0 0 0 0 0
12 1 1 1 1 1

Output:
id mode max
1 2 3
2 1 1
3 3 5
4 3 4
5 2 3
6 1 3
7 4 4
8 2 2
9 1 5
10 2 3
11 1 1
12 1 5

Re: Accomodate for poor db design using Sort::Maker?

am 09.12.2006 18:05:12 von someone

DJ Stunks wrote:
> Hi all, topical question here regarding Sort::Maker
>
> I have a poorly designed db table over which I have no control. The
> data isn't normalized but for each row I'd like to return the column
> numbers which have the highest number of counts, and the highest column
> number with any counts at all. I refer to these values as mode and max
> respectively, but please refer to my code and it's output below for a
> better understanding. (I have provided sample input data and mimicked
> the fetchrow_hashref function provided by DBI)
>
> Both sorts seem klunky, and after the discussion about the efficiency
> gains given by Sort::Maker in another thread I'd like to take advantage
> if I can (the sorts must be performed for each row returned by the
> database).
>
> I'm not sure how to pass in the information in the $hashref to
> make_sorter. Advice would be greatly appreciated.
>
> TIA,
> -jp
>
> Code:
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> print "id\tmode\tmax\n";
> while ( my $hashref = fetchrow_hashref() ) {
> my $id = $hashref->{id};
>
> # mode = col number with greatest count
> my $mode = ( map { $_->[0] }
> sort { $b->[1] <=> $a->[1] }
> map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];
>
> # max = highest col number with any count
> my $max = ( sort { $b <=> $a }
> map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];
>
> $max = 1 if $max eq ''; #all zeros


If you want efficiency then you should just use loops instead of sorting:


my ( $mode, $curr ) = ( 1, 0 );
for ( 1 .. 5 ) {
if ( $hashref->{ "col$_" } > $curr ) {
$curr = $hashref->{ "col$_" };
$mode = $_;
}
}

my $max = 1;
for ( reverse 1 .. 5 ) {
if ( $hashref->{ "col$_" } > 0 ) {
$max = $_;
last;
}
}



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall

Re: Accomodate for poor db design using Sort::Maker?

am 09.12.2006 20:42:06 von unknown

John W. Krahn wrote:
> DJ Stunks wrote:
>
>>Hi all, topical question here regarding Sort::Maker
>>
>>I have a poorly designed db table over which I have no control. The
>>data isn't normalized but for each row I'd like to return the column
>>numbers which have the highest number of counts, and the highest column
>>number with any counts at all. I refer to these values as mode and max
>>respectively, but please refer to my code and it's output below for a
>>better understanding. (I have provided sample input data and mimicked
>>the fetchrow_hashref function provided by DBI)
>>
>>Both sorts seem klunky, and after the discussion about the efficiency
>>gains given by Sort::Maker in another thread I'd like to take advantage
>>if I can (the sorts must be performed for each row returned by the
>>database).
>>
>>I'm not sure how to pass in the information in the $hashref to
>>make_sorter. Advice would be greatly appreciated.
>>
>>TIA,
>>-jp
>>
>>Code:
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> print "id\tmode\tmax\n";
>> while ( my $hashref = fetchrow_hashref() ) {
>> my $id = $hashref->{id};
>>
>> # mode = col number with greatest count
>> my $mode = ( map { $_->[0] }
>> sort { $b->[1] <=> $a->[1] }
>> map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];
>>
>> # max = highest col number with any count
>> my $max = ( sort { $b <=> $a }
>> map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];
>>
>> $max = 1 if $max eq ''; #all zeros
>
>
>
> If you want efficiency then you should just use loops instead of sorting:
>
>
> my ( $mode, $curr ) = ( 1, 0 );
> for ( 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > $curr ) {
> $curr = $hashref->{ "col$_" };
> $mode = $_;
> }
> }
>
> my $max = 1;
> for ( reverse 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > 0 ) {
> $max = $_;
> last;
> }
> }
>
>
>
> John

Have you looked at List::Util? It's a standard module in Perl 5.8, and
contains a max function. So the first loop becomes something like

use List::Util qw{max};

my $mode = max (map {$hashref->{"col$_"}} 1..5);

Assuming the only zeros are trailing, the second loop could be

my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5

since the value of a list in scalar context is the number of elements in
the list. If you have zeros embedded, this won't give the right answer,
since what it really does is find the number of values > 0, not the
highest index with a value > 0.

Of course, for either of these you might want to make the hashref into a
real list, to avoid mapping twice.

Tom Wyant

Re: Accomodate for poor db design using Sort::Maker?

am 10.12.2006 08:09:25 von someone

harryfmudd [AT] comcast [DOT] net wrote:
>
> Have you looked at List::Util? It's a standard module in Perl 5.8, and
> contains a max function. So the first loop becomes something like
>
> use List::Util qw{max};
>
> my $mode = max (map {$hashref->{"col$_"}} 1..5);
>
> Assuming the only zeros are trailing, the second loop could be
>
> my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5
>
> since the value of a list in scalar context is the number of elements in
> the list. If you have zeros embedded, this won't give the right answer,
> since what it really does is find the number of values > 0, not the
> highest index with a value > 0.
>
> Of course, for either of these you might want to make the hashref into a
> real list, to avoid mapping twice.

The OP's code produces:

id mode max
1 2 3
2 1 1
3 3 5
4 3 4
5 2 3
6 1 3
7 4 4
8 2 2
9 1 5
10 2 3
11 1 1
12 1 5

But your's produces:

id mode max
1 8 3
2 5 1
3 8 5
4 8 4
5 7 3
6 9 3
7 7 4
8 3 2
9 9 5
10 4 2
11 0 0
12 1 5

So neither $mode nor $max is correct for all lines.




John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall

Re: Accomodate for poor db design using Sort::Maker?

am 10.12.2006 18:15:29 von Jake Peavy

John W. Krahn wrote:
> DJ Stunks wrote:
> > for each row I'd like to return the column
> > numbers which have the highest number of counts, and the highest column
> > number with any counts at all. I refer to these values as mode and max
> > respectively

> > print "id\tmode\tmax\n";
> > while ( my $hashref = fetchrow_hashref() ) {
> > my $id = $hashref->{id};
> >
> > # mode = col number with greatest count
> > my $mode = ( map { $_->[0] }
> > sort { $b->[1] <=> $a->[1] }
> > map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];
> >
> > # max = highest col number with any count
> > my $max = ( sort { $b <=> $a }
> > map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];
> >
> > $max = 1 if $max eq ''; #all zeros
>
>
> If you want efficiency then you should just use loops instead of sorting:
>
>
> my ( $mode, $curr ) = ( 1, 0 );
> for ( 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > $curr ) {
> $curr = $hashref->{ "col$_" };
> $mode = $_;
> }
> }
>
> my $max = 1;
> for ( reverse 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > 0 ) {
> $max = $_;
> last;
> }
> }

Thanks John, I knew something was nagging me about my approach, but
once you're in a mindset sometimes it's hard to break out. Your right
of course, why sort all the elements when I'm only interested in one.

I altered your two loops to return both numbers in one loop (in
actuality there are 60 columns, not just 5, and the $mode and $max
usually occur below 30):

my ( $max, $mode, $curr ) = ( 1,1,0 );
for my $i ( 1 .. 5 ) {
my $count = $hashref->{"col$i"};

if ( $count > $curr ) {
$curr = $count;
$mode = $i;
}

if ( $count > 0 ) {
$max = $i;
}
}

Thanks again,
-jp

Re: Accomodate for poor db design using Sort::Maker?

am 10.12.2006 18:22:07 von Jake Peavy

harryfmudd [AT] comcast [DOT] net wrote:
> John W. Krahn wrote:
> > DJ Stunks wrote:
> >>for each row I'd like to return the column
> >>numbers which have the highest number of counts, and the highest column
> >>number with any counts at all. I refer to these values as mode and max
> >>respectively
> >>
> >> print "id\tmode\tmax\n";
> >> while ( my $hashref = fetchrow_hashref() ) {
> >> my $id = $hashref->{id};
> >>
> >> # mode = col number with greatest count
> >> my $mode = ( map { $_->[0] }
> >> sort { $b->[1] <=> $a->[1] }
> >> map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];
> >>
> >> # max = highest col number with any count
> >> my $max = ( sort { $b <=> $a }
> >> map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];
> >>
> >> $max = 1 if $max eq ''; #all zeros
> >
> >
> >
> > If you want efficiency then you should just use loops instead of sorting:
> >
> >
> > my ( $mode, $curr ) = ( 1, 0 );
> > for ( 1 .. 5 ) {
> > if ( $hashref->{ "col$_" } > $curr ) {
> > $curr = $hashref->{ "col$_" };
> > $mode = $_;
> > }
> > }
> >
> > my $max = 1;
> > for ( reverse 1 .. 5 ) {
> > if ( $hashref->{ "col$_" } > 0 ) {
> > $max = $_;
> > last;
> > }
> > }
> >
>
> Have you looked at List::Util? It's a standard module in Perl 5.8, and
> contains a max function. So the first loop becomes something like
>
> use List::Util qw{max};
>
> my $mode = max (map {$hashref->{"col$_"}} 1..5);
>
> Assuming the only zeros are trailing, the second loop could be
>
> my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5

I appreciate the input, I don't think I was clear enough about what I
was looking for. The counts themselves don't really matter, what I was
looking for was the column numbers associated with 1) highest count
($mode); and 2) the highest column number with any count at all ($max).

John's solution worked well. I was only thinking of sort() because I
wanted largest of one thing with respect to another thing. I couldn't
think of a way to get max() to operate like that without going through
all the columns twice (once to find the max, and a second time to find
which column it was associated with).

Thanks,
-jp

Re: Accomodate for poor db design using Sort::Maker?

am 11.12.2006 18:43:35 von unknown

John W. Krahn wrote:
> harryfmudd [AT] comcast [DOT] net wrote:
>
>>Have you looked at List::Util? It's a standard module in Perl 5.8, and
>>contains a max function. So the first loop becomes something like
>>
>>use List::Util qw{max};
>>
>>my $mode = max (map {$hashref->{"col$_"}} 1..5);
>>
>>Assuming the only zeros are trailing, the second loop could be
>>
>>my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5
>>
>>since the value of a list in scalar context is the number of elements in
>>the list. If you have zeros embedded, this won't give the right answer,
>>since what it really does is find the number of values > 0, not the
>>highest index with a value > 0.
>>
>>Of course, for either of these you might want to make the hashref into a
>>real list, to avoid mapping twice.
>
>
> The OP's code produces:
>
> id mode max
> 1 2 3
> 2 1 1
> 3 3 5
> 4 3 4
> 5 2 3
> 6 1 3
> 7 4 4
> 8 2 2
> 9 1 5
> 10 2 3
> 11 1 1
> 12 1 5
>
> But your's produces:
>
> id mode max
> 1 8 3
> 2 5 1
> 3 8 5
> 4 8 4
> 5 7 3
> 6 9 3
> 7 7 4
> 8 3 2
> 9 9 5
> 10 4 2
> 11 0 0
> 12 1 5
>
> So neither $mode nor $max is correct for all lines.
>
>
>
>
> John

My mistake, corrected by DJ Stunks in adjacent post; I was returning the
mode, rather than the index of the mode. Etc.

Tom Wyant