Counting the repetition count of repeated substrings

Counting the repetition count of repeated substrings

am 15.11.2007 16:38:23 von Michele Dondi

This is the subject of the following PM node:

http://perlmonks.org/?node_id=650761

I am the author of the node and to briefly summarize, the problem is:
"to calculate the repetition count of repeated substrings of length 2
or more of a given string."

Currently, the most interesting code suggestions are within the
respective subs of the following benchmark:

(For definiteness the subs take the given string as an argument and
return a hashref of the counts.)


#!/usr/bin/perl

use strict;
use warnings;
use constant MIN => 2;
use Benchmark qw/:all :hireswallclock/;

my $str='aabcdabcabcecdecd';

sub oha {
my $s=shift;
my %saw;

while($s =~ /(..+)(?=.*?\1)/g) {
for my $x (0..length $1) {
@saw{ map {substr $1, $x, $_} $x+2..length $1 } = ();
}
}
$saw{$_} =()= $s =~ /\Q$_/g for keys %saw;
\%saw;
}

sub kramba {
my %count;
my $count;
$count = sub {
my( $string) = @_;
my $length = length( $string );

if ($length < MIN) {
$count{$_} == 1 and delete $count{$_} for keys %count;
return \%count;
}

for my $l (MIN..$length) {
my $s = substr( $string, 0, $l );
$count{ $s } += 1;
}
$count->( substr( $string, 1 ) );
};
$count->(shift);
for (keys %count) {
delete $count{$_} unless
$count{$_} >= 2 and length($_) >= MIN;
}
\%count;
}

sub ikegami {
my $str = shift;

local our %counts;
$str =~ /
(.{2,}) # or (.+)
(?(?{ !$counts{$1} })(?=.*\1))
(?{ ++$counts{$1} })
(?!)
/x;
\%counts;
}

sub lodin {
my $str = shift;

local our %count;
$str =~ /
(.{2,})
(?(?{ $count{$1} })
(?!)
)
.*
\1
(?{ ($count{$1} ||= 1)++ })
(?!)
/x;
\%count;
}

cmpthese 10_000 => {
oha => sub { oha $str },
kramba => sub { kramba $str },
ikegami => sub { ikegami $str },
lodin => sub { lodin $str },
};

__END__


Note: benchmarking is interesting but I'm also interested in other
charachteristics, e.g. readability, intuitiveness and so on.


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^ ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,