script takes long time to run when comparing digits within strings using foreach

on 27.05.2011 10:18:01 from eventual

Hi,
I have an array, @datas, and each element of @datas is a string made up of 6 numbers separated by spaces, like "1 2 3 4 5 6", so the array looks like this:
@datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');
Now I wish to compare each element of @datas with the rest of the elements in @datas so that, if 5 of the numbers match, I take note of the matching indices. The script I wrote is appended below.
However, the script below takes a long time to run if @datas is huge (e.g. 30,000 elements). Is there a way to rewrite the script so that it runs faster?
Thanks
 
###### script below #######################

#!/usr/bin/perl
use strict;

my @matched_location = ();
my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');

my $iteration_counter = -1;
foreach (@datas){
    $iteration_counter++;
    my $reference = $_;

    my $second_iteration_counter = -1;
    my $string = '';
    foreach (@datas){
        $second_iteration_counter++;
        my @individual_digits = split / /,$_;

        my $ctr = 0;
        foreach(@individual_digits){
            if($reference =~ /^$_ | $_ | $_$/){
                $ctr++;
            }
        }
        if ($ctr >= 5){
            $string = $string . "$second_iteration_counter ";
        }
    }
    $matched_location[$iteration_counter] = $string;
}

my $ctr = -1;
foreach(@matched_location){
    $ctr++;
    print "Index $ctr of \@matched_location = $_\n";
}
 

Re: script takes long time to run when comparing digits within strings using foreach

on 27.05.2011 11:38:39 from Shlomi Fish

Hi eventual,

On Friday 27 May 2011 11:18:01 eventual wrote:
> Hi,
> I have an array , @datas, and each element within @datas is a string that's
> made up of 6 digits with spaces in between like this "1 2 3 4 5 6", so the
> array look like this @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5
> 8', '1 2 3 4 5 9' , '6 7 8 9 10 11'); Now I wish to compare each element
> of @datas with the rest of the elements in @datas in such a way that if 5
> of the digits match, to take note of the matching indices, and so the
> script I wrote is appended below. However, the script below takes a long
> time to run if the datas at @datas are huge( eg 30,000 elements). I then
> wonder is there a way to rewrite the script so that the script can run
> faster. Thanks

First of all, you should add "use warnings;" to your code. Then you should get
rid of the implicit $_ as a loop iterator, because it's easy to break. For more
information see:

http://perl-begin.org/tutorials/bad-elements/
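
Applied to the original script, those two suggestions might look like the sketch below. It assumes the sample data from the question and keeps the same brute-force algorithm (so it is no faster); it only turns on warnings and gives every loop a named lexical variable, which also makes the manual index counters unnecessary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8',
             '1 2 3 4 5 9', '6 7 8 9 10 11');

# Same brute-force comparison as the original script, but each loop
# has its own named variable instead of the implicit $_, and the
# indices come from the loops themselves instead of manual counters.
my @matched_location;
for my $i (0 .. $#datas) {
    my $reference = $datas[$i];
    my $string    = '';
    for my $j (0 .. $#datas) {
        my $ctr = 0;
        for my $number (split ' ', $datas[$j]) {
            # Count $number only if it is a whole space-delimited field.
            $ctr++ if $reference =~ /^$number | $number | $number$/;
        }
        $string .= "$j " if $ctr >= 5;
    }
    $matched_location[$i] = $string;
}

for my $index (0 .. $#matched_location) {
    print "Index $index of \@matched_location = $matched_location[$index]\n";
}
```

With the sample data, indices 0, 2 and 3 match each other, while 1 and 4 only match themselves.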

Other than that, you should use a better algorithm. One option would be to
sort the integers and then use a diff/merge-like algorithm:

http://en.wikipedia.org/wiki/Merge_algorithm
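
A minimal sketch of the sort-and-merge idea, assuming the rows are the space-separated strings from the question. `common_count` is a hypothetical helper name. Each row is split and sorted numerically once, up front, and each pair of rows is compared once (i < j), so this prints matching pairs rather than the original per-index report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many numbers two numerically sorted lists share by
# walking both lists once, as in the merge step of merge sort.
sub common_count {
    my ($x, $y) = @_;    # array refs, each sorted ascending
    my ($i, $j, $n) = (0, 0, 0);
    while ($i < @$x && $j < @$y) {
        if    ($x->[$i] < $y->[$j]) { $i++ }
        elsif ($x->[$i] > $y->[$j]) { $j++ }
        else                        { $n++; $i++; $j++ }
    }
    return $n;
}

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8',
             '1 2 3 4 5 9', '6 7 8 9 10 11');

# Split and sort every row once, before any comparison.
my @sorted = map { [ sort { $a <=> $b } split ' ', $_ ] } @datas;

my @pairs;
for my $i (0 .. $#sorted - 1) {
    for my $j ($i + 1 .. $#sorted) {
        push @pairs, [ $i, $j ] if common_count($sorted[$i], $sorted[$j]) >= 5;
    }
}
print "$_->[0] and $_->[1] share at least 5 numbers\n" for @pairs;
```

Each comparison costs at most about 12 steps instead of a split plus 6 regex matches.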

A different way would be to use a hash to count the number of times each
number occurred in the two sets, and then see how many of them got a value of 2
(indicating they are in both sets).
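
The hash-counting idea for a single pair of rows could be sketched like this. `shared_numbers` is a hypothetical helper, and it assumes no number repeats within a row (a repeated number would reach a count of 2 on its own).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally every number from both rows in one hash; a number that
# appears in both rows ends up with a count of 2.
sub shared_numbers {
    my ($row_a, $row_b) = @_;    # space-separated strings
    my %seen;
    $seen{$_}++ for split(' ', $row_a), split(' ', $row_b);
    return scalar grep { $_ == 2 } values %seen;
}

print shared_numbers('1 2 3 4 5 6', '1 2 3 4 5 8'), "\n";       # prints 5
print shared_numbers('1 2 9 10 11 12', '6 7 8 9 10 11'), "\n";  # prints 3
```

A pair of rows "matches" in the sense of the question when the result is 5 or more.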

But at the moment, everything is very inefficient there.

Regards,

Shlomi Fish


-- 
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
"Star Trek: We, the Living Dead" - http://shlom.in/st-wtld

I often wonder why I hang out with so many people who are so pedantic. And
then I remember - because they are so pedantic.
-- Israeli Perl Monger

Please reply to list if it's a mailing list post - http://shlom.in/reply .

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: script takes long time to run when comparing digits within strings using foreach

on 27.05.2011 12:27:21 from rvtol+usenet

On 2011-05-27 10:18, eventual wrote:

> I have an array , @datas, and each element within @datas is a string that's made up of 6 digits with spaces in between like this "1 2 3 4 5 6", so the array look like this
> @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9' , '6 7 8 9 10 11');
> Now I wish to compare each element of @datas with the rest of the elements in @datas in such a way that if 5 of the digits match, to take note of the matching indices, and so the script I wrote is appended below.

a. Do only once what only needs to be done once. There are at least 2 places
where you didn't: 1. prepare @datas before looping; 2. don't compare the same
pair more than once.

b. Assemble a result, and report it at the end. Don't use any 'shared
resources', like incrementing global counters, while going along.
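
Point a, put into numbers for the 30,000-element case from the question (plain arithmetic on the loop structure, not a measured result): the original loops visit every ordered pair, including each row paired with itself, re-split the inner row on every visit and run 6 regex matches per visit; preparing each row once and comparing each unordered pair once removes most of that work.

```latex
\begin{align*}
n^2 &= 30000^2 = 9 \times 10^{8} && \text{inner iterations (and splits) in the original}\\
6\,n^2 &= 5.4 \times 10^{9} && \text{regex matches in the original}\\
n &= 30000 && \text{splits when every row is prepared once}\\
\binom{n}{2} = \frac{n(n-1)}{2} &= \frac{30000 \times 29999}{2} = 449{,}985{,}000 && \text{unordered pairs}
\end{align*}
```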


#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Read the rows from __DATA__ and turn each one into a hash (set) of
# its numbers, once, before any comparison is made.
my @data = <DATA>;
$_ = { map { $_ => 1 } split } for @data;
$ARGV[0] and print Dumper( \@data );    # debug dump if given a true argument

my @result;

# Compare each pair only once (i < j) and collect the matches.
for my $i ( 0 .. $#data - 1 ) {
    my @k = keys %{ $data[ $i ] };
    for my $j ( $i + 1 .. $#data ) {
        my $n = 0;
        exists $data[ $j ]{ $_ } and ++$n for @k;
        $n >= 5 and push @result, [ $i, $j ];
    }
}

print Dumper( \@result );

__DATA__
1 2 3 4 5 6
1 2 9 10 11 12
1 2 3 4 5 8
1 2 3 4 5 9
6 7 8 9 10 11
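
The original script reported, for every index, all matching indices (itself included), while the script above records each matching pair once. If the original report format is still wanted, the pairs can be expanded at the end, in the spirit of point b. A minimal sketch: @result is hard-coded with the pairs worked out by hand for the sample data, and $count is a hypothetical stand-in for the number of rows (scalar @data in the script above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: the pairs (i < j) the script above collects for the
# sample data, worked out by hand.
my @result = ( [ 0, 2 ], [ 0, 3 ], [ 2, 3 ] );
my $count  = 5;    # number of rows in the sample data

# Every row trivially matches itself, as in the original output.
my @matched_location = map { [ $_ ] } 0 .. $count - 1;

# Record each pair in both directions.
for my $pair (@result) {
    my ($i, $j) = @$pair;
    push @{ $matched_location[$i] }, $j;
    push @{ $matched_location[$j] }, $i;
}

for my $index (0 .. $#matched_location) {
    my @sorted = sort { $a <=> $b } @{ $matched_location[$index] };
    print "Index $index of \@matched_location = @sorted\n";
}
```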


-- 
Ruud


Re: script takes long time to run when comparing digits within strings using foreach

on 29.05.2011 04:17:46 from jwkrahn

eventual wrote:
> Hi,

Hello,

> I have an array , @datas, and each element within @datas is a string
> that's made up of 6 digits with spaces in between like this "1 2 3 4 5
> 6", so the array look like this
> @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5
> 9' , '6 7 8 9 10 11');
> Now I wish to compare each element of @datas with the rest of the
> elements in @datas in such a way that if 5 of the digits match, to
> take note of the matching indices, and so the script I wrote is
> appended below.
> However, the script below takes a long time to run if the datas at
> @datas are huge( eg 30,000 elements). I then wonder is there a way to
> rewrite the script so that the script can run faster.
> Thanks


Your program can be reduced to:

use strict;
use warnings;

my @matched_location;
my @datas = ( '1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11' );

for my $i ( 0 .. $#datas ) {
    for my $j ( 0 .. $#datas ) {
        # Count the numbers of row $j that appear as whole
        # space-delimited fields in row $i.
        $matched_location[ $i ] .= "$j " if 5 <= grep $datas[ $i ] =~ /(?:^|(?<= ))$_(?= |$)/, split ' ', $datas[ $j ];
    }
}

print map "Index $_ of \@matched_location = $matched_location[$_]\n", 0 .. $#matched_location;

You should benchmark it to see if it is any faster than your original code.
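
One way to do that is Perl's core Benchmark module. The sketch below assumes the sample data from the question and wraps both versions in hypothetical subs (`original_version`, `reduced_version`) that return the per-index strings, so it can first check that they agree. With only five rows the timings say little; data of the size mentioned in the question would be needed for a meaningful comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8',
             '1 2 3 4 5 9', '6 7 8 9 10 11');

# The original algorithm, with named loop variables.
sub original_version {
    my @matched;
    for my $i (0 .. $#datas) {
        my $reference = $datas[$i];
        my $string    = '';
        for my $j (0 .. $#datas) {
            my $ctr = 0;
            for my $number (split ' ', $datas[$j]) {
                $ctr++ if $reference =~ /^$number | $number | $number$/;
            }
            $string .= "$j " if $ctr >= 5;
        }
        $matched[$i] = $string;
    }
    return \@matched;
}

# The reduced version from this reply.
sub reduced_version {
    my @matched;
    for my $i (0 .. $#datas) {
        for my $j (0 .. $#datas) {
            $matched[$i] .= "$j "
                if 5 <= grep $datas[$i] =~ /(?:^|(?<= ))$_(?= |$)/, split ' ', $datas[$j];
        }
    }
    return \@matched;
}

# Make sure both versions report the same matches before timing them.
die "results differ\n"
    unless join('|', @{ original_version() }) eq join('|', @{ reduced_version() });

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    original => \&original_version,
    reduced  => \&reduced_version,
});
```

Both versions still make n * n comparisons, so neither changes how the running time grows with the number of rows.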



John
-- 
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
