Count differences between arrays


on 07.01.2008 17:44:43 by Steve

Hi all,

I'm trying to count occurrences of elements in array1 that aren't in
array2. Currently I'm doing this by converting one array to a hash and
using 'exists':-

my %foo;
my $score = 0;
@foo{@array2} = (); # Convert array to hash (for exists)
for (@array1) { $score++ unless exists $foo{$_} };

Whilst this seems to work I'm sure there's a more efficient method.

Any suggestions?

Thanks

--
pub 1024D/228761E7 2003-06-04 Steven Crook
Key fingerprint = 1CD9 95E1 E9CE 80D6 C885 B7EB B471 80D5 2287 61E7
uid Steven Crook
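For reference, the same hash-lookup approach can be written more compactly, because `grep` in scalar context returns the number of elements for which its block is true. This is a minimal sketch with hypothetical sample data standing in for `@array1` and `@array2`. Like the original loop, it counts every occurrence in `@array1`, including duplicates. It is mainly a tidier spelling; it is not shown here to be faster.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @array1 and @array2.
my @array1 = qw(a b c d e);
my @array2 = qw(b d);

# Build the lookup hash exactly as in the original post.
my %foo;
@foo{@array2} = ();

# grep in scalar context returns the number of matching elements,
# so the explicit counter and loop fold into one expression.
my $score = grep { !exists $foo{$_} } @array1;

print "$score\n";    # prints 3
```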

Re: Count differences between arrays

on 07.01.2008 18:15:12 by Abigail

Steve (steve@mixmin.net) wrote on VCCXLII September MCMXCIII in
:
\\ Hi all,
\\
\\ I'm trying to count occurrences of elements in array1 that aren't in
\\ array2. Currently I'm doing this by converting one array to a hash and
\\ using 'exists':-
\\
\\ my %foo;
\\ my $score = 0;
\\ @foo{@array2} = (); # Convert array to hash (for exists)
\\ for (@array1) { $score++ unless exists $foo{$_} };
\\
\\ Whilst this seems to work I'm sure there's a more efficient method.
\\
\\ Any suggestions?


Read the FAQ?


Abigail
--
#!/opt/perl/bin/perl -w
$\ = $"; $SIG {TERM} = sub {print and exit};
kill 15 => fork for qw /Just another Perl Hacker/;

Re: Count differences between arrays

on 07.01.2008 19:19:10 by jurgenex

Steve wrote:
>I'm trying to count occurrences of elements in array1 that aren't in
>array2.

perldoc -q difference: "How do I compute the difference of two arrays?"

jue
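The FAQ entry cited above builds a count hash over both arrays. The sketch that follows approximates that approach with hypothetical sample data; it is not a verbatim copy of the FAQ. It differs from what Steve asked for in two ways. First, `@difference` here is the symmetric difference (elements in either array but not both). Second, it assumes no element repeats within a single array. For a one-sided count that includes duplicates, the `exists` lookup in the original post is the closer fit.

```perl
use strict;
use warnings;

my @array1 = qw(a b c);
my @array2 = qw(b c d);

# Count how many arrays each element appears in (assumes no element
# is repeated within a single array).
my %count;
$count{$_}++ for @array1, @array2;

my (@union, @intersection, @difference);
for my $element (sort keys %count) {
    push @union, $element;
    # Seen twice means in both arrays; seen once means in only one.
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

# @difference is the symmetric difference: "a" is only in @array1,
# "d" is only in @array2.
print "@difference\n";    # prints "a d"
```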

Re: Count differences between arrays

on 07.01.2008 19:59:18 by it_says_BALLS_on_your forehead

On Jan 7, 11:44 am, Steve wrote:
> Hi all,
>
> I'm trying to count occurrences of elements in array1 that aren't in
> array2. Currently I'm doing this by converting one array to a hash and
> using 'exists':-
>
> my %foo;
> my $score = 0;
> @foo{@array2} = (); # Convert array to hash (for exists)
> for (@array1) { $score++ unless exists $foo{$_} };
>
> Whilst this seems to work I'm sure there's a more efficient method.
>
> Any suggestions?
>

Which aspect of efficiency are you trying to improve?

Re: Count differences between arrays

on 07.01.2008 20:22:38 by Steve

On Mon, 7 Jan 2008 10:59:18 -0800 (PST), nolo contendere wrote in
Message-Id: :

> Which aspect of efficiency are you trying to improve?

The code is used in an update to Cleanfeed, the de facto filtering
software operated by Usenet server admins. Each NNTP message is
processed individually through the Cleanfeed filter so speed is really
the primary driver.

The actual function of this fragment of code is to compare the content
of the Newsgroups and Followup-To headers so that messages which
followup-to groups that aren't in the distribution are negatively
scored.

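Applied to the headers described above, one possible shape for this check follows. It is a sketch, not Cleanfeed's actual code. The helper names `header_groups` and `followups_outside_distribution` are hypothetical. The sketch assumes that both header values are comma-separated newsgroup names with optional whitespace around the commas. It also assumes that a Followup-To of `poster`, which requests email replies, should not be scored.

```perl
use strict;
use warnings;

# Hypothetical helper: split a header value into newsgroup names.
# Assumes names are comma-separated with optional surrounding whitespace.
sub header_groups {
    my ($value) = @_;
    return () unless defined $value;
    $value =~ s/^\s+//;    # trim leading whitespace
    $value =~ s/\s+$//;    # trim trailing whitespace
    return grep { length } split /\s*,\s*/, $value;
}

# Hypothetical helper: count Followup-To groups missing from Newsgroups.
sub followups_outside_distribution {
    my ($newsgroups, $followup_to) = @_;
    my @followups = header_groups($followup_to);

    # Assumption: "poster" names no group, so it is not scored here.
    return 0 if @followups == 1 && lc $followups[0] eq 'poster';

    # Build the per-message lookup from the Newsgroups header.
    my %in_dist;
    my @dist = header_groups($newsgroups);
    @in_dist{@dist} = ();
    return scalar grep { !exists $in_dist{$_} } @followups;
}

my $score = followups_outside_distribution(
    'alt.test, news.admin.net-abuse.usenet',
    'alt.test,misc.test',
);
print "$score\n";    # prints 1
```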

Re: Count differences between arrays

on 07.01.2008 20:27:00 by it_says_BALLS_on_your forehead

On Jan 7, 2:22 pm, Steve wrote:
> On Mon, 7 Jan 2008 10:59:18 -0800 (PST), nolo contendere wrote in
> Message-Id: com>:
>
> > Which aspect of efficiency are you trying to improve?
>
> The code is used in an update to Cleanfeed, the de facto filtering
> software operated by Usenet server admins. Each NNTP message is
> processed individually through the Cleanfeed filter so speed is really
> the primary driver.
>
> The actual function of this fragment of code is to compare the content
> of the Newsgroups and Followup-To headers so that messages which
> followup-to groups that aren't in the distribution are negatively
> scored.

Will the hash from array2 need to be constructed anew each time you
filter? Can you parallelize the work across all the NNTP messages, and
use a shared hash (or a reasonable facsimile) to perform the lookups?

Re: Count differences between arrays

on 08.01.2008 10:42:24 by Steve

On Mon, 7 Jan 2008 11:27:00 -0800 (PST), nolo contendere wrote in
Message-Id: :

> Will the hash from array2 need to be constructed anew each time you
> filter? Can you parallelize the work across all the NNTP messages, and
> use a shared hash (or a reasonable facsimile) to perform the lookups?

The hash will be constructed new for each message processed, as are the
arrays for Newsgroups and Followup-To: content. For both arrays there
are unlikely to be more than 10 elements.

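With fewer than ten elements per array, building a hash for every message may cost about as much as the lookups save. Measuring is the practical way to settle which approach is faster. Below is a minimal sketch that uses the core `Benchmark` module's `cmpthese`. It compares the hash approach with a linear scan using `List::Util`'s `first`, which stops at the first match. The group names are hypothetical. Timings depend on the machine and the Perl version, so no result is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(first);

# Hypothetical inputs of roughly the size described in the thread.
my @array1 = map { "group.$_" } 1 .. 10;
my @array2 = map { "group.$_" } 5 .. 14;

# Approach from the original post: build a hash, then test with exists.
sub with_hash {
    my %seen;
    @seen{@array2} = ();
    return scalar grep { !exists $seen{$_} } @array1;
}

# Alternative: scan @array2 for each element, skipping hash construction.
sub with_linear_scan {
    my $score = 0;
    for my $g (@array1) {
        my $found = first { $_ eq $g } @array2;    # undef when absent
        $score++ unless defined $found;
    }
    return $score;
}

# Both must agree before comparing their speed means anything.
die "mismatch\n" unless with_hash() == with_linear_scan();

# A negative count runs each sub for at least that many CPU seconds.
cmpthese(-1, { hash => \&with_hash, linear => \&with_linear_scan });
```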

Re: Count differences between arrays

on 08.01.2008 16:20:28 by it_says_BALLS_on_your forehead

On Jan 8, 4:42 am, Steve wrote:
> On Mon, 7 Jan 2008 11:27:00 -0800 (PST), nolo contendere wrote in
> Message-Id: ..com>:
>
> > Will the hash from array2 need to be constructed anew each time you
> > filter? Can you parallelize the work across all the NNTP messages, and
> > use a shared hash (or a reasonable facsimile) to perform the lookups?
>
> The hash will be constructed new for each message processed, as are the
> arrays for Newsgroups and Followup-To: content. For both arrays there
> are unlikely to be more than 10 elements.
>

What's the purpose of reconstructing the hash each time? Just do it
once in the beginning, if it won't change. Also, if it's static, you
can use it without reservation in parallel processing.
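The "build it once" suggestion applies only when the lookup data really is fixed. In this thread the Newsgroups header changes with every message, so it does not apply to that hash directly. The sketch below shows the pattern for a hypothetical static list, `%watched`, with a hypothetical `count_unwatched` sub. The list is built a single time when the filter loads, and every call reuses it.

```perl
use strict;
use warnings;

# Hypothetical fixed list of groups, built once when the filter loads
# rather than once per message.
my %watched;
@watched{qw(alt.test misc.test)} = ();

# Count groups that are not in the prebuilt lookup; every call reuses
# %watched instead of reconstructing it.
sub count_unwatched {
    my @groups = @_;
    return scalar grep { !exists $watched{$_} } @groups;
}

my $n = count_unwatched(qw(alt.test comp.lang.perl.misc));
print "$n\n";    # prints 1
```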