Efficiently de-duping an array

am 24.08.2007 01:49:18 von Dan Otterburn

I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

my @fruits = qw(
apple
apple
pear
banana
pear
apple
banana
plum
plum
apple
plum
peach
kiwi
pear
plum
banana
cherry
);

The "apple" I want is $fruits[0], the "pear" $fruits[2] etc...

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
next if grep { $_ eq $fruit } @fruits;
push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work, is there a better way to do this?

--
Dan Otterburn

Re: Efficiently de-duping an array

am 24.08.2007 02:42:03 von Gunnar Hjalmarsson

Dan Otterburn wrote:
> I have an array of a number of items, some of which are duplicates. I
> need to "de-dupe" the array, keeping the item with the lowest index.

> My current solution is:
>
> my @fruits_deduped;
> while (my $fruit = pop @fruits) {
> next if grep { $_ eq $fruit } @fruits;
> push @fruits_deduped, $fruit;
> }
> @fruits = reverse @fruits_deduped;
>
> This seems to be a lot of work, is there a better way to do this?

Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

See also "perldoc -q duplicate".

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Efficiently de-duping an array

am 24.08.2007 03:12:15 von Dan Otterburn

On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:

> Use a hash.
>
> my ( @fruits_deduped, %seen );
> while ( my $fruit = shift @fruits ) {
> push @fruits_deduped, $fruit unless $seen{$fruit}++;
> }

Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested? i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
if ( !$seen{$fruit} ) {
push @fruits_deduped, $fruit;
$seen{$fruit} += 1;
}
}

> See also "perldoc -q duplicate".

Apologies, I should have been able to find this without posting.

--
Dan Otterburn

Re: Efficiently de-duping an array

am 24.08.2007 03:31:42 von Tad McClellan

Dan Otterburn wrote:

> I have an array of a number of items, some of which are duplicates. I
> need to "de-dupe" the array,

Your Question is Asked Frequently:

perldoc -q duplicate

How can I remove duplicate elements from a list or array?

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Efficiently de-duping an array

am 24.08.2007 03:48:55 von Gunnar Hjalmarsson

Dan Otterburn wrote:
> On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
>> Use a hash.
>>
>> my ( @fruits_deduped, %seen );
>> while ( my $fruit = shift @fruits ) {
>> push @fruits_deduped, $fruit unless $seen{$fruit}++;
>> }
>
> Many thanks. Just to clarify my understanding: this works because
> "unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
> each different $fruit - isn't auto-vivified until *after* "unless" has
> tested?

Well, it's rather about what $seen{$fruit}++ _returns_; please read
about auto-increment in "perldoc perlop".

> i.e. it is short-hand for:
>
> while ( my $fruit = shift @fruits ) {
> if ( !$seen{$fruit} ) {
> push @fruits_deduped, $fruit;
> $seen{$fruit} += 1;
> }
> }

Yes, almost. (Unlike my code, your code doesn't keep incrementing the
hash values.)

>> See also "perldoc -q duplicate".
>
> Apologies, I should have been able to find this without posting.

Yes. ;-) Apology accepted.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Efficiently de-duping an array

am 24.08.2007 04:45:52 von Tad McClellan

Dan Otterburn wrote:
> On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
>
>> Use a hash.
>>
>> my ( @fruits_deduped, %seen );
>> while ( my $fruit = shift @fruits ) {
>> push @fruits_deduped, $fruit unless $seen{$fruit}++;
>> }
>
> Many thanks. Just to clarify my understanding: this works because
> "unless" binds tighter than "++"
^^^^^^^^^^^^^

"unless" is not an operator, so talking about its precedence makes
no sense.

> each different $fruit - isn't auto-vivified until *after* "unless" has
> tested?

That part is accurate though.

> i.e. it is short-hand for:
>
> while ( my $fruit = shift @fruits ) {
> if ( !$seen{$fruit} ) {
> push @fruits_deduped, $fruit;
> $seen{$fruit} += 1;
> }
> }
>
>> See also "perldoc -q duplicate".

....then do it with a grep():

my %seen;
@fruits = grep !$seen{$_}++, @fruits;

And it even reads kind of Englishy "grep not seen fruits" :-)

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Efficiently de-duping an array

am 24.08.2007 11:40:14 von Dan Otterburn

On 24 Aug, 02:31, Tad McClellan wrote:

> Your Question is Asked Frequently:

Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

Re: Efficiently de-duping an array

am 24.08.2007 20:43:50 von rvtol+news

Dan Otterburn schreef:

> We learn by our mistakes - and I have made plenty here - so, if it is
> any consolation, I have learnt more than I would have done had I found
> the FAQ in the first place. I will endeavour not to make the same
> mistakes twice!

don't_be_too_embarassed() if $seen($mistake}++;

--
Affijn, Ruud

"Gewoon is een tijger."

Re: Efficiently de-duping an array

am 25.08.2007 08:18:23 von Tad McClellan

Dan Otterburn wrote:
> On 24 Aug, 02:31, Tad McClellan wrote:
>
>> Your Question is Asked Frequently:
>
> Thanks to both of you for being gentle and taking the time to answer
> (and explain the answer to) a question that should never have been
> asked.
>
> We learn by our mistakes - and I have made plenty here - so, if it is
> any consolation, I have learnt more than I would have done had I found
> the FAQ in the first place. I will endeavour not to make the same
> mistakes twice!

Making mistakes in a public forum is a very good way to "internalize"
a lesson. You're not likely to forget what has been learned.

I've "internalized" a bunch of stuff myself. :-)

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"