removing rows based on two duplicate fields

on 13.09.2007 01:31:58 by heylow

Gurus:

I have merged /etc/passwd files from many systems, and am trying to
build a master passwd file. I concatenated them and sorted uniquely.
The resulting file looks like this:

smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh


I want to delete the duplicates; that is, I want to keep only one row
for every uid.

I have tried with ksh, but in vain. Can you shed light on how it can
be done in Perl?

Thanks, Pedro

Re: removing rows based on two duplicate fields

on 13.09.2007 01:46:43 by usenet

On Sep 12, 4:31 pm, heylow wrote:
> I want to delete the duplicates; that is, I want to keep only one row
> for every uid.

If you always want to assume the first instance wins, something like
this should work:

#!/usr/local/bin/perl
use strict; use warnings;

open (my $in, '<', 'passwd.merged');
open (my $out, '>', 'passwd.cleaned');

my %seen;
while (<$in>) {
    /^(.*?):/;
    print $out $_ unless $seen{$1};
    $seen{$1}++;
}

__END__


--
The best way to get a good answer is to ask a good question.
David Filmer (http://DavidFilmer.com)

Re: removing rows based on two duplicate fields

on 13.09.2007 01:59:41 by Tad McClellan

heylow wrote:


> I want to delete the duplicates;

> Can you shed light on how it can
> be done in Perl?


It is done the way outlined in the Frequently Asked Questions.

perldoc -q duplicate

How can I remove duplicate elements from a list or array?


Modify it for your particular case:

---------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my @unique = ();
my %seen = ();

while ( <DATA> ) {
    my $elem = (split /:/)[2];
    next if $seen{ $elem }++;
    push @unique, $_;
}

print for @unique;

__DATA__
smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh
---------------------------------


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: removing rows based on two duplicate fields

on 13.09.2007 03:13:50 by Tad McClellan

usenet@DavidFilmer.com wrote:

> open (my $in, '<', 'passwd.merged');


You should always, yes *always*, check the return value from open().


> /^(.*?):/;
> print $out $_ unless $seen{$1};


You should never use the dollar-digit variables unless you have
first ensured that the match _succeeded_.

The OP said he wanted duplicate uids removed. The 1st field
is not the uid.
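
Putting those three fixes together, the script might look something
like this (same file names as the original; just a sketch):

#!/usr/local/bin/perl
use strict;
use warnings;

open my $in,  '<', 'passwd.merged'  or die "Cannot open passwd.merged: $!";
open my $out, '>', 'passwd.cleaned' or die "Cannot create passwd.cleaned: $!";

my %seen;
while ( <$in> ) {
    my $uid = (split /:/)[2];       # the 3rd field is the uid
    next unless defined $uid;       # ignore malformed lines
    print $out $_ unless $seen{$uid}++;
}

close $out or die "Cannot close passwd.cleaned: $!";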


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: removing rows based on two duplicate fields

on 13.09.2007 09:41:32 by Martijn Lievaart

On Wed, 12 Sep 2007 23:31:58 +0000, heylow wrote:

> Gurus:
>
> I have merged /etc/passwd files from many systems, and am trying to
> build a master passwd file. I concatenated them and sorted uniquely.
> The resulting file looks like this:
>
> smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
> smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
> rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
> don:*:102:102:B25:/home/don:/bin/fakesh
> don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
> ele:*:255:255:A45:/home/ele:/bin/ksh
> rod:*:300:300:B456:/home/rod:/bin/ksh
>
>
> I want to delete the duplicates; that is, I want to keep only one row
> for every uid.
>
> I have tried with ksh, but in vain. Can you shed light on how it can
> be done in Perl?

I would not use Perl for this but GNU sort:

# sort -t: -k3,3 -u passwd
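
(If you do want to stay in Perl, the rough one-liner equivalent would
be something like

perl -F: -lane 'print unless $seen{$F[2]}++' passwd

which keeps the first line seen for each uid, and does not need the
input to be sorted.)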

M4

Re: removing rows based on two duplicate fields

on 14.09.2007 08:16:50 by Joe Smith

Tad McClellan wrote:

> print for @unique;

Why that, instead of the simple

print @unique;

?
-Joe

Re: removing rows based on two duplicate fields

on 14.09.2007 15:21:51 by Ben Morrow

Quoth Joe Smith:
> Tad McClellan wrote:
>
> > print for @unique;
>
> Why that, instead of the simple
>
> print @unique;

It's a habit one gets into when $\ is set to something useful (such as
when using -l).
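
For example, assuming @unique holds already-chomped lines:

local $\ = "\n";    # what -l does for print
print for @unique;  # one newline after each element
print @unique;      # only one newline, after the whole list

so the per-element form still gives you one record per line.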

Ben

Re: removing rows based on two duplicate fields

on 15.09.2007 01:19:55 by Tad McClellan

Joe Smith wrote:
> Tad McClellan wrote:
>
>> print for @unique;
>
> Why that, instead of the simple
>
> print @unique;


No good reason in this particular case, since they all have newlines.

I originally had

print "$_\n" for @unique;

then removed print()'s argument when it came out double-spaced. :-)
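
Chomping on input would also have worked; modifying the loop above,
something like:

while ( <DATA> ) {
    chomp;                     # strip the newline coming in
    my $elem = (split /:/)[2];
    next if $seen{ $elem }++;
    push @unique, $_;
}

print "$_\n" for @unique;      # explicit newline, no double-spacing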


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"