removing rows based on two duplicate fields
on 13.09.2007 01:31:58 by heylow
Gurus:
I have merged the /etc/passwd files from many systems and am trying to
build a master passwd file. I concatenated them and sorted uniquely.
The resultant file looks like this:
smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh
I want to delete the duplicates; that is, I want to keep only one row
for every uid.
I have tried with ksh, but in vain. Can you show how it can be done in
Perl?
Thanks, Pedro
Re: removing rows based on two duplicate fields
on 13.09.2007 01:46:43 by usenet
On Sep 12, 4:31 pm, heylow wrote:
> I want to delete the duplicates; that is, I want to keep only one row
> for every uid.
If you always want to assume the first instance wins, something like
this should work:
#!/usr/local/bin/perl
use strict; use warnings;
open (my $in, '<', 'passwd.merged');
open (my $out, '>', 'passwd.cleaned');
my %seen;
while (<$in>) {
    /^(.*?):/;
    print $out $_ unless $seen{$1};
    $seen{$1}++;
}
__END__
--
The best way to get a good answer is to ask a good question.
David Filmer (http://DavidFilmer.com)
Re: removing rows based on two duplicate fields
on 13.09.2007 01:59:41 by Tad McClellan
heylow wrote:
> I want to delete the duplicates;
> Can you show how it can be done in Perl?
It is done the way outlined in the Frequently Asked Questions.
perldoc -q duplicate
How can I remove duplicate elements from a list or array?
Modify it for your particular case:
---------------------------------
#!/usr/bin/perl
use warnings;
use strict;
my @unique = ();
my %seen   = ();

while ( <DATA> ) {
    my $elem = (split /:/)[2];
    next if $seen{ $elem }++;
    push @unique, $_;
}

print for @unique;
__DATA__
smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh
---------------------------------
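To run this against the real merged file rather than the __DATA__ section,
a minimal adaptation (a sketch; the script and file names are only
examples) reads from the diamond operator, which takes lines from whatever
files are named on the command line:

#!/usr/bin/perl
use warnings;
use strict;

my %seen = ();

while ( <> ) {
    my $uid = (split /:/)[2];       # field 3 of each line is the uid
    print unless $seen{ $uid }++;   # first occurrence of each uid wins
}

Invoked, for example, as: perl dedup.pl passwd.merged > passwd.cleaned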
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Re: removing rows based on two duplicate fields
on 13.09.2007 03:13:50 by Tad McClellan
usenet@DavidFilmer.com wrote:
> open (my $in, '<', 'passwd.merged');
You should always, yes *always*, check the return value from open().
> /^(.*?):/;
> print $out $_ unless $seen{$1};
You should never use the dollar-digit variables unless you have
first ensured that the match _succeeded_.
The OP said he wanted duplicate uids removed. The first field
is not the uid.
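Folding those fixes back into that script gives something like this (a
sketch; the file names are carried over from the earlier post):

#!/usr/local/bin/perl
use strict;
use warnings;

open my $in,  '<', 'passwd.merged'  or die "cannot open passwd.merged: $!";
open my $out, '>', 'passwd.cleaned' or die "cannot open passwd.cleaned: $!";

my %seen;

while ( <$in> ) {
    my @fields = split /:/;
    next unless @fields > 2;                     # skip malformed lines instead of keying on undef
    print $out $_ unless $seen{ $fields[2] }++;  # field 3 is the uid; first occurrence wins
}

close $out or die "cannot close passwd.cleaned: $!";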
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Re: removing rows based on two duplicate fields
on 13.09.2007 09:41:32 by Martijn Lievaart
On Wed, 12 Sep 2007 23:31:58 +0000, heylow wrote:
> Gurus:
>
> I have merged the /etc/passwd files from many systems and am trying to
> build a master passwd file. I concatenated them and sorted uniquely.
> The resultant file looks like this:
>
> smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
> smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
> rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
> don:*:102:102:B25:/home/don:/bin/fakesh
> don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
> ele:*:255:255:A45:/home/ele:/bin/ksh
> rod:*:300:300:B456:/home/rod:/bin/ksh
>
>
> I want to delete the duplicates; that is, I want to keep only one row
> for every uid.
>
> I have tried with ksh, but in vain. Can you show how it can be done in
> Perl?
I would not use Perl for this but GNU sort:
# sort -t: -k3,3 -u passwd
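Note that this emits one line per uid, ordered on field 3 rather than in
the original line order. If the original order matters, the same
first-wins filter as a Perl one-liner (file names assumed, as above):

perl -F: -lane 'print unless $seen{$F[2]}++' passwd.merged > passwd.cleaned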
M4
Re: removing rows based on two duplicate fields
on 14.09.2007 08:16:50 by Joe Smith
Tad McClellan wrote:
> print for @unique;
Why that, instead of the simple
print @unique;
?
-Joe
Re: removing rows based on two duplicate fields
on 14.09.2007 15:21:51 by Ben Morrow
Quoth Joe Smith:
> Tad McClellan wrote:
>
> > print for @unique;
>
> Why that, instead of the simple
>
> print @unique;
It's a habit one gets into when $\ is set to something useful (such as
when using -l).
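Concretely (a minimal illustration, not from the thread):

$ perl -le 'print for "a", "b"'    # $\ eq "\n": one newline per element
a
b
$ perl -le 'print "a", "b"'        # a single trailing newline
ab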
Ben
Re: removing rows based on two duplicate fields
on 15.09.2007 01:19:55 by Tad McClellan
Joe Smith wrote:
> Tad McClellan wrote:
>
>> print for @unique;
>
> Why that, instead of the simple
>
> print @unique;
No good reason in this particular case, since they all have newlines.
I originally had
print "$_\n" for @unique;
then removed print()'s argument when it came out double-spaced. :-)
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"