compare 2 data files and extract fields for matched lines
compare 2 data files and extract fields for matched lines
am 27.12.2007 17:25:13 von Shree
Hello Friends,
The best way to describe what I'm trying to do is through an example.
I have 2 pipe delimited input files and want to extract a field from
file 2 and append it to file 1. Note I would like Output file to have
the same number of rows as Input File 1, with an additional field
whose value if present in file 2, should be inserted in this new
field. If its not present, then insert '0000'.
Input File 1 (zipcode, city, state, county)
36003|Autaugaville|AL|AUTAUGA
36006|Billingsley|AL|AUTAUGA
72314|Birdeye|AR|CROSS
72324|Cherry Valley|AR|CROSS
57437|Eureka|SD|MCPHERSON
67460|Mc Pherson|KS|MCPHERSON
67464|Marquette|KS|MCPHERSON
69167|Tryon|NE|MCPHERSON
...
...
Input File 2 (county, state, county population)
AUTAUGA|AL|49730
CROSS|AR|19056
MCPHERSON|KS|29380
...
Desired Output (zipcode, city, state, county, county population)
36003|Autaugaville|AL|AUTAUGA|49730
36006|Billingsley|AL|AUTAUGA|49730
72314|Birdeye|AR|CROSS|19056
72324|Cherry Valley|AR|CROSS|19056
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|29380
67464|Marquette|KS|MCPHERSON|29380
69167|Tryon|NE|MCPHERSON|0000
---
I wrote the program below but it has logic error. Instead of getting
the above, I get the following.
Any guidance with fixing the code or perhaps a better way to do this
is really appreciated. The above is just a few lines from my real
input files, which are considerably larger.
Thank you and best wishes,
Shree
36003|Autaugaville|AL|AUTAUGA|49730
36006|Billingsley|AL|AUTAUGA|49730
72314|Birdeye|AR|CROSS|0000
72324|Cherry Valley|AR|CROSS|0000
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|0000
67464|Marquette|KS|MCPHERSON|0000
69167|Tryon|NE|MCPHERSON|0000
36003|Autaugaville|AL|AUTAUGA|0000
36006|Billingsley|AL|AUTAUGA|0000
72314|Birdeye|AR|CROSS|19056
72324|Cherry Valley|AR|CROSS|19056
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|0000
67464|Marquette|KS|MCPHERSON|0000
69167|Tryon|NE|MCPHERSON|0000
36003|Autaugaville|AL|AUTAUGA|0000
36006|Billingsley|AL|AUTAUGA|0000
72314|Birdeye|AR|CROSS|0000
72324|Cherry Valley|AR|CROSS|0000
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|29380
67464|Marquette|KS|MCPHERSON|29380
69167|Tryon|NE|MCPHERSON|0000
------------------------------------------------------------ ---------
#!/usr/bin/perl
use strict;
my $File_In1 = "dat1.txt";
my $File_In2 = "dat2.txt";
my (@array1, @array2) = ();
my ($line1, $line2) = "";
my ($zip, $city, $state, $county) = "";
my ($county2, $state2, $pop) = "";
open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
@array1 = ;
close (FILE_IN1);
open (FILE_IN2, $File_In2) or die "cannot open file in FILE_IN2 $!";
@array2 = ;
close (FILE_IN2);
foreach $line2 (@array2) {
chomp ($line2);
($county2, $state2, $pop) = split (/\|/, $line2);
foreach $line1 (@array1) {
chomp ($line1);
($zip, $city, $state, $county) = split (/\|/, $line1);
if (($county2 eq $county) && ($state2 eq $state)) {
print "$zip|$city|$state|$county|$pop\n";
} else {
print "$zip|$city|$state|$county|0000\n";
}
}
}
Re: compare 2 data files and extract fields for matched lines
am 27.12.2007 17:50:33 von dn.perl
On Dec 27, shree wrote:
>
>
> open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
> @array1 = ;
> close (FILE_IN1);
>
Start with two sample files of 2-3 lines each.
Make sure it is okay to read the entire file
in an array by printing the value of @array1
and printing the value of $line1 after running
foreach $line1 (@array1); and read the files
line by line instead of reading them all at once.
Something like :
while ($line1 = ) { statements }.
As part of debugging the code, print out the value
of $line1 to see for yourself what strings or arrays
your code is dealing with.
Re: compare 2 data files and extract fields for matched lines
am 27.12.2007 18:03:36 von it_says_BALLS_on_your forehead
On Dec 27, 11:25=A0am, shree wrote:
> Hello Friends,
>
> The best way to describe what I'm trying to do is through an example.
> I have 2 pipe delimited input files and want to extract a field from
> file 2 and append it to file 1. =A0Note I would like Output file to have
> the same number of rows as Input File 1, with an additional field
> whose value if present in file 2, should be inserted in this new
> field. If its not present, then insert '0000'.
>
> Input File 1 (zipcode, city, state, county)
> 36003|Autaugaville|AL|AUTAUGA
> 36006|Billingsley|AL|AUTAUGA
> 72314|Birdeye|AR|CROSS
> 72324|Cherry Valley|AR|CROSS
> 57437|Eureka|SD|MCPHERSON
> 67460|Mc Pherson|KS|MCPHERSON
> 67464|Marquette|KS|MCPHERSON
> 69167|Tryon|NE|MCPHERSON
> ..
> ..
> Input File 2 (county, state, county population)
> AUTAUGA|AL|49730
> CROSS|AR|19056
> MCPHERSON|KS|29380
> ..
stick this info in a hash.
so, start your script:
#!/usr/bin/perl
use strict; use warnings;
my %pop;
open my $fh2, '<', $file2 or die "can't open '$file2': $!\n";
while ( <$fh2> ) {
chomp;
my ( $county, $state, $population ) =3D split /\|/;
$pop{"$state|$county"} =3D $population;
}
close $fh2;
# now you just need to open theother file, and do a lookup on the
"$state|$county" key
# if it's not there, append your 0000;
open my $fh1, '<', $file1 or die "can't open '$file1': $!\n";
while ( <$fh1> ) {
chomp;
my ( $zip, $city, $state, $county ) =3D split /\|/;
if ( $pop{"$state|$county"} ) {
print "$_|".$pop{"$state|$county"}."\n";
}
else {
print "$_|0000\n";
}
}
close $fh1;
__END__
* Note: untested
>
> Desired Output (zipcode, city, state, county, county population)
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
>
> ---
> I wrote the program below but it has logic error. Instead of getting
> the above, I get the following.
>
> Any guidance with fixing the code or perhaps a better way to do this
> is really appreciated. The above is just a few lines from my real
> input files, which are considerably larger.
>
> Thank you and best wishes,
> Shree
>
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
>
> ------------------------------------------------------------ ---------
> #!/usr/bin/perl
>
> use strict;
> my $File_In1 =3D "dat1.txt";
> my $File_In2 =3D "dat2.txt";
> my (@array1, @array2) =3D ();
> my ($line1, $line2) =3D "";
> my ($zip, $city, $state, $county) =3D "";
> my ($county2, $state2, $pop) =3D "";
>
> open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
> @array1 =3D ;
> close (FILE_IN1);
>
> open (FILE_IN2, $File_In2) or die "cannot open file in FILE_IN2 $!";
> @array2 =3D ;
> close (FILE_IN2);
>
> foreach $line2 (@array2) {
> =A0 =A0 =A0 =A0 chomp ($line2);
> =A0 =A0 =A0 =A0 ($county2, $state2, $pop) =3D split (/\|/, $line2);
> =A0 =A0 =A0 =A0 foreach $line1 (@array1) {
> =A0 =A0 =A0 =A0 chomp ($line1);
> =A0 =A0 =A0 =A0 ($zip, $city, $state, $county) =3D split (/\|/, $line1);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if =A0(($county2 eq $county) =A0&& ($state=
2 eq $state)) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 print "$zip|$city|$state|$=
county|$pop\n";
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } else {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 print "$zip|$city|$state|$=
county|0000\n";
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
> =A0 =A0 =A0 =A0 }
>
> }
>
>
Re: compare 2 data files and extract fields for matched lines
am 27.12.2007 18:29:40 von jurgenex
On shree wrote:
>The best way to describe what I'm trying to do is through an example.
That helps somewhat.
>I have 2 pipe delimited input files and want to extract a field from
>file 2 and append it to file 1. Note I would like Output file to have
>the same number of rows as Input File 1, with an additional field
>whose value if present in file 2, should be inserted in this new
>field. If its not present, then insert '0000'.
You forgot to mention and it is not clear from your example _which field_ is
the link between those 2 files.
>Input File 1 (zipcode, city, state, county)
>36003|Autaugaville|AL|AUTAUGA
>36006|Billingsley|AL|AUTAUGA
>..
>..
>Input File 2 (county, state, county population)
>AUTAUGA|AL|49730
>CROSS|AR|19056
>MCPHERSON|KS|29380
>..
>
>Desired Output (zipcode, city, state, county, county population)
>36003|Autaugaville|AL|AUTAUGA|49730
>36006|Billingsley|AL|AUTAUGA|49730
>72314|Birdeye|AR|CROSS|19056
>
>---
>I wrote the program below but it has logic error. Instead of getting
>the above, I get the following.
>
>Any guidance with fixing the code or perhaps a better way to do this
>is really appreciated. The above is just a few lines from my real
[attempt with 2 arrays and nested loops snipped]
There is a much easier approach:
- read file 2 into a hash, using the link between the 2 files as the key and
the desired number as the value in each hash entry.
- then read file 1 line by line and if the key exists then write the line
with the hash value to the new file, otherwise write the line with 0000
appended to the new file.
Not only is this much easier to comprehend, it is also much faster with
O(n+m) instead of O(n*m).
jue
Re: compare 2 data files and extract fields for matched lines
am 28.12.2007 06:43:00 von someone
shree wrote:
>
> The best way to describe what I'm trying to do is through an example.
> I have 2 pipe delimited input files and want to extract a field from
> file 2 and append it to file 1. Note I would like Output file to have
> the same number of rows as Input File 1, with an additional field
> whose value if present in file 2, should be inserted in this new
> field. If its not present, then insert '0000'.
>
> Input File 1 (zipcode, city, state, county)
> 36003|Autaugaville|AL|AUTAUGA
> 36006|Billingsley|AL|AUTAUGA
> 72314|Birdeye|AR|CROSS
> 72324|Cherry Valley|AR|CROSS
> 57437|Eureka|SD|MCPHERSON
> 67460|Mc Pherson|KS|MCPHERSON
> 67464|Marquette|KS|MCPHERSON
> 69167|Tryon|NE|MCPHERSON
> ..
> ..
> Input File 2 (county, state, county population)
> AUTAUGA|AL|49730
> CROSS|AR|19056
> MCPHERSON|KS|29380
> ..
>
> Desired Output (zipcode, city, state, county, county population)
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
>
> ---
> I wrote the program below but it has logic error. Instead of getting
> the above, I get the following.
>
> Any guidance with fixing the code or perhaps a better way to do this
> is really appreciated. The above is just a few lines from my real
> input files, which are considerably larger.
>
> Thank you and best wishes,
> Shree
>
>
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
>
> ------------------------------------------------------------ ---------
> #!/usr/bin/perl
>
> use strict;
> my $File_In1 = "dat1.txt";
> my $File_In2 = "dat2.txt";
> my (@array1, @array2) = ();
That is the same as:
my @array1 = ();
my @array2;
> my ($line1, $line2) = "";
That is the same as:
my $line1 = "";
my $line2;
> my ($zip, $city, $state, $county) = "";
That is the same as:
my $zip = "";
my ($city, $state, $county);
> my ($county2, $state2, $pop) = "";
That is the same as:
my $county2 = "";
my ($state2, $pop);
Anyway, you should declare your variables in the smallest possible scope
instead of all at the top of the file.
> open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
> @array1 = ;
> close (FILE_IN1);
>
> open (FILE_IN2, $File_In2) or die "cannot open file in FILE_IN2 $!";
> @array2 = ;
> close (FILE_IN2);
>
> foreach $line2 (@array2) {
> chomp ($line2);
> ($county2, $state2, $pop) = split (/\|/, $line2);
> foreach $line1 (@array1) {
> chomp ($line1);
> ($zip, $city, $state, $county) = split (/\|/, $line1);
> if (($county2 eq $county) && ($state2 eq $state)) {
> print "$zip|$city|$state|$county|$pop\n";
> } else {
> print "$zip|$city|$state|$county|0000\n";
> }
> }
> }
Try it like this:
#!/usr/bin/perl
use warnings;
use strict;
my $File_In1 = 'dat1.txt';
my $File_In2 = 'dat2.txt';
open FILE_IN2, '<', $File_In2 or die "cannot open '$File_In2' $!";
my %population;
while ( ) {
my ( $county, $state, $pop ) = /\A([^|]+)\|([^|]+)\|(\d+)\Z/;
$population{ "$state|$county" } = $pop;
}
close FILE_IN2;
open FILE_IN1, '<', $File_In1 or die "cannot open '$File_In1' $!";
while ( my $line = ) {
chomp $line;
my ( $key ) = $line =~ /\|([^|]+\|[^|]+)\z/;
print "$line|", $population{ $key } || '0000', "\n";
}
close FILE_IN1;
__END__
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Re: compare 2 data files and extract fields for matched lines
am 29.12.2007 22:45:53 von Shree
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $File_In1 = 'dat1.txt';
> my $File_In2 = 'dat2.txt';
>
> open FILE_IN2, '<', $File_In2 or die "cannot open '$File_In2' $!";
>
> my %population;
> while ( ) {
> my ( $county, $state, $pop ) = /\A([^|]+)\|([^|]+)\|(\d+)\Z/;
> $population{ "$state|$county" } = $pop;
> }
>
> close FILE_IN2;
>
> open FILE_IN1, '<', $File_In1 or die "cannot open '$File_In1' $!";
>
> while ( my $line = ) {
> chomp $line;
> my ( $key ) = $line =~ /\|([^|]+\|[^|]+)\z/;
> print "$line|", $population{ $key } || '0000', "\n";
> }
>
> close FILE_IN1;
>
> __END__
>
> John
Dear all,
Thanks for showing me how to do this. And an added thanks to John for
teaching good programming techniques in perl.
I was able to literally use the above and it worked like a charm.
Shree