Parsing two files and comparing the first fields..

on 29.11.2007 00:12:36 by clearguy02

I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
file has 4 fields and the second one has two fields, but both files
have the "user_id" as the first field.

Example:

c:\test1.txt
=================
jcarter john abc@gmail.com mstella
mstella mary bcd@yahoo.com bborders
msmith martin cde@gmail.com mstella
bborders bob dddd@gmail.com rcasey
swatson sush efgh@yahoo.com mstella
rcasey rick fff@gmail.com rcasey


c:\test2.txt
======================
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active

===============================================

Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.

Below is the script I am using. Could you kindly let me know where I
am going wrong?

================================

use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";
open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
file: $!";

@array1 = <IN1>;
@array2 = <IN2>;

foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];
}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];

foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}
close (IN1);
close (IN2);
close (OUT1);
close (OUT2);
+++++++++++++++++++++++++++++++++++++


Thanks in advance,
JC

Re: Parsing two files and comparing the first fields..

on 29.11.2007 00:14:09 by clearguy02

Forgot to add "my" before the variables while typing.. sorry about
that.

--JC

Re: Parsing two files and comparing the first fields..

on 29.11.2007 00:33:36 by 1usa

clearguy02@yahoo.com wrote in news:b20d8640-91c1-41d7-a46a-ab04bf405239
@d21g2000prf.googlegroups.com:

>
> Now I want to check if each id from the second file exists in the
> first one or not. I want the output of both matching and non-matching
> id's.

Read

perldoc -q intersection

Parse the files into hashes using the id field values as keys.
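
As a rough illustration of the hash idea, here is a minimal in-memory sketch. The IDs are copied from the first column of the two sample files in the question. The variable names (@ids1, @ids2, %in_first) are made up for this example and do not come from any post in the thread.

```perl
use strict;
use warnings;

# IDs taken from the first column of the two sample files in the question.
my @ids1 = qw(jcarter mstella msmith bborders swatson rcasey);
my @ids2 = qw(aaboss jcarter msmith ssullivan rcasey usmiths);

# Build a lookup table keyed by the IDs of the first file.
my %in_first = map { $_ => 1 } @ids1;

# Partition the second file's IDs with one hash lookup each.
my @matching     = grep {  $in_first{$_} } @ids2;
my @non_matching = grep { !$in_first{$_} } @ids2;

print "matching:     @matching\n";
print "non-matching: @non_matching\n";
```

The same pattern carries over to files: fill the hash while reading the first file, then test each ID while reading the second.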

> use strict;
> use warnings;
>
> open (IN1, "c:\test1.txt") || die "Can not open the file: $!";

This will probably not succeed as it will look for a file named
{TAB}est1.txt in c:\.
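
To make the escape problem concrete, here is a small sketch comparing the double-quoted path with two safer spellings. The variable names are illustrative only.

```perl
use strict;
use warnings;

# In a double-quoted string, "\t" is the tab escape, so the path is mangled.
my $double_quoted = "c:\test1.txt";

# Single quotes leave a backslash alone unless it precedes \ or '.
my $single_quoted = 'c:\test1.txt';

# Forward slashes avoid the escaping question entirely.
my $forward_slash = 'c:/test1.txt';

printf "double-quoted contains a tab: %s\n",
    ( $double_quoted =~ /\t/ ? 'yes' : 'no' );
printf "single-quoted: %s\n", $single_quoted;
printf "forward slash: %s\n", $forward_slash;
```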

> open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
> open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
> file: $!";
> open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
> file: $!";

I generally prefer to use lexical filehandles and the three argument
form of open. Also, you can just use / as the directory separator in
Windows. For increased portability, I prefer to use File::Spec::catfile.
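
A minimal sketch of those three suggestions together. Note that catfile is normally invoked as a class method, File::Spec->catfile. The temporary directory is an assumption that stands in for $dir1, which the original script never defines.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: a scratch directory stands in for the poster's undefined $dir1.
my $dir1 = tempdir( CLEANUP => 1 );

# catfile joins path components using the local directory separator.
my $path = File::Spec->catfile( $dir1, 'matching.txt' );

# Three-argument open with a lexical filehandle: the mode is kept separate
# from the file name, and the handle is scoped like any other "my" variable.
open my $out, '>', $path
    or die "Cannot open '$path' for writing: $!";
print {$out} "jcarter\n";
close $out
    or die "Cannot close '$path': $!";

# Read the line back to show the round trip.
open my $in, '<', $path
    or die "Cannot open '$path' for reading: $!";
chomp( my $first_line = <$in> );
close $in;

print "read back: $first_line\n";
```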

> @array1 = <IN1>;
> @array2 = <IN2>;

No need to slurp anything.
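
A sketch of reading one line at a time instead of slurping the whole file into an array. To keep it self-contained, an in-memory filehandle with tab-separated sample data stands in for the real input file; that substitution is an assumption made for this example.

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for test1.txt.
my $data = "jcarter\tjohn\nmstella\tmary\nmsmith\tmartin\n";
open my $fh, '<', \$data
    or die "Cannot open in-memory file: $!";

my @ids;
while ( my $line = <$fh> ) {    # reads one line per iteration
    chomp $line;
    push @ids, ( split /\t/, $line )[0];    # keep only the first field
}
close $fh;

print "$_\n" for @ids;
```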

> foreach $record1 (@array1)
> {
> chomp $record1;
> @fields1= split /\t/, $record1;
> $fist_id = $fields1[0];

my $first_id = (split /\t/, $record1)[0];

> }
>
> foreach $record2 (@array2)
> {
> chomp $record2;
> @fields2= split /\t/, $record2;
> $second_id = $fields2[0];


This nested loop approach will have extremely bad performance
characteristics as the number of input lines increases. Use hashes.
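
To give a feel for the difference, here is a sketch that counts the string comparisons a nested loop performs and sets that against the number of hash lookups. The data is synthetic (hypothetical IDs user1 to user1500) and is not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical data: two lists of 1000 synthetic IDs that overlap by half.
my @first  = map { "user$_" } 1 .. 1000;
my @second = map { "user$_" } 501 .. 1500;

# Nested loops: each ID in @second is compared against IDs in @first
# until a match is found or @first is exhausted.
my ( $comparisons, $nested_matches ) = ( 0, 0 );
for my $id2 (@second) {
    for my $id1 (@first) {
        $comparisons++;
        if ( $id1 eq $id2 ) {
            $nested_matches++;
            last;
        }
    }
}

# Hash: one pass to build the table, then one lookup per ID in @second.
my %seen = map { $_ => 1 } @first;
my $hash_matches = grep { exists $seen{$_} } @second;

print "nested loops: $nested_matches matches, $comparisons comparisons\n";
print "hash lookups: $hash_matches matches, ", scalar(@second), " lookups\n";
```

Both approaches find the same matches, but the work done by the nested loop grows with the product of the two list sizes, while the hash version grows with their sum.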

> foreach (@fields1)
> {
> if ($second_id eq $fist_id)
> {
> print OUT1 "$record2\n" ; # matching
> }
> else
> {
> print OUT1 "$record2\n" ; # matching
> }
> }

So if $second_id eq $first_id, you write it to OUT1, otherwise, you
also write it to OUT1. What's the point???

The script below represents my best guess as to what you are trying to
achieve.

#!/usr/bin/perl

use strict;
use warnings;

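my %myconfig = (
    input1       => 'input1.txt',
    input2       => 'input2.txt',
    matching     => 'matching.txt',
    non_matching => 'non_matching.txt',
);

# IDs found in the first column of the first file.
my %fields1;

{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";

    while ( <$input> ) {
        if ( /^(\w+)/ ) {
            $fields1{ $1 } = 1;
        }
    }

    close $input
        or die "Cannot close '$myconfig{input1}': $!";
}

open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";

open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";

open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

# Sort each ID of the second file by whether the first file contains it.
while ( <$input> ) {
    if ( /^(\w+)/ ) {
        if ( exists $fields1{ $1 } ) {
            print $matching "$1\n";
        }
        else {
            print $non_matching "$1\n";
        }
    }
}

# Closing explicitly reports write errors that would otherwise go unnoticed.
close $input
    or die "Cannot close '$myconfig{input2}': $!";
close $matching
    or die "Cannot close '$myconfig{matching}': $!";
close $non_matching
    or die "Cannot close '$myconfig{non_matching}': $!";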

__END__

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input1.txt
jcarter john abc@gmail.com mstella
mstella mary bcd@yahoo.com bborders
msmith martin cde@gmail.com mstella
bborders bob dddd@gmail.com rcasey
swatson sush efgh@yahoo.com mstella
rcasey rick fff@gmail.com rcasey


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input2.txt
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat matching.txt
jcarter
msmith
rcasey

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat non_matching.txt
aaboss
ssullivan
usmiths



--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines:

Re: Parsing two files and comparing the first fields..

on 29.11.2007 00:54:27 by krahnj

clearguy02@yahoo.com wrote:
> Now I want to check if each id from the second file exists in the
> first one or not. I want the output of both matching and non-matching
> id's.


Something like this should work:


#!/usr/bin/perl
use warnings;
use strict;

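# Assumption: the original post never defines $dir1; the current directory
# is used here so that the script compiles under "use strict".
my $dir1 = '.';

open my $fh2, '<', 'c:/test2.txt' or die "Cannot open 'c:/test2.txt' $!";

# Collect the IDs from the first column of test2.txt.
my %ids;
while ( <$fh2> ) {
    chomp;
    $ids{ ( split /\t/ )[ 0 ] }++;
}

close $fh2;

open my $fh1, '<', 'c:/test1.txt' or die "Cannot open 'c:/test1.txt' $!";
open my $match, '>', "$dir1/matching.txt"
    or die "Cannot open '$dir1/matching.txt' $!";
open my $nonm, '>', "$dir1/not_matching.txt"
    or die "Cannot open '$dir1/not_matching.txt' $!";

# Write each record of test1.txt to the matching or non-matching file,
# depending on whether its ID also appears in test2.txt.
while ( <$fh1> ) {
    my $id = ( split /\t/ )[ 0 ];
    if ( exists $ids{ $id } ) {
        print $match $_;
    }
    else {
        print $nonm $_;
    }
}

close $nonm;
close $match;
close $fh1;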

__END__



John
--
use Perl;
program
fulfillment

Re: Parsing two files and comparing the first fields..

on 29.11.2007 02:08:52 by Jim Gibson

In article
<7a9957dd-9a6e-48a0-9a69-f9f2a27fa226@a35g2000prf.googlegroups.com>,
wrote:

> On Nov 28, 3:12 pm, cleargu...@yahoo.com wrote:
> > I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
> > file has 4 fields and the second one has two fields, but both files
> > have the "user_id" as the first field.

>
> Forgot to add "my" before the variables while typing.. sorry about
> that.

Please do not re-type your programs. Cut-and-paste from a program that
compiles. Have you read the guidelines for this group?

--
Jim Gibson
