Filtering two files with uncommon column
on 18.01.2008 10:22:10 by Madhur
I would like to know the best way to filter two files
based on the following condition.
I have two files. The contents of the first file are:
File 1
abc def hij
asd sss lmn
hig pqr mno
File 2
jih def asd
poi iuu wer
wer pqr jjj
I would like to have the output as
Output
File1
asd sss lmn
File2
poi iuu wer
Basically I want to compare the two files based on the second column.
If the second column matches in both files, do not print anything.
Otherwise, if the second column of a line in the first file has no match
in the second file, print that line under the File1 header, and if the
second column of a line in the second file has no match in the first
file, print that line under the File2 header.
Thank you
Madhur
Re: Filtering two files with uncommon column
on 18.01.2008 13:09:29 by Icarus Sparry
On Fri, 18 Jan 2008 01:22:10 -0800, Madhur wrote:
> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If the
> second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.
>
> Thankyou
> Madhur
From your examples it is clear that the second column is not currently
sorted. Does the order of the lines in the output matter?
Are we supposed to be comparing on a line by line basis or can "def" be
in the second field on line 1 in one file and line 17 in the other?
If it is not on a line by line basis, is the second column unique (within
a file), or can you have repeated values? If you can have repeated
values, does a single line in the other file match them all, and hence
produce no output?
How big are the files compared to the memory of the machine - i.e. can we
read the files into memory and operate there or do we have to process
them as a stream?
Maybe
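#!/bin/sh
# Lines of "$1" whose second field never appears in "$2"
echo "$1"
awk 'NR==FNR {a[$2]=1; next} !a[$2]' "$2" "$1"
# Lines of "$2" whose second field never appears in "$1"
echo "$2"
awk 'NR==FNR {a[$2]=1; next} !a[$2]' "$1" "$2"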
is what you are looking for?
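For illustration, the same idea can be sketched as a single awk invocation that reads each file twice: once to collect the second-column values, once to print the lines whose value is missing from the other file. This is only a sketch under stated assumptions: the function name uncommon_by_col2 is hypothetical, the headers are the literal strings File1 and File2 from the requested output, both files are assumed to be non-empty regular files, and matching is on second-column values anywhere in the other file rather than line by line.

```shell
#!/bin/sh
# uncommon_by_col2 FILE1 FILE2   (hypothetical helper name)
# Prints, under "File1" and "File2" headers, the lines of each file whose
# second field does not occur as a second field anywhere in the other file.
# Assumption: both files are non-empty, so FNR == 1 marks each new pass.
uncommon_by_col2() {
    awk '
        FNR == 1  { pass++ }                  # a new input file begins
        pass == 1 { in1[$2] = 1; next }       # collect keys of FILE1
        pass == 2 { in2[$2] = 1; next }       # collect keys of FILE2
        pass == 3 {                           # re-read FILE1, filter on FILE2 keys
            if (FNR == 1) print "File1"
            if (!($2 in in2)) print
            next
        }
        pass == 4 {                           # re-read FILE2, filter on FILE1 keys
            if (FNR == 1) print "File2"
            if (!($2 in in1)) print
        }
    ' "$1" "$2" "$1" "$2"
}
```

Only the second-column values are held in memory; the lines themselves are streamed on the second read, and the original line order of each file is preserved. Repeated values in one file are all suppressed by a single match in the other.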
Re: Filtering two files with uncommon column
on 18.01.2008 13:16:47 by PK
Madhur wrote:
> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If
> the second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.
A possible solution:
paste f1 f2 | awk '$2 != $5 {print $1,$2,$3>"file1";print $4,$5,$6>"file2"}'
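The one-liner above compares the files line by line and writes the differing lines to files named file1 and file2. A hedged sketch of the same line-by-line comparison that prints to standard output under headers instead might look like this. It assumes both files have the same number of lines and exactly three whitespace-separated fields per line; the function name linewise_uncommon is hypothetical.

```shell
#!/bin/sh
# linewise_uncommon FILE1 FILE2   (hypothetical helper name)
# Pairs line N of FILE1 with line N of FILE2 and reports the pairs whose
# second fields differ, grouped under "File1" and "File2" headers.
# Assumption: equal line counts and exactly three fields per line.
linewise_uncommon() {
    paste "$1" "$2" | awk '
        $2 != $5 {                            # fields 1-3: FILE1, 4-6: FILE2
            left[++n] = $1 " " $2 " " $3
            right[n]  = $4 " " $5 " " $6
        }
        END {
            print "File1"
            for (i = 1; i <= n; i++) print left[i]
            print "File2"
            for (i = 1; i <= n; i++) print right[i]
        }
    '
}
```

Because the comparison is positional, a value that appears on different line numbers in the two files is reported as a mismatch, which may or may not be what is wanted.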
Re: Filtering two files with uncommon column
on 18.01.2008 15:42:29 by Ed Morton
On 1/18/2008 3:22 AM, Madhur wrote:
> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If the
> second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.
>
> Thankyou
> Madhur
Already answered in comp.lang.awk. Please don't multi-post.
Ed