Filtering two files with uncommon column

on 18.01.2008 10:22:10 by Madhur

I would like to know the best way to filter two files based on the
following condition.

I have two files. The contents of the first file are:

File 1
abc def hij
asd sss lmn
hig pqr mno


File 2

jih def asd
poi iuu wer
wer pqr jjj

I would like to have the output as:
Output

File1
asd sss lmn
File2
poi iuu wer

Basically, I want to compare the two files based on the second column.
If a second-column value appears in both files, print nothing for it.
If a line in the first file has no match for its second column in the
second file, print it under the File1 header. If a line in the second
file has no match for its second column in the first file, print it
under the File2 header.

Thank you
Madhur

Re: Filtering two files with uncommon column

on 18.01.2008 13:09:29 by Icarus Sparry

On Fri, 18 Jan 2008 01:22:10 -0800, Madhur wrote:

> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If
> the second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.
>
> Thankyou
> Madhur

From your examples it is clear that the second column is not currently
sorted. Does the order of the lines in the output matter?

Are we supposed to be comparing on a line by line basis or can "def" be
in the second field on line 1 in one file and line 17 in the other?

If it is not on a line by line basis, is the second column unique (within
a file), or can you have repeated values? If you can have repeated
values, does a single line in the other file match them all, and hence
produce no output?

How big are the files compared to the memory of the machine - i.e. can we
read the files into memory and operate there or do we have to process
them as a stream?
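
If the files turn out to be too large to hold in memory, one option is to sort both on the second field and let join report the unpaired lines. The following is only a sketch under stated assumptions: the function name uncommon_sorted is made up for illustration, every line is assumed to have exactly three whitespace-separated fields, mktemp is assumed to be available, and the output order follows the sorted second column rather than the original line order.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the thread).
# Assumes exactly three whitespace-separated fields per line.
uncommon_sorted() {
    s1=$(mktemp)
    s2=$(mktemp)
    # join needs both inputs sorted on the join field, using the same
    # collation that join itself uses, so the locale is pinned to C.
    LC_ALL=C sort -b -k2,2 "$1" > "$s1"
    LC_ALL=C sort -b -k2,2 "$2" > "$s2"
    echo "File1"
    # -v 1: print only lines of the first file that have no partner.
    LC_ALL=C join -v 1 -1 2 -2 2 -o 1.1,1.2,1.3 "$s1" "$s2"
    echo "File2"
    # -v 2: print only lines of the second file that have no partner.
    LC_ALL=C join -v 2 -1 2 -2 2 -o 2.1,2.2,2.3 "$s1" "$s2"
    rm -f "$s1" "$s2"
}
```

It would be called as uncommon_sorted file1 file2. A repeated value in one file is cancelled by a single matching line in the other, which is one possible answer to the question about repeated values.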

Maybe

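#!/bin/sh
# Usage: script file1 file2
# Under each file's name, print its lines whose second field
# does not occur as a second field anywhere in the other file.
echo "$1"
awk 'NR==FNR {a[$2]=1; next} !($2 in a)' "$2" "$1"
echo "$2"
awk 'NR==FNR {a[$2]=1; next} !($2 in a)' "$1" "$2"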

is what you are looking for?
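
The same comparison can also be sketched as a single awk invocation that prints literal File1 and File2 headers, as in the requested output, and keeps each file's original line order. This is a sketch, not a definitive answer: the function name uncommon_by_col2 is hypothetical, both files are assumed to be non-empty and small enough to hold in memory, and the match is assumed to be on the whole file rather than line by line.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the thread).
# Assumes both files are non-empty and fit in memory.
uncommon_by_col2() {
    awk '
        # A new input file starts whenever FNR resets to 1.
        FNR == 1 { nfile++ }
        {
            seen[nfile, $2] = 1      # second-column values present in this file
            line[nfile, FNR] = $0    # remember the line in its original order
            key[nfile, FNR] = $2
            count[nfile] = FNR
        }
        END {
            for (f = 1; f <= 2; f++) {
                print "File" f
                other = 3 - f
                # Print lines whose second column never occurs in the other file.
                for (i = 1; i <= count[f]; i++)
                    if (!((other, key[f, i]) in seen))
                        print line[f, i]
            }
        }
    ' "$1" "$2"
}
```

It would be called as uncommon_by_col2 file1 file2. Holding both files in arrays costs memory but avoids reading either file twice.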

Re: Filtering two files with uncommon column

on 18.01.2008 13:16:47 by PK

Madhur wrote:

> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If
> the second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.

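A possible solution, if the files are to be compared line by line and
every line has exactly three fields (the non-matching lines are written
to files named file1 and file2):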

paste f1 f2 | awk '$2 != $5 {print $1,$2,$3>"file1";print $4,$5,$6>"file2"}'
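
If the result should go to standard output under File1 and File2 headers instead of into separate files, the paste approach can be adapted by buffering the mismatching halves until the end. This is a minimal sketch under the same line-by-line reading: the function name uncommon_linewise is hypothetical, and both files are assumed to have the same number of lines with exactly three fields each.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the thread).
# Assumes equal line counts and exactly three fields per line.
uncommon_linewise() {
    paste "$1" "$2" | awk '
        # Fields 1-3 come from the first file, fields 4-6 from the second.
        $2 != $5 {
            left[++n] = $1 " " $2 " " $3
            right[n]  = $4 " " $5 " " $6
        }
        END {
            print "File1"
            for (i = 1; i <= n; i++) print left[i]
            print "File2"
            for (i = 1; i <= n; i++) print right[i]
        }
    '
}
```

It would be called as uncommon_linewise f1 f2. Unlike a whole-file comparison, a value counts as matched here only when it sits on the same line number in both files.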

Re: Filtering two files with uncommon column

on 18.01.2008 15:42:29 by Ed Morton

On 1/18/2008 3:22 AM, Madhur wrote:
> I would like to know the best way of generating filter of two files
> based upon the following condition
>
> I have two files. Contents of the first file is
>
> File 1
> abc def hij
> asd sss lmn
> hig pqr mno
>
>
> File 2
>
> jih def asd
> poi iuu wer
> wer pqr jjj
>
> I would like have the output as
> Output
>
> File1
> asd sss lmn
> File2
> poi iuu wer
>
> Basically I want to compare the two files based on second column. If
> the second column matches on both the files do not print anything, else if
> there is no match in for the second column for first file in second file
> then print it under File1 header, else if there is no match for the second
> column for second file in first file print it under File2 header.
>
> Thankyou
> Madhur

Already answered in comp.lang.awk. Please don't multi-post.

Ed