comparing two files

on 30.12.2007 02:55:07 by sonal10july

Guys,

I would like to describe the current scenario.
I have two types of files, primary and secondary. There is only one
primary file and around a hundred secondary files.

Primary.txt contains two columns, i.e. name and number of rows:
===========================================================
Currency_exchange|25000
Sales|21000
instruments|120000

===========================================================


Secondary1.txt contains two columns, i.e. name and number of rows:

===========================================================
Currency_exchange|21000
Sales|21000
instruments|120000

===========================================================

Secondary2.txt contains two columns, i.e. name and number of rows:

===========================================================
Currency_exchange|23100
Sales|21000
instruments|120000

===========================================================

There are about 100 more secondary files like these:
Secondary3.txt, Secondary4.txt ... Secondary100.txt.
The first column (name) contains the same values in all files, but the second
column (number of rows) may contain different values.

Now, I want to compare each secondary file (i.e.
Secondary1.txt, Secondary2.txt, and so on) with Primary.txt and copy
the rows whose number of rows does not match into another file.
In other words, I want to find out where the number of rows in the
secondary files (Secondary1.txt, Secondary2.txt, and so on) does not
match the primary file (Primary.txt).

What is the best way to do this? I would be heartily thankful for
any assistance with this.

Thanks in advance

SS

Re: comparing two files

on 30.12.2007 03:26:48 by Icarus Sparry

On Sat, 29 Dec 2007 17:55:07 -0800, sonal10july wrote:

> Guys,

> I would like to describe the current scenario.
> I have two types of files, primary and secondary. There is only one primary
> file and around a hundred secondary files.
>
> Primary.txt contains two columns, i.e. name and number of rows:
> ===========================================================
> Currency_exchange|25000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
>
> Secondary1.txt contains two columns, i.e. name and number of rows:
>
> ===========================================================
> Currency_exchange|21000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
> Secondary2.txt contains two columns, i.e. name and number of rows:
>
> ===========================================================
> Currency_exchange|23100
> Sales|21000
> instruments|120000
>
> ===========================================================


Good so far, you have shown us some typical input files.

> There are about 100 more secondary files like these:
> Secondary3.txt, Secondary4.txt ... Secondary100.txt. The first column (name)
> contains the same values in all files, but the second column (number of
> rows) may contain different values.

Useful information - again helpful.

> Now, I want to compare each secondary file (i.e.
> Secondary1.txt, Secondary2.txt, and so on) with Primary.txt and copy the
> rows whose number of rows does not match into another file. In other
> words, I want to find out where the number of rows in the secondary
> files (Secondary1.txt, Secondary2.txt, and so on) does not match the
> primary file (Primary.txt).

At this point your request becomes less helpful. You didn't show us the
required output. For instance, you say "copy those rows in another file":
do you want a single "another file", or one file for each secondary? Do
you want some information on which secondary the mismatched row came from?


awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME,$0}' Primary.txt Secondary*.txt > out

may do what you want.

Re: comparing two files

on 30.12.2007 03:47:44 by Barry Margolin

This seems like a good starting point:

for file in Secondary*.txt
do
diff Primary.txt "$file"
done

Depending on your specific needs, you may want to use options to diff
and/or pipe the output to something to grab the parts you want.
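One way to follow that suggestion is sketched below. It assumes every file lists the names in the same order, because diff compares by line position. The helper name `diff_mismatches` is hypothetical: it keeps only the lines that diff's default output marks with "> " (the secondary-file side) and prefixes each with the file name, using the sample data from the question.

```shell
#!/bin/sh
# Hypothetical helper: diff each secondary file against the primary file and
# keep only the "> " lines (those coming from the secondary file), prefixed
# with the secondary file's name. Assumes all files list names in the same order.
diff_mismatches() {
    primary=$1
    shift
    for file in "$@"
    do
        diff "$primary" "$file" |
            awk -v name="$file" 'sub(/^> /, "") { print name "|" $0 }'
    done
}

# Demo on the sample data from the question, in a scratch directory.
tmp=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Primary.txt
printf '%s\n' 'Currency_exchange|21000' 'Sales|21000' 'instruments|120000' > Secondary1.txt
printf '%s\n' 'Currency_exchange|23100' 'Sales|21000' 'instruments|120000' > Secondary2.txt
result=$(diff_mismatches Primary.txt Secondary*.txt)
printf '%s\n' "$result"
```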

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

Re: comparing two files

on 30.12.2007 04:30:49 by sonal10july

Thanks for your quick reply.
Here is the answer to your question:
"Do you want a single "another file", or one file for each secondary"

I want to create a single output file, and I also want to know which
secondary file each mismatched row came from.


My output should look like this:

Secondary1.txt|Currency_exchange|21000
Secondary2.txt|Currency_exchange|23100

Re: comparing two files

on 30.12.2007 08:07:51 by Icarus Sparry

On Sat, 29 Dec 2007 19:30:49 -0800, sonal10july wrote:

> Thanks for your quick reply.
> Here is the answer to your question:
> "Do you want a single "another file", or one file for each secondary"
>
> I want to create a single output file, and I also want to know which
> secondary file each mismatched row came from.
>
>
> My output should look like this:
>
> Secondary1.txt|Currency_exchange|21000
> Secondary2.txt|Currency_exchange|23100

Did you try the two lines I suggested? They will do what you ask for, except
that there will be a space after the filename rather than a "|".

awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*.txt > out

is a fix for this problem. Make sure you are using a 'sh' family shell
(sh, ksh, bash, zsh) when you type this, rather than a csh family (csh,
tcsh) or something even more exotic (rc, scsh, es, .....).
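For reference, the same program written out with comments might look like the minimal sketch below. The function name `compare_counts`, the scratch directory and the added `next` (which stops primary-file lines from reaching the second rule) are illustrative additions, not part of the command above. The sample data is the three files from the original question.

```shell
#!/bin/sh
# Commented version of the two-line awk program. The "next" is an addition:
# it skips the mismatch test for lines of the primary (first) file.
compare_counts() {
    awk -F'|' '
        # First file (NR == FNR): remember the count for each name.
        NR == FNR { v[$1] = $2; next }
        # Later files: print lines whose count differs from the primary.
        v[$1] != $2 { print FILENAME "|" $0 }
    ' "$@"
}

# Demo on the sample data from the question, in a scratch directory.
tmp=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Primary.txt
printf '%s\n' 'Currency_exchange|21000' 'Sales|21000' 'instruments|120000' > Secondary1.txt
printf '%s\n' 'Currency_exchange|23100' 'Sales|21000' 'instruments|120000' > Secondary2.txt
compare_counts Primary.txt Secondary*.txt > out
cat out
```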

Re: comparing two files

on 30.12.2007 21:22:56 by sonal10july

I copied the above command into a file and ran the script, but it
shows all the records from all files. I'm not very familiar with
awk, so here are the steps I performed.

1. Copied the above command into a file called 'main_script.sh'

#########################################################
$cat main_script.sh
#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*

#########################################################

2. Ran the script.

#########################################################
$ sh main_script.sh

Primary.txt|Currency_exchange|25000
Primary.txt|Sales|21000
Primary.txt|instruments|120000
Secondary1.txt|Currency_exchange|25000
Secondary1.txt|Sales|20000
Secondary1.txt|instruments|120000
Secondary2.txt|Currency_exchange|25000
Secondary2.txt|Sales|20000
Secondary2.txt|instruments|110000
Secondary3.txt|Currency_exchange|25000
Secondary3.txt|Sales|6600
Secondary3.txt|instruments|9000
#########################################################

Basically, it printed the entire contents of all four files. Thanks in
advance for your help.

I'm using the Korn shell.
$ echo $SHELL
/bin/ksh


Best Regards
SS

Re: comparing two files

on 31.12.2007 00:07:12 by Icarus Sparry

On Sun, 30 Dec 2007 12:22:56 -0800, sonal10july wrote:

> I copied the above command into a file and ran the script, but it
> shows all the records from all files. I'm not very familiar with
> awk, so here are the steps I performed.
>
> 1. Copied the above command into a file called 'main_script.sh'
>
> #########################################################
> $cat main_script.sh
> #!/bin/ksh
> awk -F'|' 'NR==FNR {v[$1]=$2;}
> v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*
>
> #########################################################
>
> 2. Ran the script.
>
> #########################################################
> $ sh main_script.sh
>
> Primary.txt|Currency_exchange|25000
> Primary.txt|Sales|21000
> Primary.txt|instruments|120000
> Secondary1.txt|Currency_exchange|25000
> Secondary1.txt|Sales|20000
> Secondary1.txt|instruments|120000
> Secondary2.txt|Currency_exchange|25000
> Secondary2.txt|Sales|20000
> Secondary2.txt|instruments|110000
> Secondary3.txt|Currency_exchange|25000
> Secondary3.txt|Sales|6600
> Secondary3.txt|instruments|9000
> #########################################################
>
> Basically, it printed the entire contents of all four files. Thanks in
> advance for your help.
>
> I'm using the Korn shell.
> $ echo $SHELL
> /bin/ksh
>
>
> Best Regards
> SS

OK, something is very wrong.

The -F'|' sets the field delimiter to be a vertical bar, which is the
correct value for the data you have shown us.

The "NR==FNR" is an awk idiom, which is true for the first file, and
false for the second and later files. So "NR==FNR { v[$1]=$2}" says "save
in the array 'v' the value of the second field in the element indexed by
the first field".
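A tiny trace makes the idiom concrete. This sketch assumes a POSIX awk and uses two throwaway files (`first` and `second`) made up for the demonstration; it prints FILENAME, NR and FNR for every line, plus which side of the NR==FNR test the line falls on.

```shell
#!/bin/sh
# Trace NR (records read so far, over all input files) and FNR (records
# read from the current file). NR == FNR holds only while the first file
# is being read, as long as that file is not empty.
tmp=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'a\nb\n' > first
printf 'c\nd\n' > second
trace=$(awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first-file" : "later-file") }' first second)
printf '%s\n' "$trace"
```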

The second line "v[$1]!=$2" says "If the value stored in the 'v' array
for the first field is not the same as the second field, then do the
action", and the action is "{print FILENAME "|" $0}" which is "print out
the filename, a vertical bar, and the line from the file".

The second line, by definition, must be false for the first file, as the
first line has just set v[$1] to $2, so no line of the primary file should
be printed.

When I copy your files I get the following output

Secondary1.txt|Currency_exchange|21000
Secondary2.txt|Currency_exchange|23100

Can you send me your files by email (the email address on this post is
valid)?

You might try changing the program to
#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2; print "Setting v[" $1 "] to <" $2 ">"}
v[$1]!=$2 {print FILENAME "|" $0 "(" $1 "," $2 ")"}' Primary.txt Secondary*

as a debugging aid, and letting us see the output (either here or via
email).
Icarus

Re: comparing two files

on 31.12.2007 03:01:24 by sonal10july

Here are the full details, with the contents of each file:

lyca /home/sukumar/testing:cat Primary.txt
Currency_exchange|25000
Sales|21000
instruments|120000

lyca /home/sukumar/testing:cat Secondary1.txt
Currency_exchange|25000
Sales|20000
instruments|120000

lyca /home/sukumar/testing:cat Secondary2.txt
Currency_exchange|25000
Sales|21000
instruments|120000

lyca /home/sukumar/testing:cat Secondary3.txt
Currency_exchange|25000
Sales|6600
instruments|9000

lyca /home/sukumar/testing:cat main_script.sh
#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2; print "Setting v[" $1 "] to <" $2 ">"}
v[$1]!=$2 {print FILENAME "|" $0 "(" $1 "," $2 ")"}' Primary.txt Secondary*



lyca /home/sukumar/testing:ksh main_script.sh

Primary.txt|Currency_exchange|25000(Currency_exchange,25000)
Primary.txt|Sales|21000(Sales,21000)
Primary.txt|instruments|120000(instruments,120000)
Secondary1.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary1.txt|Sales|20000(Sales,20000)
Secondary1.txt|instruments|120000(instruments,120000)
Secondary2.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary2.txt|Sales|21000(Sales,21000)
Secondary2.txt|instruments|120000(instruments,120000)
Secondary3.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary3.txt|Sales|6600(Sales,6600)
Secondary3.txt|instruments|9000(instruments,9000)


Can you please run the command for this input.
My output should be:

#############################################################

Secondary1.txt|Sales|20000(Sales,20000)
Secondary3.txt|Sales|6600(Sales,6600)
Secondary3.txt|instruments|9000(instruments,9000)

#############################################################

Thanks for your help.

Best Regards
SS

Re: comparing two files

on 31.12.2007 04:19:27 by Janis Papanagnou

sonal10july@gmail.com wrote:
> Here are the full details, with the contents of each file:
>
> lyca /home/sukumar/testing:cat Primary.txt
> Currency_exchange|25000
> Sales|21000
> instruments|120000
>
> lyca /home/sukumar/testing:cat Secondary1.txt
> Currency_exchange|25000
> Sales|20000
> instruments|120000
>
> lyca /home/sukumar/testing:cat Secondary2.txt
> Currency_exchange|25000
> Sales|21000
> instruments|120000
>
> lyca /home/sukumar/testing:cat Secondary3.txt
> Currency_exchange|25000
> Sales|6600
> instruments|9000

Just a thought... Are there any invisible whitespace characters at
the end of the lines? Or control characters like CR-LF in one file
and just LF in the others?
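If that is the cause, a check and a tolerant comparison might look like the sketch below. Both helper names are hypothetical. The comparison is the program from earlier in the thread with one added step that strips trailing carriage returns, spaces and tabs from each line before the fields are compared; the CR-LF sample file is made up to exercise the check.

```shell
#!/bin/sh
# Sketch, assuming the problem is CR-LF line endings or trailing blanks.

# Hypothetical helper 1: report lines ending in a carriage return or blank.
check_line_ends() {
    awk '/[ \t\r]$/ { print FILENAME ": line " FNR " ends in CR or a blank" }' "$@"
}

# Hypothetical helper 2: the comparison from this thread, but each line is
# stripped of trailing CRs, spaces and tabs before the fields are used.
compare_counts_tolerant() {
    awk -F'|' '
        { sub(/[ \t\r]+$/, "") }    # changing $0 makes awk re-split the fields
        NR == FNR { v[$1] = $2; next }
        v[$1] != $2 { print FILENAME "|" $0 }
    ' "$@"
}

# Demo: a CR-LF primary file against an LF-only secondary file.
tmp=$(mktemp -d) || exit 1
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'Currency_exchange|25000\r\nSales|21000\r\ninstruments|120000\r\n' > Primary.txt
printf 'Currency_exchange|25000\nSales|20000\ninstruments|120000\n' > Secondary1.txt
check_line_ends Primary.txt Secondary1.txt
result=$(compare_counts_tolerant Primary.txt Secondary1.txt)
printf '%s\n' "$result"
```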

Janis

Re: comparing two files

on 31.12.2007 05:10:45 by sonal10july

As you advised, I tried the same command with nawk and it works fine.
It's very, very strange.
Secondly, I'm not sure how to check which version of awk I have.

Re: comparing two files

on 31.12.2007 08:37:20 by Icarus Sparry

On Sun, 30 Dec 2007 20:10:45 -0800, sonal10july wrote:

> As you advised, I tried the same command with nawk and it works fine.
> It's very, very strange.
> Secondly, I'm not sure how to check which version of awk I have.

OK, the "/bin/awk" on Solaris is VERY VERY old. If you ever want to do
anything with "awk" on Solaris, always use either nawk, or else make sure
that you have /usr/xpg4/bin in your PATH before /bin (and make sure that
there is an awk in /usr/xpg4/bin).
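A small sketch of that advice for scripts that must run on several systems: try /usr/xpg4/bin/awk, then nawk, then plain awk, and use the first one found. The `pick_awk` function is hypothetical, and the sketch assumes a POSIX shell such as ksh, since `command -v` may be missing from older Bourne shells. The final check only confirms that FNR is set on the first input line; it does not prove full POSIX compliance.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first awk found, preferring the
# XPG4 version, then nawk, then whatever "awk" is on the PATH.
# Assumes a POSIX shell (ksh, /usr/xpg4/bin/sh, bash, ...) for "command -v".
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk awk
    do
        if command -v "$candidate" >/dev/null 2>&1
        then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk found" >&2; exit 1; }
echo "using $AWK"

# Sanity check: FNR should be 1 on the first line of input.
fnr_check=$(printf 'x\n' | "$AWK" '{ print FNR }')
echo "FNR on first line: $fnr_check"
```

The chosen interpreter can then be used as "$AWK" -F'|' '...' Primary.txt Secondary*.txt in place of a bare awk.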