Bash script to subtotal a column in a file

on 23.01.2008 15:11:12 by bigchris

Hi there, I have a file that has two columns.
Column A is text
Column B is a number
Both separated by a comma

i.e.

bang, 5
bang, 2
doh, 5
eeek, 7
eeek, 2
fluff, 5

I want to roll down the file and add up the second column whenever
the first is the same, so the result of the above would be

bang, 7
doh, 5
eeek, 9
fluff, 5

I wanted to keep this simple with a bash script rather than perl, but am
having trouble with the loops.

Does anyone have a suggestion on how to approach this?

Re: Bash script to subtotal a column in a file

on 23.01.2008 15:23:19 by Ed Morton

On 1/23/2008 8:11 AM, bigchris wrote:
> Hi there, I have a file that has two columns.
> Column A is text
> Column B is a number
> Both separated by a comma
>
> i.e.
>
> bang, 5
> bang, 2
> doh, 5
> eeek, 7
> eeek, 2
> fluff, 5
>
> I want to roll down the file and add the second column up all the time
> the first is the same, so the result of above would be
>
> bang, 7
> doh, 5
> eeek, 9
> fluff, 5
>
> I wanted to keep this simple with bash script rather than perl but am
> having trouble with the loops.

A loop is usually the wrong approach in a shell script.

> Does anyone have a suggestion on how to approach this?

If output order doesn't matter:

awk -F, '{a[$1] += $2} END{for (i in a) print i", "a[i]}' file
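As a quick sanity check, the one-liner can be run on the sample data from the original post (the temp-file name is just for the demo; the trailing sort is only there because awk's for-in traversal order is unspecified):

```shell
# Recreate the sample data from the post.
printf 'bang, 5\nbang, 2\ndoh, 5\neeek, 7\neeek, 2\nfluff, 5\n' > /tmp/subtotal.txt

# Sum column 2 per key; pipe through sort for a stable display order.
awk -F, '{a[$1] += $2} END{for (i in a) print i", "a[i]}' /tmp/subtotal.txt | sort
```

Note that with `-F,` the second field is " 5" with a leading space, but awk's string-to-number conversion ignores it.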

Ed.

Re: Bash script to subtotal a column in a file

on 23.01.2008 15:44:36 by mallin.shetland

bigchris wrote:

> ...
> I wanted to keep this simple with bash script rather than perl but...

I think an awk script is a good solution.
To be clear: rather than a bash script, I would do this job in Perl,
or, as a second choice, an AWK script; but if you don't have Perl
installed on your system...


> ... am having trouble with the loops.

You can preprocess the data with sort to dramatically reduce the
complexity of the loop:

sort "$infile" | awk 'three lines of awk' > "$outfile"
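The awk body is left unwritten in the post; purely as a sketch, here is one guess at what those few lines might look like, assuming sorted, comma-separated "key, value" input (the file names are placeholders):

```shell
# Sketch only: subtotal pre-sorted "key, value" lines, one pass, O(1) memory.
sort infile.txt | awk -F, '
    key != $1 { if (NR > 1) print key ", " sum   # key changed: emit subtotal
                key = $1; sum = 0 }
    { sum += $2 }                                # accumulate the current key
    END { if (NR) print key ", " sum }           # emit the final group
' > outfile.txt
```

Because sort groups equal keys together, the script only ever needs to remember one running total at a time, unlike the array approach.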


PS: sorry, I posted a draft with some errors.

Re: Bash script to subtotal a column in a file

on 23.01.2008 15:49:43 by bigchris

On Jan 23, 9:23 am, Ed Morton wrote:

Thanks Ed, that worked a TREAT and was very fast.

> If output order doesn't matter:
>
> awk -F, '{a[$1] += $2} END{for (i in a) print i", "a[i]}' file
>
> Ed.

Since that was so simple, it prompts another question.

I have two files (two different files, but the same type of data), and I want
to check each line in the first file for its existence in the second.

The first file only has one text variable on it per line (I already
cut it down to the minimum data).
The second file has 4 fields separated by commas, but I'm really only
comparing part of the first variable, hence grep.

I currently have a script that works nicely but takes hours to run. I
essentially read a line at a time from the first file and grep the
second file, if I get a hit then I print that out to a third file,
then go back and read the second line in the first file.

This is SLOW but the only way I could think to do it.

Any further ideas?

Re: Bash script to subtotal a column in a file

on 23.01.2008 15:53:11 by Luuk

"Ed Morton" wrote in message
news:47974DD7.8040505@lsupcaemnt.com...
>
>
> On 1/23/2008 8:11 AM, bigchris wrote:
>> Hi there, I have a file that has two columns.
>> Column A is text
>> Column B is a number
>> Both separated by a comma
>>
>> i.e.
>>
>> bang, 5
>> bang, 2
>> doh, 5
>> eeek, 7
>> eeek, 2
>> fluff, 5
>>
>> I want to roll down the file and add the second column up all the time
>> the first is the same, so the result of above would be
>>
>> bang, 7
>> doh, 5
>> eeek, 9
>> fluff, 5
>>
>> I wanted to keep this simple with bash script rather than perl but am
>> having trouble with the loops.
>
> A loop is usually the wrong approach in a shell script.
>
>> Does anyone have a suggestion on how to approach this?
>
> If output order doesn't matter:
>
> awk -F, '{a[$1] += $2} END{for (i in a) print i", "a[i]}' file
>
> Ed.
>

if order does matter (this relies on equal keys being adjacent, as in the
sample, and needs an END block so the last group is printed too):

awk -F, '{ if (b != "" && b != $1) { print b ", " a; a = 0 } a += $2; b = $1 } END { if (b != "") print b ", " a }' file
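A quick check against the sample data, in a corrected form; note the END block, without which the final group (fluff) would never be printed:

```shell
printf 'bang, 5\nbang, 2\ndoh, 5\neeek, 7\neeek, 2\nfluff, 5\n' > /tmp/grouped.txt

# Order-preserving subtotal: print a group's total when the key changes,
# and flush the last group in the END block.
awk -F, '{ if (b != "" && b != $1) { print b ", " a; a = 0 } a += $2; b = $1 }
         END { if (b != "") print b ", " a }' /tmp/grouped.txt
```

Output comes out in input order, which the array/for-in version does not guarantee.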

Re: Bash script to subtotal a column in a file

on 23.01.2008 15:59:28 by Ed Morton

On 1/23/2008 8:49 AM, bigchris wrote:
> On Jan 23, 9:23 am, Ed Morton wrote:
>
> Thanks Ed, that worked a TREAT and was very fast.
>
>
>>If output order doesn't matter:
>>
>>awk -F, '{a[$1] += $2} END{for (i in a) print i", "a[i]}' file
>>
>> Ed.
>
>
> Since that was so simple it begs another question.
>
> I have two files (two different files, but the same type of data), and I want
> to check each line in the first file for its existence in the second.
>
> The first file only has one text variable on it per line (I already
> cut it down to the minimum data)

There was probably no need to do that since you can just select some field(s) to
be the keys.

> The second file has 4 fields seperated by commas, but I'm really only
> comparing part of the first variable, hence grep.

"part of the first variable"?

> I currently have a script that works nicely but takes hours to run. I
> essentially read a line at a time from the first file and grep the
> second file, if I get a hit then I print that out to a third file,
> then go back and read the second line in the first file.
>
> This is SLOW but the only way I could think to do it.
>
> Any further ideas?

If I understand what you mean by "part of the first variable", then something
like this:

awk 'NR==FNR{a[$1];next}{for (i in a) if ($1 ~ i) print}' file1 file2

should do it, but it may be inefficient if the first file is large and there MAY
be a more efficient way if you can explain more about the ways in which the
fields from the first file can be contained in the first field of the second
file and provide some small set of sample input and expected output.
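In the meantime, a tiny demo with invented data shows the shape of the match (everything here is hypothetical; note that each line of file1 is used as a regular expression, not a literal string):

```shell
# Hypothetical sample data.
printf 'chris\n' > /tmp/file1
printf 'chris@domain.com, a, b, 4\nbob@other.org, a, b, 2\n' > /tmp/file2

# First pass (NR==FNR) stores the keys from file1; second pass prints any
# line of file2 whose first (whitespace-delimited) field matches a key.
awk 'NR==FNR{a[$1];next}{for (i in a) if ($1 ~ i) print}' /tmp/file1 /tmp/file2
```

This is O(lines1 × lines2) because every key is tried against every line, which is why it may be slow on large inputs.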

Ed.

Re: Bash script to subtotal a column in a file

on 23.01.2008 16:16:45 by bigchris

>
> should do it, but it may be inefficient if the first file is large and there MAY
> be a more efficient way if you can explain more about the ways in which the
> fields from the first file can be contained in the first field of the second
> file and provide some small set of sample input and expected output.
>

Thanks Ed. Yes, the first file only contains the left-hand side of an
email address, e.g. chris

Whereas the second file contains the full email address in the first
field, e.g.
chris@domain.com, blah, blah, count, blah

So I want to take the "chris" in the first file and compare it to each
line of the second file; there is a match if chris in file 1
corresponds to chris@domain.com in the second file, and if so I want to
total column 4 for each instance it shows up in the second file.

Fun eh?

Re: Bash script to subtotal a column in a file

on 23.01.2008 16:35:23 by Ed Morton

On 1/23/2008 9:16 AM, bigchris wrote:
>>should do it, but it may be inefficient if the first file is large and there MAY
>>be a more efficient way if you can explain more about the ways in which the
>>fields from the first file can be contained in the first field of the second
>>file and provide some small set of sample input and expected output.
>>
>
>
> Thanks Ed. Yes, the first file only contains the left hand side of an
> email address i.e. chris
>
> Whereas the second file contains the full email address in the first
> field i.e.
> chris@domain.com, blah, blah, count, blah
>
> So I want to take the "chris" in the first file and compare it to each
> line of the second file where there would be a match if chris in file
> 1 = chris@domain.com in the second file and if so total column 4 for
> each instance it showed up in the second file.
>
> Fun eh?

Actually, it's pretty straightforward. If the "@" can't appear elsewhere then
all you need is:

awk -F'[@,]' 'NR==FNR{a[$1];next} $1 in a{c[$1"@"$2]+=$5} END{for (i in c) print i,c[i]}' file1 file2

Note that because we're now using either "@" or "," as the field separator, and
so splitting the "1st field" into two, the "4th field" is now the 5th field.
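A small end-to-end check with invented names and numbers (hypothetical data; the trailing sort just makes the for-in output order stable):

```shell
printf 'chris\nbob\n' > /tmp/file1
printf 'chris@domain.com, a, b, 3, z
chris@domain.com, a, b, 4, z
bob@other.org, a, b, 2, z
dave@elsewhere.net, a, b, 9, z
' > /tmp/file2

# With FS set to "@" or ",", field 1 is the local part, field 2 the domain,
# and the old 4th column is now field 5. Sum field 5 per full address.
awk -F'[@,]' 'NR==FNR{a[$1];next} $1 in a{c[$1"@"$2]+=$5} END{for (i in c) print i, c[i]}' /tmp/file1 /tmp/file2 | sort
```

Lines in file1 contain neither "@" nor ",", so there the whole line is field 1, which is exactly what gets stored as a key.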

Regards,

Ed.

Re: Bash script to subtotal a column in a file

on 23.01.2008 20:15:01 by bigchris

Ed, thanks a million, I believe that did it. I have to go through and
double-check the data to ensure there are no anomalies, but at first glance
it worked.

Now it runs in about 2 seconds versus my script, which ran in 6 hours!!

> Actually, it's pretty straightforward. If the "@" can't appear elsewhere then
> all you need is:
>
> awk -F'[@,]' 'NR==FNR{a[$1];next} $1 in a{c[$1"@"$2]+=$5} END{for (i in c) print i,c[i]}' file1 file2

Re: Bash script to subtotal a column in a file

on 25.01.2008 00:51:48 by bigchris

> awk -F'[@,]' 'NR==FNR{a[$1];next} $1 in a{c[$1"@"$2]+=$5} END{for (i in c) print i,c[i]}' file1 file2

Ed, I've been doing more work with your awk line and noticed that the
results file doesn't contain all instances of matching records between
the first and second file.

So, to explain again what I was hoping would happen: we're essentially
comparing file1 and file2, where I want to take the first line of file1
(which has one entry - an email address) and match it to the contents of
the second file. Wherever that entry exists in the second file, I want
to add up all the contents of column 4 and present one line in the
results file with the total.

At first look I thought this was working perfectly but looking more
closely I see some entries that are in both files but don't end up in
the results file.

Do you have any thoughts as to why this might happen?

Re: Bash script to subtotal a column in a file

on 25.01.2008 00:55:05 by Ed Morton

On 1/24/2008 5:51 PM, bigchris wrote:
>>awk -F'[@,]' 'NR==FNR{a[$1];next} $1 in a{c[$1"@"$2]+=$5} END{for (i in c) print i,c[i]}' file1 file2
>
>
> Ed, I've been doing more work with your awk line and noticed that the
> results file doesn't contain all instances of matching records between
> the first and second file.
>
> So to explain again what I was hoping would happen. We're essentially
> comparing file1 and file2 where I want to take the first line of file1
> (which one entry - an email address) and match it to the contents of
> the second file. Whereever that entry exists in the second file I want
> to add up all the contents of column 4 and present one line in the
> results file with the total.
>
> At first look I thought this was working perfectly but looking more
> closely I see some entries that are in both files but don't end up in
> the results file.
>
> Do you have any thoughts as to why this might happen?

Maybe there are whitespace characters around the email address in one or the
other file? Can you post some sample input and the undesired output?
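If stray whitespace does turn out to be the cause, one defensive variation (a sketch, untested against the real data) trims each key before using it:

```shell
# Same join/subtotal as before, but with leading/trailing whitespace
# stripped from the key field on every line of both files.
awk -F'[@,]' '
    { gsub(/^[[:space:]]+|[[:space:]]+$/, "", $1) }
    NR==FNR { a[$1]; next }
    $1 in a { c[$1 "@" $2] += $5 }
    END { for (i in c) print i, c[i] }
' file1 file2
```

Leading spaces in the numeric field 5 are already harmless, since awk's string-to-number conversion skips them.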

Ed.

Re: Bash script to subtotal a column in a file

on 25.01.2008 17:24:34 by bigchris

Thanks Ed. I looked again today and it might have been a "tired eye"
problem... seeing things that weren't there.