sort 3rd col in a file and print
on 16.01.2008 19:38:22 by bhaveshah
Hi Gurus,
I have a 10,000-line file with the following output. I need to sort the
third column from the highest number to the lowest and then print the 10
lines with the highest numbers, like head -10 filename. Please help.
The third column has many repeated numbers; for example, there will be many
lines with the number 1 or 10, etc.
user1
INBOX 17033
user1 "Junk E-
mail" 4707
user2 "Legato
Support" 34
user3
NetApp 8
user3 Networker
Re: sort 3rd col in a file and print
on 16.01.2008 19:51:42 by Janis Papanagnou
explor wrote:
> Hi Gurus,
> I've a 10,000 line file with following output. I need to sort the
> third col from highest no to lowest no.
sort +2nr infile | head -10
Janis
> and then print the top 10
> higest no lines like head -10 filename. please help.
>
> The 3rd col has many nos that are repeated like there will be may
> lines with no 1 or 10 etc..
>
>
> user1
> INBOX 17033
> user1 "Junk E-
> mail" 4707
> user2 "Legato
> Support" 34
> user3
> NetApp 8
> user3 Networker
Re: sort 3rd col in a file and print
on 16.01.2008 20:00:09 by bhaveshah
On Jan 16, 10:51 am, Janis Papanagnou wrote:
> explor wrote:
> > Hi Gurus,
> > I've a 10,000 line file with following output. I need to sort the
> > third col from highest no to lowest no.
>
>    sort +2nr infile | head -10
>
> Janis
Thanks, Janis. I tried it, but it isn't working. I get a few blank lines
first, and the rest is not in the correct order either. Can I send the
file to you? Please let me know. Thanks again.
Re: sort 3rd col in a file and print
on 16.01.2008 20:48:21 by Icarus Sparry
On Wed, 16 Jan 2008 11:00:09 -0800, explor wrote:
> On Jan 16, 10:51 am, Janis Papanagnou
> wrote:
>> explor wrote:
>> > Hi Gurus,
>> > I've a 10,000 line file with following output. I need to sort the
>> > third col from highest no to lowest no.
>>
>>    sort +2nr infile | head -10
>>
>> Janis
>
> Thanks Janis..I tried but isn't working. I get first few blank lines and
> then also its not in correct order. Can i send the file to you? Please
> do let me know. Thanks again
Since your example shows a line like this
user1 "Junk E-mail" 4707
then your idea of the "third column" and sort's idea will differ. In
particular sort does not know about quoting, so its idea of the third
field will be the 7 characters E-mail" (including the double quote), as
the fields are whitespace delimited.
So we need to know more about your file. For instance does it use tab
characters? Do the fields as you see them (rather than as sort sees them)
line up in fixed positions?
Sort will almost certainly be able to do what you want, but it will need
a little coaxing. If you can change the input format a little, e.g. put
tab characters between columns, then the -t option to sort will probably
be all you need.
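A rough sketch of the -t approach, assuming the file can be produced with a single tab between columns. The rows below are made up from the example in the original post, and the variable names are only for illustration:

```shell
#!/bin/sh
# Assumption: columns are separated by exactly one TAB, and the middle
# column may contain spaces and quotes.
tab=$(printf '\t')
data=$(mktemp)
printf 'user3\tNetApp\t8\n'             >  "$data"
printf 'user1\t"Junk E-mail"\t4707\n'   >> "$data"
printf 'user1\tINBOX\t17033\n'          >> "$data"
printf 'user2\t"Legato Support"\t34\n'  >> "$data"

# -t sets the field separator; -k3,3nr sorts on field 3 only,
# numerically (n) and in descending order (r).
top=$(sort -t "$tab" -k3,3nr "$data" | head -n 10)
printf '%s\n' "$top"
rm -f "$data"
```

With a tab as the separator, the quoted mailbox names stay inside field 2, so sort's third field is the number the poster means.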
Re: sort 3rd col in a file and print
on 16.01.2008 20:57:56 by Ed Morton
On 1/16/2008 12:38 PM, explor wrote:
> Hi Gurus,
> I've a 10,000 line file with following output. I need to sort the
> third col from highest no to lowest no. and then print the top 10
> higest no lines like head -10 filename. please help.
>
> The 3rd col has many nos that are repeated like there will be may
> lines with no 1 or 10 etc..
>
>
> user1
> INBOX 17033
> user1 "Junk E-
> mail" 4707
> user2 "Legato
> Support" 34
> user3
> NetApp 8
> user3 Networker
Assuming there's newsreader line-wrapping going on above and your input's really
on individual lines, try this:
awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
Regards,
Ed.
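To see what each stage of that pipeline does, here is a small walk-through. The input is a guess at what the unwrapped file looks like (one record per line, the number last), so treat the data as hypothetical:

```shell
#!/bin/sh
# Hypothetical unwrapped input: the number is always the last field,
# but the count of whitespace-separated fields varies per line.
data=$(mktemp)
cat > "$data" <<'EOF'
user3 NetApp 8
user1 "Junk E-mail" 4707
user1 INBOX 17033
user2 "Legato Support" 34
EOF

# 1. awk prints the last field ($NF), a TAB, then the whole line ($0).
# 2. sort -rn orders by the leading number, largest first.
# 3. head keeps the first ten lines.
# 4. cut -f2- removes the prepended number and the TAB after it.
top=$(awk -v OFS="\t" '{print $NF,$0}' "$data" | sort -rn | head -n 10 | cut -f2-)
printf '%s\n' "$top"
rm -f "$data"
```

Because the key is copied to the front, it no longer matters how many fields sit between the user name and the number.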
Re: sort 3rd col in a file and print
on 16.01.2008 20:59:59 by Janis Papanagnou
explor wrote:
> On Jan 16, 10:51 am, Janis Papanagnou
> wrote:
>
>>explor wrote:
>>
>>>Hi Gurus,
>>>I've a 10,000 line file with following output. I need to sort the
>>>third col from highest no to lowest no.
>>
>> sort +2nr infile | head -10
>>
>>Janis
Here is sample data with *three* columns and the respective output...
$ cat infile
oNE Two 12
OnE tWo 7
oNe twO 1
onE TWo 14
One tWO 5
$ sort +2nr infile
onE TWo 14
oNE Two 12
OnE tWo 7
One tWO 5
oNe twO 1
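The +2nr key is the old zero-based syntax, which some current sort implementations may reject. A sketch of the same run with the one-based -k form, using the sample data above (the temporary file and variable names are only for illustration):

```shell
#!/bin/sh
infile=$(mktemp)
cat > "$infile" <<'EOF'
oNE Two 12
OnE tWo 7
oNe twO 1
onE TWo 14
One tWO 5
EOF

# -k3,3 restricts the key to the third field (fields count from 1 here);
# n compares numerically and r reverses the order.
sorted=$(sort -k3,3nr "$infile")
printf '%s\n' "$sorted"
rm -f "$infile"
```

Note that -k3nr without the ",3" would run the key to the end of the line, which is what +2nr does; for three-column data both give the same result.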
>
>
> Thanks Janis..I tried but isn't working. I get first few blank lines
> and then also its not in correct order. Can i send the file to you?
> Please do let me know. Thanks again
Don't expect people to decipher ill-formatted sample data. Next time,
choose a sample that clearly describes your task and is not wrapped
across lines in a way that hides the record boundaries.
After your "isn't working" report, which is quite meaningless per se,
I had a peek into your data, and it seems you have constructs like
field1 "this is *not* field 2" 42
The above record contains 7 white space delimited fields but you want
it sorted by the *last* column and not (as you wrote) by the "third"
column.
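A quick way to check how the tools split such a record (a small sketch; the variable names are only for illustration):

```shell
#!/bin/sh
line='field1 "this is *not* field 2" 42'

# awk, like sort, splits on runs of whitespace and knows nothing about quotes.
nf=$(printf '%s\n' "$line" | awk '{ print NF }')
third=$(printf '%s\n' "$line" | awk '{ print $3 }')
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
printf 'fields=%s third=%s last=%s\n' "$nf" "$third" "$last"
# prints: fields=7 third=is last=42
```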
One possibility to sort this is...
awk '{printf("%10d %s\n",$NF,$0)}' infile | sort -nr | cut -c12-
Janis
Re: sort 3rd col in a file and print
on 17.01.2008 02:05:15 by bhaveshah
On Jan 16, 11:57 am, Ed Morton wrote:
> On 1/16/2008 12:38 PM, explor wrote:
>
> > Hi Gurus,
> > I've a 10,000 line file with following output. I need to sort the
> > third col from highest no to lowest no. and then print the top 10
> > higest no lines like head -10 filename. please help.
>
> > The 3rd col has many nos that are repeated like there will be may
> > lines with no 1 or 10 etc..
>
> > user1
> > INBOX 17033
> > user1 "Junk E-
> > mail" 4707
> > user2 "Legato
> > Support" 34
> > user3
> > NetApp 8
> > user3 Networker
>
> Assuming there's newsreader line-wrapping going on above and your input's really
> on individual lines, try this:
>
> awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>
> Regards,
>
> Ed.
Thanks, Ed. That worked great. Thanks a lot.
Re: sort 3rd col in a file and print
on 17.01.2008 02:06:45 by bhaveshah
That worked great. Thanks a lot, Ed and Janis.
Re: sort 3rd col in a file and print
on 17.01.2008 10:49:29 by gazelle
In article ,
explor wrote:
....
>>
>> awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>>
>> Regards,
>>
>> Ed.
>
>That worked great..Thanks a lot Ed and Janis
Of course, if you wanted to, you could put a few (more) greps and cats
into that command line as well. (This is my way of saying: Way too many
commands!)
Since you can, of course, do it all with one simple (g)AWK command, and
not have to use all those other commands.
(Details left as an exercise...)
Re: sort 3rd col in a file and print
on 17.01.2008 14:30:31 by Janis Papanagnou
On 17 Jan., 10:49, gaze...@xmission.xmission.com (Kenny McCormack)
wrote:
> In article ,
> explor wrote:
>
> ...
>
> >> awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>
> >> Regards,
>
> >> Ed.
>
> >That worked great..Thanks a lot Ed and Janis
>
> Of course, if you wanted to, you could put a few (more) greps and cats
> into that command line as well. (This is my way of saying: Way too many
> commands!)
>
> Since you can, of course, do it all with one simple (g)AWK command, and
> not have to use all those other commands.
>
> (Details left as an exercise...)
gawk or awk?
Since a sort function is non-standard in awks, I'd be
interested to know about your _simple_ awk command
that does that task.
I could implement the OP's requirement in a few lines
of standard awk, but hardly as compactly as with the Unix
tools.
Janis
Re: sort 3rd col in a file and print
on 17.01.2008 14:50:11 by gazelle
In article <6f94c075-6e65-4a31-9930-52d165d2c5d2@e25g2000prg.googlegroups.com>,
Janis wrote:
>On 17 Jan., 10:49, gaze...@xmission.xmission.com (Kenny McCormack)
>wrote:
>> In article ,
>> explor wrote:
>>
>> ...
>>
>>
>>
>> >> awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>>
>> >> Regards,
>>
>> >> Ed.
>>
>> >That worked great..Thanks a lot Ed and Janis
>>
>> Of course, if you wanted to, you could put a few (more) greps and cats
>> into that command line as well. (This is my way of saying: Way too many
>> commands!)
>>
>> Since you can, of course, do it all with one simple (g)AWK command, and
>> not have to use all those other commands.
>>
>> (Details left as an exercise...)
>
>gawk or awk?
>
>Since a sort function is non-standard in awk's I'd be
>interested to know about your _simple_ awk command
>that does that task.
>
>I could implement the OP's requirement in a few lines
>standard awk, but rarely as compact as with the Unix
>tools.
I did say gawk, which I consider to be a freely available tool.
And, if you are doing AWK and don't have at least gawk to work with,
you're just wasting your time.
As you know, sorting is easily done in gawk.
Further, it may not even be necessary to sort at all. I'm not
necessarily saying that what I am about to propose is either more
efficient or easier to code (fewer characters to type), but it does seem
like if the goal is merely to keep the top 10 of something, that you
could do that along the lines of a "find the biggest" algorithm.
But this is just my random musings for the day...
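For what it's worth, a "keep the biggest k" pass can be sketched in plain awk along these lines. This is only one possible reading of the idea, with a made-up function name (topk) and made-up test data, not a tuned solution:

```shell
#!/bin/sh
# topk K FILE: print the K lines of FILE with the largest last field,
# largest first, without sorting the whole file.
topk() {
    awk -v k="$1" '
    {
        v = $NF + 0                            # numeric value of the last field
        if (n < k)           { n++; pos = n }  # list not full yet: append
        else if (v > val[n]) { pos = n }       # beats the smallest kept value
        else                 next              # too small: skip this line
        # shift smaller entries down to keep the list in descending order
        while (pos > 1 && val[pos-1] < v) {
            val[pos] = val[pos-1]; line[pos] = line[pos-1]; pos--
        }
        val[pos] = v; line[pos] = $0
    }
    END { for (i = 1; i <= n; i++) print line[i] }' "$2"
}

data=$(mktemp)
printf '%s\n' 'a x 5' 'b "y z" 12' 'c x 7' 'd x 1' 'e x 14' 'f x 9' > "$data"
top=$(topk 3 "$data")
printf '%s\n' "$top"
rm -f "$data"
```

Each input line costs at most k comparisons, so for k = 10 this is a single pass over the file; whether that beats the sort pipeline in practice would need measuring.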
Re: sort 3rd col in a file and print
on 17.01.2008 15:46:40 by Janis Papanagnou
On 17 Jan., 14:50, gaze...@xmission.xmission.com (Kenny McCormack)
wrote:
> In article <6f94c075-6e65-4a31-9930-52d165d2c...@e25g2000prg.googlegroups.com>,
> Janis wrote:
> >On 17 Jan., 10:49, gaze...@xmission.xmission.com (Kenny McCormack)
> >wrote:
> >> In article ,
> >> explor wrote:
>
> >> ...
>
> >> >> awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>
> >> >> Regards,
>
> >> >> Ed.
>
> >> >That worked great..Thanks a lot Ed and Janis
>
> >> Of course, if you wanted to, you could put a few (more) greps and cats
> >> into that command line as well. (This is my way of saying: Way too many
> >> commands!)
>
> >> Since you can, of course, do it all with one simple (g)AWK command, and
> >> not have to use all those other commands.
>
> >> (Details left as an exercise...)
>
> >gawk or awk?
>
> >Since a sort function is non-standard in awk's I'd be
> >interested to know about your _simple_ awk command
> >that does that task.
>
> >I could implement the OP's requirement in a few lines
> >standard awk, but rarely as compact as with the Unix
> >tools.
>
> I did say gawk,
You did say (g)AWK and seem to have meant gawk. Okay.
> which I consider to be a freely available tool.
(That can be, unfortunately, irrelevant in environments
where you are not allowed to run self-installed software.)
> And, if you are doing AWK and don't have at least gawk to work with,
> you're just wasting your time.
Oh, that's a bit exaggerated, isn't it. :-)
> As you know, sorting is easily done in gawk.
Quite as easy as with the specialized shell-level tool.
(Yet, I don't seem to recall sorting a) numerically, b)
on the last field only, using gawk; I think I must do
some padding first, and then... - well, quite as easy.)
> Further, it may not even be necessary to sort at all. I'm not
> necessarily saying that what I am about to propose is either more
> efficient or easier to code (fewer characters to type), but it does seem
> like if the goal is merely to keep the top 10 of something, that you
> could do that along the lines of a "find the biggest" algorithm.
Indeed; that's why I wrote that I could do it "in a few lines".
I was curious about the existence of a solution as "simple" as
using the Unix tools and thought you might know one, given
what you wrote.
Using - if possible - GNU awk is, of course, a fine thing...
we're well aware of that.
> But this is just my random musings for the day
Janis
Re: sort 3rd col in a file and print
on 17.01.2008 19:46:51 by Ed Morton
On 1/17/2008 3:49 AM, Kenny McCormack wrote:
> In article ,
> explor wrote:
> ...
>
>>>awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>>>
>>>Regards,
>>>
>>> Ed.
>>
>>That worked great..Thanks a lot Ed and Janis
>
>
> Of course, if you wanted to, you could put a few (more) greps and cats
> into that command line as well. (This is my way of saying: Way too many
> commands!)
>
> Since you can, of course, do it all with one simple (g)AWK command, and
> not have to use all those other commands.
>
> (Details left as an exercise...)
>
I assume since you didn't post the awk-only solution that you feel the effort to
write that is disproportionate to the benefits to the OP of using that vs the
pipeline above. I agree.
Ed.
Re: sort 3rd col in a file and print
on 17.01.2008 23:44:33 by Icarus Sparry
On Thu, 17 Jan 2008 12:46:51 -0600, Ed Morton wrote:
> On 1/17/2008 3:49 AM, Kenny McCormack wrote:
>> In article
>> ,
>> explor wrote:
>> ...
>>
>>>>awk -v OFS="\t" '{print $NF,$0}' file | sort -rn | head -10 | cut -f2-
>>>>
>>>>Regards,
>>>>
>>>> Ed.
>>>
>>>That worked great..Thanks a lot Ed and Janis
>>
>>
>> Of course, if you wanted to, you could put a few (more) greps and cats
>> into that command line as well. (This is my way of saying: Way too
>> many commands!)
>>
>> Since you can, of course, do it all with one simple (g)AWK command, and
>> not have to use all those other commands.
>>
>> (Details left as an exercise...)
>>
>>
> I assume since you didn't post the awk-only solution that you feel the
> effort to write that is disproportionate to the benefits to the OP of
> using that vs the pipeline above. I agree.
>
> Ed.
Since the original poster stated that they had a 10,000-line file and
gave no indication that this was time-critical or even a particularly
common operation, you are probably correct to agree. If the problem had
involved a 10,000,000-line file, it would have been a different matter. In
these days of multi-gigabyte main memories, 10,000 lines is pretty small.
If one assumes the constants are about the same, choosing between
sorting and selection will make only about one order of magnitude of
difference.
In some ways I am disappointed that the OP was so easily persuaded to make
a copy of the data, with the value to be sorted copied to the start. The
"sort" program probably could have done all that was required (apart from
the "head -10", of course).
Re: sort 3rd col in a file and print
on 20.01.2008 11:10:37 by William James
On Jan 16, 1:59 pm, Janis Papanagnou
wrote:
> One possibility to sort this is...
>
> awk '{printf("%10d %s\n",$NF,$0)}' infile | sort -nr | cut -c12-
ruby -e'puts ARGF.sort_by{|x| -x[/\d+\n/].to_i }[0,10]' infile