Print lines with n columns only

on 30.09.2007 17:14:56 by dave

I have a file which should have 104 columns, but due to some corruption,
sometimes there are more/less.

What is the best way to print only lines with 104 columns, and ignore
those with more or less?

Re: Print lines with n columns only

on 30.09.2007 17:20:55 by Cyrus Kriticos

Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

egrep "^.{104}$" FILENAME

or

grep -E "^.{104}$" FILENAME

--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide

Re: Print lines with n columns only

on 30.09.2007 17:22:45 by Cyrus Kriticos

Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

egrep "^.{104}$" FILENAME

or

grep -E "^.{104}$" FILENAME


Re: Print lines with n columns only

on 30.09.2007 18:14:27 by Janis Papanagnou

Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

awk 'length()==104'


Janis

Re: Print lines with n columns only

on 30.09.2007 18:24:25 by William James

On Sep 30, 10:14 am, Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

If you mean 104 characters:

awk '104==length' file

If you mean 104 fields:

awk '104==NF' file
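
To make the distinction concrete, here is a minimal sketch on made-up data. The sample lines and the use of n=3 in place of 104 are assumptions chosen only to keep the example short.

```shell
# Hypothetical sample data; n=3 stands in for 104.
sample() {
    printf '%s\n' 'a b c' 'abc' 'a b c d'
}

# Keep lines that are exactly 3 characters long.
by_chars=$(sample | awk 'length($0) == 3')

# Keep lines that have exactly 3 whitespace-separated fields.
by_fields=$(sample | awk 'NF == 3')

printf 'by characters: %s\n' "$by_chars"
printf 'by fields:     %s\n' "$by_fields"
```

The two tests select different lines: 'abc' is three characters but one field, while 'a b c' is three fields but five characters.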

Re: Print lines with n columns only

on 30.09.2007 18:28:16 by cfajohnson

On 2007-09-30, Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

awk 'length == 104' FILE

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: Print lines with n columns only

on 30.09.2007 18:33:04 by Janis Papanagnou

William James wrote:
> On Sep 30, 10:14 am, Dave wrote:
>
>>I have a file which should have 104 columns, but due to some corruption,
>>sometimes there are more/less.
>>
>>What is the best way to print only lines with 104 columns, and ignore
>>those with more or less?
>
>
> If you mean 104 characters:
>
> awk '104==length' file

Note that (according to the GNU awk user manual) length() without
parentheses is deprecated by POSIX.

Janis

>
> If you mean 104 fields:
>
> awk '104==NF' file
>

Re: Print lines with n columns only

on 30.09.2007 19:18:10 by Loki Harfagr

On Sun, 30 Sep 2007 17:22:45 +0200, Cyrus Kriticos wrote:

> Dave wrote:
>> I have a file which should have 104 columns, but due to some
>> corruption, sometimes there are more/less.
>>
>> What is the best way to print only lines with 104 columns, and ignore
>> those with more or less?
>
> egrep "^.{104}$" FILENAME
>
> or
>
> grep -E "^.{104}$" FILENAME


And, in case the term "column" meant a separator-delimited field,
here is, for the sake of completeness, a possible expression
(though it presumes details that must be adapted to the real data):

$ egrep "^ ?([[:alnum:]]+ ){103}[[:alnum:]]+ ?$" yourfile


I'd personally use awk 'NF==104' just because it's easier to
read three months later ;-) but in the case of big files the
regexp is much faster.
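
As a rough illustration of what the regular expression presumes about the data, the sketch below runs both approaches on made-up input. The sample lines and n=3 (so the repeat count becomes {2}) are assumptions, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample data; n=3 stands in for 104.
sample() {
    printf '%s\n' 'a b c' 'a b' ' a b c ' 'a  b c' 'x-y b c'
}

# Regex approach: assumes alphanumeric fields separated by exactly one space.
regex_count=$(sample | grep -Ec '^ ?([[:alnum:]]+ ){2}[[:alnum:]]+ ?$')

# awk approach: any run of blanks separates fields, any non-blank text is a field.
awk_count=$(sample | awk 'NF == 3 { n++ } END { print n + 0 }')

printf 'regex matched %s lines, awk matched %s lines\n' "$regex_count" "$awk_count"
```

On this input the regex accepts 2 lines and awk accepts 4: the line with a doubled space and the line containing a hyphen each have three fields for awk but fail the stricter pattern.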

Re: Print lines with n columns only

on 30.09.2007 19:26:52 by William James

On Sep 30, 11:33 am, Janis Papanagnou
wrote:
> William James wrote:
> > On Sep 30, 10:14 am, Dave wrote:
>
> >>I have a file which should have 104 columns, but due to some corruption,
> >>sometimes there are more/less.
>
> >>What is the best way to print only lines with 104 columns, and ignore
> >>those with more or less?
>
> > If you mean 104 characters:
>
> > awk '104==length' file
>
> Note that (according to the GNU awk user manual) length() without
> parenthesis is deprecated by POSIX.

Brian Kernighan's awk95, mawk, and gawk
accept it. So I say, "Deep six POSIX!"

Re: Print lines with n columns only

on 30.09.2007 20:30:54 by Ed Morton

Dave wrote:
> I have a file which should have 104 columns, but due to some corruption,
> sometimes there are more/less.
>
> What is the best way to print only lines with 104 columns, and ignore
> those with more or less?

If "columns" means space-separated fields:

awk 'NF==104' file

If "columns" means characters:

awk 'NF==104' FS= file

Regards,

Ed.
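
A caveat on the empty-FS form, sketched on made-up data: POSIX leaves the behavior of a null FS unspecified. gawk and mawk split the record into one field per character, but other implementations may treat the whole line as a single field. The sample lines and n=3 are assumptions, and the trailing "-" simply names standard input as the file operand.

```shell
# Hypothetical sample data; n=3 stands in for 104.
sample() {
    printf '%s\n' 'abc' 'abcd' 'a c'
}

# Portable character count: length($0) does not depend on FS at all.
portable=$(sample | awk 'length($0) == 3')

# Empty FS: one field per character in gawk and mawk; unspecified by POSIX,
# so this result may differ with another awk.
empty_fs=$(sample | awk 'NF == 3' FS= -)

printf 'length($0): %s\n' "$portable"
printf 'empty FS:   %s\n' "$empty_fs"
```

Where portability matters, the length($0) form is probably the safer of the two.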

Re: Print lines with n columns only

on 01.10.2007 00:45:25 by cfajohnson

On 2007-09-30, William James wrote:
> On Sep 30, 11:33 am, Janis Papanagnou
> wrote:
>> William James wrote:
>> > On Sep 30, 10:14 am, Dave wrote:
>>
>> >>I have a file which should have 104 columns, but due to some corruption,
>> >>sometimes there are more/less.
>>
>> >>What is the best way to print only lines with 104 columns, and ignore
>> >>those with more or less?
>>
>> > If you mean 104 characters:
>>
>> > awk '104==length' file
>>
>> Note that (according to the GNU awk user manual) length() without
>> parenthesis is deprecated by POSIX.

Thanks for pointing that out.

> Brian Kernighan's awk95, mawk, and gawk
> accept it. So I say, "Deep six POSIX!"

While most versions of awk do accept it, they all also support the
POSIX variation, length(). The deprecated form may disappear, so it
makes no sense to use it when the correct version will always be
available.


Re: Print lines with n columns only

on 01.10.2007 01:13:36 by Stephane CHAZELAS

2007-09-30, 17:22(+02), Cyrus Kriticos:
> Dave wrote:
>> I have a file which should have 104 columns, but due to some corruption,
>> sometimes there are more/less.
>>
>> What is the best way to print only lines with 104 columns, and ignore
>> those with more or less?
>
> egrep "^.{104}$" FILENAME
>
> or
>
> grep -E "^.{104}$" FILENAME

grep -Ex '.{104}'

awk 'length==104'

--
Stéphane
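
A short sketch of why the -x form is equivalent: grep's -x option requires the pattern to match the whole line, which makes the explicit ^ and $ anchors unnecessary. The sample lines and n=3 are assumptions for illustration.

```shell
# Hypothetical sample data; n=3 stands in for 104.
sample() {
    printf '%s\n' 'ab' 'abc' 'abcd'
}

# Explicit anchors around the pattern.
anchored=$(sample | grep -E '^.{3}$')

# -x makes the whole line the match target, so no anchors are needed.
whole_line=$(sample | grep -Ex '.{3}')

printf 'anchored: %s\n' "$anchored"
printf 'with -x:  %s\n' "$whole_line"
```

Using single quotes, as above, also avoids any question of the shell interpreting the $ inside double quotes.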

Re: Print lines with n columns only

on 01.10.2007 01:20:32 by Stephane CHAZELAS

2007-09-30, 18:33(+02), Janis Papanagnou:
[...]
>> awk '104==length' file
>
> Note that (according to the GNU awk user manual) length() without
> parenthesis is deprecated by POSIX.
[...]

I could not find any evidence of that at
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html

On the contrary:

Some historical implementations have allowed some built-in
functions to be called without an argument list, the result
being a default argument list chosen in some "reasonable"
way. Use of length as a synonym for length($0) is the only
one of these forms that is thought to be widely known or
widely used; this particular form is documented in various
places (for example, most historical awk reference pages,
although not in the referenced The AWK Programming
Language) as legitimate practice. With this exception,
default argument lists have always been undocumented and
vaguely defined, and it is not at all clear how (or if)
they should be generalized to user-defined functions. They
add no useful functionality and preclude possible future
extensions that might need to name functions without
calling them. Not standardizing them seems the simplest
course. The standard developers considered that length
merited special treatment, however, since it has been
documented in the past and sees possibly substantial use in
historical programs. Accordingly, this usage has been made
legitimate, but Issue 5 removed the obsolescent marking for
XSI-conforming implementations and many otherwise
conforming applications depend on this feature.

So "length" in place of "length($0)" seems quite legitimate, and
I've always used it that way.

--
Stéphane
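
For reference, a minimal sketch comparing the three spellings discussed in this thread. The sample lines and n=3 are assumptions; with the awk implementations named in the thread, all three forms should select the same lines.

```shell
# Hypothetical sample data; n=3 stands in for 104.
sample() {
    printf '%s\n' 'ab' 'abc' 'abcd'
}

# Bare form: length with no argument list, meaning length($0).
bare=$(sample | awk 'length == 3')

# Empty argument list.
parens=$(sample | awk 'length() == 3')

# Fully explicit argument.
explicit=$(sample | awk 'length($0) == 3')

printf 'length: %s\nlength(): %s\nlength($0): %s\n' "$bare" "$parens" "$explicit"
```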

Re: Print lines with n columns only

on 01.10.2007 01:28:39 by dave

Ed Morton wrote:
> Dave wrote:
>> I have a file which should have 104 columns, but due to some
>> corruption, sometimes there are more/less.
>>
>> What is the best way to print only lines with 104 columns, and ignore
>> those with more or less?
>
> If "columns" means space-separated fields:
>
> awk 'NF==104' file

Which I did. Thanks, that worked.

> If "columns" means characters:
>
> awk 'NF==104' FS= file
>
> Regards,
>
> Ed.

Re: Print lines with n columns only

on 01.10.2007 15:16:48 by Geoff Clare

Janis Papanagnou wrote:

> Note that (according to the GNU awk user manual) length() without
> parenthesis is deprecated by POSIX.

If the latest manual still says that, then it needs to be updated.
It's true that POSIX.2-1992 said plain "length" was obsolescent, but
when POSIX.2 was merged into POSIX.1 in 2001 the note about
obsolescence was removed.

--
Geoff Clare

Re: Print lines with n columns only

on 01.10.2007 17:10:18 by William James

On Sep 30, 11:28 am, "Chris F.A. Johnson"
wrote:
[...]
> > What is the best way to print only lines with 104 columns, and ignore
> > those with more or less?
>
> awk 'length == 104' FILE

On Sep 30, 5:45 pm, "Chris F.A. Johnson" wrote:
[...]
> While most versions of awk do accept it, they all also support the
> POSIX variation, length(). The deprecated form may disappear, so it
> makes no sense to use it when the correct version will always be
> available.

It makes no sense to use it and then to say it makes no sense to use it.
Both versions are "correct", and both will always be available.

Re: Print lines with n columns only

on 01.10.2007 17:30:26 by Janis Papanagnou

William James wrote:
> On Sep 30, 11:28 am, "Chris F.A. Johnson"
> wrote:
> [...]
>
>>>What is the best way to print only lines with 104 columns, and ignore
>>>those with more or less?
>>
>>awk 'length == 104' FILE
>
>
> On Sep 30, 5:45 pm, "Chris F.A. Johnson" wrote:
> [...]
>
>> While most versions of awk do accept it, they all also support the
>> POSIX variation, length(). The deprecated form may disappear, so it
>> makes no sense to use it when the correct version will always be
>> available.
>
>
> It makes no sense to use it and then to say it makes no sense to use
> it.
> Both versions are "correct", and both will always be available.
>

The confusion came from the GNU awk user manual that apparently referenced
an old POSIX standard and the fact that something that once was deprecated
by POSIX now isn't any more. See Geoff Clare's posting.

The GNU awk user manual from 2004 still defines the syntax as length([string]),
while the online publication quoted by Stephane specifies length[([string])].

Janis

Re: Print lines with n columns only

on 01.10.2007 22:21:23 by cfajohnson

On 2007-10-01, William James wrote:
> On Sep 30, 11:28 am, "Chris F.A. Johnson"
> wrote:
> [...]
>> > What is the best way to print only lines with 104 columns, and ignore
>> > those with more or less?
>>
>> awk 'length == 104' FILE
>
> On Sep 30, 5:45 pm, "Chris F.A. Johnson" wrote:
> [...]
>> While most versions of awk do accept it, they all also support the
>> POSIX variation, length(). The deprecated form may disappear, so it
>> makes no sense to use it when the correct version will always be
>> available.
>
> It makes no sense to use it and then to say it makes no sense to use
> it.

I was not aware that it had been deprecated; that's why I used it.

> Both versions are "correct", and both will always be available.

True, because it is no longer deprecated. My statement holds true
for any deprecated form.

