How to filter a .csv file based on the integer value in one specific field (per record)

How to filter a .csv file based on the integer value in one specific field (per record)

am 14.01.2008 21:40:06 von Brian Greaney

Hi, hope someone can help me (again!)
I have a large .csv file full of text & numbers.
I would like to 'filter' this file based on several 'keys', 2 text strings
(use grep I think) and the integer value of a specific field being greater
than a certain value (e.g. 100000).
I can see a way of using grep to filter on the two strings, but can't see
an elegant way of getting the numerical test on a specific field, bearing
in mind numbers occur in other fields.
The text fields I filter on are of the form ABC12 the integer from 2 to 6
digits
Hope this is clear (and not too dumb/newbie/rtfm a question) :)

Re: How to filter a .csv file based on the integer value in one specificfield (per record)

am 14.01.2008 21:59:17 von Ed Morton

On 1/14/2008 2:40 PM, Brian Greaney wrote:
> Hi, hope someone can help me (again!)
> I have a large .csv file full of text & numbers.
> I would like to 'filter' this file based on several 'keys', 2 text strings
> (use grep I think) and the integer value of a specific field being greater
> than a certain value (e.g. 100000).
> I can see a way of using grep to filter on the two strings, but can't see
> an elegant way of getting the numerical test on a specific field, bearing
> in mind numbers occur in other fields.
> The text fields I filter on are of the form ABC12 the integer from 2 to 6
> digits
> Hope this is clear (and not too dumb/newbie/rtfm a question) :)

There are many different ways a ".csv" file could be structured as there's no
specific standard for one (though there are some attempts at such on the
internet) so you need to post a small set of sample input and expected output.

Make sure you tell us which specific fields you're interested in. If the 2 text
fields are field 3 and field 7, and the integer's in field 12, the solution MAY
be (but probably isn't) as simple as:

awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file

Ed.

Re: How to filter a .csv file based on the integer value in one specific field (per record)

am 14.01.2008 22:23:22 von Brian Greaney

On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:

> On 1/14/2008 2:40 PM, Brian Greaney wrote:
>> Hi, hope someone can help me (again!)
>> I have a large .csv file full of text & numbers.
>> I would like to 'filter' this file based on several 'keys', 2 text strings
>> (use grep I think) and the integer value of a specific field being greater
>> than a certain value (e.g. 100000).
>> I can see a way of using grep to filter on the two strings, but can't see
>> an elegant way of getting the numerical test on a specific field, bearing
>> in mind numbers occur in other fields.
>> The text fields I filter on are of the form ABC12 the integer from 2 to 6
>> digits
>> Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>
> There are many different ways a ".csv" file could be structured as there's no
> specific standard for one (though there are some attempts at such on the
> internet) so you need to post a small set of sample input and expected output.
>
> Make sure you tell us which specific fields you're interested in. If the 2 text
> fields are field 3 and field 7, and the integer's in field 12, the solution MAY
> be (but probably isn't) as simple as:
>
> awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>
> Ed.
Actual .csv file is 109 columns by >3,000 rows (and yes it should be a
database, it just grew and grew) so I didn't really want a post filled with junk.

Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
field 3> 100000

A sample row is below although I have masked some text out with xxxx
The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this
record. Note there are a large number of blanks and I've had to chop the
lines up to keep the mail program happy!:

LAC0102 3 9 LHR0102 2 10,LAC01 3 2 LHR01 3 2,4842,,xxxxxxxx Old
Tower,Green,,,,,,,,,LXXX SXXXXXXXX,Green,,,,,,,,64000,64k
Voice,VXXXXX,TPX 4969 - VXXXXXX XXX XXXXXXXX,4w
Analogue,,,,,LAC0102,3,10chEM,,3,30227,
,28,27,9,3,2,LAC01,10,10,38,38,40,10,10,1,2,40,LHR01,LHR01,3 ,2,LHR0102,2,10chEM,
2,6,5,10,,,30205,
,,,,,,CXXX810330,CXXX810330,FXXX342947,FXXS342947,CXXX612010 ,CXXX12010,FXXX343412,
FXXXX43412,CXXX805295,CXXX805295,0.484,1141,Bundled
T/S,,,,,,,,,,4.1,18-Dec-07,,,,0,2 3 3,2 0 3,

Thanks for your help!

Re: How to filter a .csv file based on the integer value in one specific field (per record)

am 14.01.2008 22:25:25 von Brian Greaney

On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:

> On 1/14/2008 2:40 PM, Brian Greaney wrote:
>> Hi, hope someone can help me (again!)
>> I have a large .csv file full of text & numbers.
>> I would like to 'filter' this file based on several 'keys', 2 text strings
>> (use grep I think) and the integer value of a specific field being greater
>> than a certain value (e.g. 100000).
>> I can see a way of using grep to filter on the two strings, but can't see
>> an elegant way of getting the numerical test on a specific field, bearing
>> in mind numbers occur in other fields.
>> The text fields I filter on are of the form ABC12 the integer from 2 to 6
>> digits
>> Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>
> There are many different ways a ".csv" file could be structured as there's no
> specific standard for one (though there are some attempts at such on the
> internet) so you need to post a small set of sample input and expected output.
>
> Make sure you tell us which specific fields you're interested in. If the 2 text
> fields are field 3 and field 7, and the integer's in field 12, the solution MAY
> be (but probably isn't) as simple as:
>
> awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>
> Ed.

A quick follow up, with a few minor changes your suggestion works well for
me:
awk 'BEGIN{FS=OFS=","}$56 == "CLH01" && $45 != "LAC" && ($3 >=100000)'
file

Many thanks, I'll have to get read up on awk
:)

Re: How to filter a .csv file based on the integer value in one specificfield (per record)

am 14.01.2008 22:39:47 von Ed Morton

On 1/14/2008 3:23 PM, Brian Greaney wrote:
> On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:
>
>
>>On 1/14/2008 2:40 PM, Brian Greaney wrote:
>>
>>>Hi, hope someone can help me (again!)
>>>I have a large .csv file full of text & numbers.
>>>I would like to 'filter' this file based on several 'keys', 2 text strings
>>>(use grep I think) and the integer value of a specific field being greater
>>>than a certain value (e.g. 100000).
>>>I can see a way of using grep to filter on the two strings, but can't see
>>>an elegant way of getting the numerical test on a specific field, bearing
>>>in mind numbers occur in other fields.
>>>The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>>digits
>>>Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>>
>>There are many different ways a ".csv" file could be structured as there's no
>>specific standard for one (though there are some attempts at such on the
>>internet) so you need to post a small set of sample input and expected output.
>>
>>Make sure you tell us which specific fields you're interested in. If the 2 text
>>fields are field 3 and field 7, and the integer's in field 12, the solution MAY
>>be (but probably isn't) as simple as:
>>
>>awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>>
>> Ed.
>
> Actual .csv file is 109 columns by >3,000 rows (and yes it should be a
> database, it just grew and grew) so I didn't really want a post filled with junk.
>
> Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
> field 3> 100000
>
> A sample row is below although I have masked some text out with xxxx
> The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this
> record. Note there are a large number of blanks and I've had to chop the
> lines up to keep the mail program happy!:
>
> LAC0102 3 9 LHR0102 2 10,LAC01 3 2 LHR01 3 2,4842,,xxxxxxxx Old
> Tower,Green,,,,,,,,,LXXX SXXXXXXXX,Green,,,,,,,,64000,64k
> Voice,VXXXXX,TPX 4969 - VXXXXXX XXX XXXXXXXX,4w
> Analogue,,,,,LAC0102,3,10chEM,,3,30227,
> ,28,27,9,3,2,LAC01,10,10,38,38,40,10,10,1,2,40,LHR01,LHR01,3 ,2,LHR0102,2,10chEM,
> 2,6,5,10,,,30205,
> ,,,,,,CXXX810330,CXXX810330,FXXX342947,FXXS342947,CXXX612010 ,CXXX12010,FXXX343412,
> FXXXX43412,CXXX805295,CXXX805295,0.484,1141,Bundled
> T/S,,,,,,,,,,4.1,18-Dec-07,,,,0,2 3 3,2 0 3,
>
> Thanks for your help!
>

Based on these requirements/samples for filtering:

1) The text fields I filter on are of the form ABC12 the integer from 2 to 6
digits
2) Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
field 3> 100000
3) The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this record.

the text filtering requirement is pretty clear (3 all-upper-case letters
followed by 2 digits), but the integer still isn't since "field 3>100000"
matches neither "from 2 to 6 digits" nor "3 = 4842".

If I assume you actually just want to find integer values between 10 and 100000
then as long as there's no quoted or escaped commas WITHIN the fields and the
3rd field is guaranteed to be an integer, this should do it:

awk 'BEGIN{FS=OFS=","}
$45 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
$56 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
($3 >= 10) && ($3 <= 100000)' file

Regards,

Ed.

Re: How to filter a .csv file based on the integer value in one specificfield (per record)

am 14.01.2008 22:43:46 von Ed Morton

On 1/14/2008 3:25 PM, Brian Greaney wrote:
> On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:
>
>
>>On 1/14/2008 2:40 PM, Brian Greaney wrote:
>>
>>>Hi, hope someone can help me (again!)
>>>I have a large .csv file full of text & numbers.
>>>I would like to 'filter' this file based on several 'keys', 2 text strings
>>>(use grep I think) and the integer value of a specific field being greater
>>>than a certain value (e.g. 100000).
>>>I can see a way of using grep to filter on the two strings, but can't see
>>>an elegant way of getting the numerical test on a specific field, bearing
>>>in mind numbers occur in other fields.
>>>The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>>digits
>>>Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>>
>>There are many different ways a ".csv" file could be structured as there's no
>>specific standard for one (though there are some attempts at such on the
>>internet) so you need to post a small set of sample input and expected output.
>>
>>Make sure you tell us which specific fields you're interested in. If the 2 text
>>fields are field 3 and field 7, and the integer's in field 12, the solution MAY
>>be (but probably isn't) as simple as:
>>
>>awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>>
>> Ed.
>
>
> A quick follow up, with a few minor changes your suggestion works well for
> me:
> awk 'BEGIN{FS=OFS=","}$56 == "CLH01" && $45 != "LAC" && ($3 >=100000)'
> file

That'll find those specific text strings. See my other response to match any 3
upper case letters followed by 2 digits.

> Many thanks, I'll have to get read up on awk
> :)
>

Lurk around comp.lang.awk for a while and get the book Effective Awk
Programming, Third Edition By Arnold Robbins,
http://www.oreilly.com/catalog/awkprog3/.

Ed.

Re: How to filter a .csv file based on the integer value in one specific field (per record)

am 14.01.2008 23:00:39 von Brian Greaney

On Mon, 14 Jan 2008 15:39:47 -0600, Ed Morton wrote:

>
>
> On 1/14/2008 3:23 PM, Brian Greaney wrote:
>> On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:
>>
>>
>>>On 1/14/2008 2:40 PM, Brian Greaney wrote:
>>>
>>>>Hi, hope someone can help me (again!)
>>>>I have a large .csv file full of text & numbers.
>>>>I would like to 'filter' this file based on several 'keys', 2 text strings
>>>>(use grep I think) and the integer value of a specific field being greater
>>>>than a certain value (e.g. 100000).
>>>>I can see a way of using grep to filter on the two strings, but can't see
>>>>an elegant way of getting the numerical test on a specific field, bearing
>>>>in mind numbers occur in other fields.
>>>>The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>>>digits
>>>>Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>>>
>>>There are many different ways a ".csv" file could be structured as there's no
>>>specific standard for one (though there are some attempts at such on the
>>>internet) so you need to post a small set of sample input and expected output.
>>>
>>>Make sure you tell us which specific fields you're interested in. If the 2 text
>>>fields are field 3 and field 7, and the integer's in field 12, the solution MAY
>>>be (but probably isn't) as simple as:
>>>
>>>awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>>>
>>> Ed.
>>
>> Actual .csv file is 109 columns by >3,000 rows (and yes it should be a
>> database, it just grew and grew) so I didn't really want a post filled with junk.
>>
>> Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
>> field 3> 100000
>>
>> A sample row is below although I have masked some text out with xxxx
>> The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this
>> record. Note there are a large number of blanks and I've had to chop the
>> lines up to keep the mail program happy!:
>>
>> LAC0102 3 9 LHR0102 2 10,LAC01 3 2 LHR01 3 2,4842,,xxxxxxxx Old
>> Tower,Green,,,,,,,,,LXXX SXXXXXXXX,Green,,,,,,,,64000,64k
>> Voice,VXXXXX,TPX 4969 - VXXXXXX XXX XXXXXXXX,4w
>> Analogue,,,,,LAC0102,3,10chEM,,3,30227,
>> ,28,27,9,3,2,LAC01,10,10,38,38,40,10,10,1,2,40,LHR01,LHR01,3 ,2,LHR0102,2,10chEM,
>> 2,6,5,10,,,30205,
>> ,,,,,,CXXX810330,CXXX810330,FXXX342947,FXXS342947,CXXX612010 ,CXXX12010,FXXX343412,
>> FXXXX43412,CXXX805295,CXXX805295,0.484,1141,Bundled
>> T/S,,,,,,,,,,4.1,18-Dec-07,,,,0,2 3 3,2 0 3,
>>
>> Thanks for your help!
>>
>
> Based on these requirements/samples for filtering:
>
> 1) The text fields I filter on are of the form ABC12 the integer from 2 to 6
> digits
> 2) Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
> field 3> 100000
> 3) The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this record.
>
> the text filtering requirement is pretty clear (3 all-upper-case letters
> followed by 2 digits), but the integer still isn't since "field 3>100000"
> matches neither "from 2 to 6 digits" nor "3 = 4842".
>
> If I assume you actually just want to find integer values between 10 and 100000
> then as long as there's no quoted or escaped commas WITHIN the fields and the
> 3rd field is guaranteed to be an integer, this should do it:
>
> awk 'BEGIN{FS=OFS=","}
> $45 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
> $56 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
> ($3 >= 10) && ($3 <= 100000)' file
>
> Regards,
>
> Ed.

Ed
With your great help I'm getting there.. and I've just found a hiccup
In the above my field 45 will always start LAC but could be O1 or 02, I
think therefore I can just loose the 2 [[:digit:]] parts? - BUT I need it
NOT EQUALS. Can I also pass the match as a string?? (pattern="^CLH01\$")
? I think I need to take your other suggestion about the book.

PS the 3rd field is always an integer with no commas

Re: How to filter a .csv file based on the integer value in one specificfield (per record)

am 14.01.2008 23:47:43 von Ed Morton

On 1/14/2008 4:00 PM, Brian Greaney wrote:
> On Mon, 14 Jan 2008 15:39:47 -0600, Ed Morton wrote:
>
>
>>
>>On 1/14/2008 3:23 PM, Brian Greaney wrote:
>>
>>>On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:
>>>
>>>
>>>
>>>>On 1/14/2008 2:40 PM, Brian Greaney wrote:
>>>>
>>>>
>>>>>Hi, hope someone can help me (again!)
>>>>>I have a large .csv file full of text & numbers.
>>>>>I would like to 'filter' this file based on several 'keys', 2 text strings
>>>>>(use grep I think) and the integer value of a specific field being greater
>>>>>than a certain value (e.g. 100000).
>>>>>I can see a way of using grep to filter on the two strings, but can't see
>>>>>an elegant way of getting the numerical test on a specific field, bearing
>>>>>in mind numbers occur in other fields.
>>>>>The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>>>>digits
>>>>>Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>>>>
>>>>There are many different ways a ".csv" file could be structured as there's no
>>>>specific standard for one (though there are some attempts at such on the
>>>>internet) so you need to post a small set of sample input and expected output.
>>>>
>>>>Make sure you tell us which specific fields you're interested in. If the 2 text
>>>>fields are field 3 and field 7, and the integer's in field 12, the solution MAY
>>>>be (but probably isn't) as simple as:
>>>>
>>>>awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>>>>
>>>> Ed.
>>>
>>>Actual .csv file is 109 columns by >3,000 rows (and yes it should be a
>>>database, it just grew and grew) so I didn't really want a post filled with junk.
>>>
>>>Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
>>>field 3> 100000
>>>
>>>A sample row is below although I have masked some text out with xxxx
>>>The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this
>>>record. Note there are a large number of blanks and I've had to chop the
>>>lines up to keep the mail program happy!:
>>>
>>>LAC0102 3 9 LHR0102 2 10,LAC01 3 2 LHR01 3 2,4842,,xxxxxxxx Old
>>>Tower,Green,,,,,,,,,LXXX SXXXXXXXX,Green,,,,,,,,64000,64k
>>>Voice,VXXXXX,TPX 4969 - VXXXXXX XXX XXXXXXXX,4w
>>>Analogue,,,,,LAC0102,3,10chEM,,3,30227,
>>>,28,27,9,3,2,LAC01,10,10,38,38,40,10,10,1,2,40,LHR01,LHR0 1,3,2,LHR0102,2,10chEM,
>>>2,6,5,10,,,30205,
>>>,,,,,,CXXX810330,CXXX810330,FXXX342947,FXXS342947,CXXX612 010,CXXX12010,FXXX343412,
>>>FXXXX43412,CXXX805295,CXXX805295,0.484,1141,Bundled
>>>T/S,,,,,,,,,,4.1,18-Dec-07,,,,0,2 3 3,2 0 3,
>>>
>>>Thanks for your help!
>>>
>>
>>Based on these requirements/samples for filtering:
>>
>>1) The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>digits
>>2) Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
>>field 3> 100000
>>3) The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this record.
>>
>>the text filtering requirement is pretty clear (3 all-upper-case letters
>>followed by 2 digits), but the integer still isn't since "field 3>100000"
>>matches neither "from 2 to 6 digits" nor "3 = 4842".
>>
>>If I assume you actually just want to find integer values between 10 and 100000
>>then as long as there's no quoted or escaped commas WITHIN the fields and the
>>3rd field is guaranteed to be an integer, this should do it:
>>
>>awk 'BEGIN{FS=OFS=","}
>>$45 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
>>$56 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
>>($3 >= 10) && ($3 <= 100000)' file
>>
>>Regards,
>>
>> Ed.
>
>
> Ed
> With your great help I'm getting there.. and I've just found a hiccup
> In the above my field 45 will always start LAC but could be O1 or 02, I
> think therefore I can just loose the 2 [[:digit:]] parts? - BUT I need it
> NOT EQUALS. Can I also pass the match as a string?? (pattern="^CLH01\$")
> ? I think I need to take your other suggestion about the book.
>
> PS the 3rd field is always an integer with no commas
>
>

Re: How to filter a .csv file based on the integer value in one specificfield (per record)

am 14.01.2008 23:54:10 von Ed Morton

On 1/14/2008 4:00 PM, Brian Greaney wrote:
> On Mon, 14 Jan 2008 15:39:47 -0600, Ed Morton wrote:
>
>
>>
>>On 1/14/2008 3:23 PM, Brian Greaney wrote:
>>
>>>On Mon, 14 Jan 2008 14:59:17 -0600, Ed Morton wrote:
>>>
>>>
>>>
>>>>On 1/14/2008 2:40 PM, Brian Greaney wrote:
>>>>
>>>>
>>>>>Hi, hope someone can help me (again!)
>>>>>I have a large .csv file full of text & numbers.
>>>>>I would like to 'filter' this file based on several 'keys', 2 text strings
>>>>>(use grep I think) and the integer value of a specific field being greater
>>>>>than a certain value (e.g. 100000).
>>>>>I can see a way of using grep to filter on the two strings, but can't see
>>>>>an elegant way of getting the numerical test on a specific field, bearing
>>>>>in mind numbers occur in other fields.
>>>>>The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>>>>digits
>>>>>Hope this is clear (and not too dumb/newbie/rtfm a question) :)
>>>>
>>>>There are many different ways a ".csv" file could be structured as there's no
>>>>specific standard for one (though there are some attempts at such on the
>>>>internet) so you need to post a small set of sample input and expected output.
>>>>
>>>>Make sure you tell us which specific fields you're interested in. If the 2 text
>>>>fields are field 3 and field 7, and the integer's in field 12, the solution MAY
>>>>be (but probably isn't) as simple as:
>>>>
>>>>awk 'BEGIN{FS=OFS=","}$3 == "ABC12" && $7 == "ABC12" && ($12 >=2) && ($12 <=6)' file
>>>>
>>>> Ed.
>>>
>>>Actual .csv file is 109 columns by >3,000 rows (and yes it should be a
>>>database, it just grew and grew) so I didn't really want a post filled with junk.
>>>
>>>Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
>>>field 3> 100000
>>>
>>>A sample row is below although I have masked some text out with xxxx
>>>The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this
>>>record. Note there are a large number of blanks and I've had to chop the
>>>lines up to keep the mail program happy!:
>>>
>>>LAC0102 3 9 LHR0102 2 10,LAC01 3 2 LHR01 3 2,4842,,xxxxxxxx Old
>>>Tower,Green,,,,,,,,,LXXX SXXXXXXXX,Green,,,,,,,,64000,64k
>>>Voice,VXXXXX,TPX 4969 - VXXXXXX XXX XXXXXXXX,4w
>>>Analogue,,,,,LAC0102,3,10chEM,,3,30227,
>>>,28,27,9,3,2,LAC01,10,10,38,38,40,10,10,1,2,40,LHR01,LHR0 1,3,2,LHR0102,2,10chEM,
>>>2,6,5,10,,,30205,
>>>,,,,,,CXXX810330,CXXX810330,FXXX342947,FXXS342947,CXXX612 010,CXXX12010,FXXX343412,
>>>FXXXX43412,CXXX805295,CXXX805295,0.484,1141,Bundled
>>>T/S,,,,,,,,,,4.1,18-Dec-07,,,,0,2 3 3,2 0 3,
>>>
>>>Thanks for your help!
>>>
>>
>>Based on these requirements/samples for filtering:
>>
>>1) The text fields I filter on are of the form ABC12 the integer from 2 to 6
>>digits
>>2) Filter would typically be Anode (field 45)=LAC01 & Bnode (field 56)=CLH01
>>field 3> 100000
>>3) The fields 45 = LAC01, 56=LHR01 & 3 = 4842, so I would filter out this record.
>>
>>the text filtering requirement is pretty clear (3 all-upper-case letters
>>followed by 2 digits), but the integer still isn't since "field 3>100000"
>>matches neither "from 2 to 6 digits" nor "3 = 4842".
>>
>>If I assume you actually just want to find integer values between 10 and 100000
>>then as long as there's no quoted or escaped commas WITHIN the fields and the
>>3rd field is guaranteed to be an integer, this should do it:
>>
>>awk 'BEGIN{FS=OFS=","}
>>$45 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
>>$56 ~ /^[[:upper:]][[:upper:]][[:upper:]][[:digit:]][[:digit:]]$/ &&
>>($3 >= 10) && ($3 <= 100000)' file
>>
>>Regards,
>>
>> Ed.
>
>
> Ed
> With your great help I'm getting there.. and I've just found a hiccup
> In the above my field 45 will always start LAC but could be O1 or 02, I
> think therefore I can just loose the 2 [[:digit:]] parts?

Yes, or just make it

$45 ~ /^LAC0[12]$/

> - BUT I need it
> NOT EQUALS.

Then make it:

$45 !~ /^LAC0[12]$/

> Can I also pass the match as a string?? (pattern="^CLH01\$")
> ?

Yes:

awk -v pattern="^CLH01$" '
....
$45 ~ pattern { ... }
.....' file

> I think I need to take your other suggestion about the book.

It's well worth the time/expense. It's also a good way of learning to force
yourself to use awk for all your text manipulation even if you know how to do
whatever you want with a mixture of sed and shell and whatever...

> PS the 3rd field is always an integer with no commas

OK.

Ed.