ignoring non [0-9] chars when searching in a file

On 23.11.2007 13:52:37 by John Bagins

I want to search for telephone numbers in my text database where the
entries are in the form:
Tel: 99 99 999 9999 999
or
Tel: +99-999/ 9999 999
or any combination of spaces and punctuation chars.

I want to feed as input numbers in the form of
99999999999

What is the most efficient way to do this without first creating
an index of space and punctuation removed numbers?

Thanks

Eric

Re: ignoring non [0-9] chars when searching in a file

On 23.11.2007 17:53:39 by Janis Papanagnou

John Bagins wrote:
> I want to search for telephone numbers in my text database where the
> entries are in the form:
> Tel: 99 99 999 9999 999
> or
> Tel: +99-999/ 9999 999
> or any combination of spaces and punctuation chars.

So is the number of digits in the file arbitrary, or are there just
the two possibilities, 14 or 12 digits?

>
> I want to feed as input numbers in the form of
> 99999999999

And you want to search for 11-digit numbers? How are the semantics of
a match defined? (Prefix match, suffix match, or arbitrary elisions in
the middle of the number?)

What should the output be: all closely matching numbers, or only exact
matches? Just found/not found, or the number as stored in the
file?

>
> What is the most efficient way to do this without first creating
> an index of space and punctuation removed numbers?

Depends on what you need. Clarify your requirements first. Provide
a few samples of real (or at least meaningful) data and the desired
output.

Janis
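Janis's question about the semantics of a match can be made concrete. The sketch below is not from the thread: it assumes a POSIX shell and awk, and the function name `match_number`, its mode names and the sample entries are all made up for illustration. It strips the non-digits from each line and then applies an exact, prefix, suffix or substring test to the digits that remain.

```sh
#!/bin/sh
# match_number MODE NUMBER FILE  (hypothetical helper, for illustration)
# MODE is exact, prefix, suffix or substring; prints the matching original lines.
match_number() {
    awk -v mode="$1" -v p="$2" '
    BEGIN { p = p "" }                 # make p a plain string so == compares text, not numbers
    {
        s = $0
        gsub(/[^0-9]/, "", s)          # keep only the digits of this line
        if (s == "") next              # lines without digits never match
        if (mode == "exact")
            ok = (s == p)
        else if (mode == "prefix")
            ok = (index(s, p) == 1)
        else if (mode == "suffix")
            ok = (length(s) >= length(p) && substr(s, length(s) - length(p) + 1) == p)
        else
            ok = (index(s, p) > 0)     # substring anywhere
        if (ok) print
    }' "$3"
}

# Made-up sample entries in the two formats from the original post
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Tel: 12 34 567 8901 234' 'Tel: +55-123/ 4567 890' 'Name: Eric' > "$sample"

match_number prefix 1234567 "$sample"      # first Tel: line only
match_number substring 1234567 "$sample"   # both Tel: lines
```

The same search digits give different answers depending on the mode, which is exactly why the requirements need pinning down before choosing a tool.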

Re: ignoring non [0-9] chars when searching in a file

On 23.11.2007 20:31:24 by Cyrus Kriticos

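sed 's/[^0-9]//g' FILENAME | grep -F 99999999999

This prints the digit-only form of each matching line, not the line as stored in the file.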

Cyrus
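Because the sed/grep pipeline only shows the digits, one way to get the stored lines back while still using just sed and grep is to let `grep -n` report line numbers and then print those lines from the original file. This is a sketch: the function name `show_lines` and the sample entries are made up, and re-reading the file once per hit is only reasonable when there are few hits.

```sh
#!/bin/sh
# show_lines NUMBER FILE  (hypothetical helper)
# Prints the original lines whose digit-only form contains NUMBER.
show_lines() {
    sed 's/[^0-9]//g' "$2" |        # strip everything except digits
        grep -n -F -- "$1" |        # fixed-string search, output as "N:digits"
        cut -d: -f1 |               # keep only the line number N
        while read -r n; do
            sed -n "${n}p" "$2"     # print line N of the original file
        done
}

# Made-up sample entries in the two formats from the original post
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Tel: 12 34 567 8901 234' 'Tel: +55-123/ 4567 890' 'Name: Eric' > "$sample"

show_lines 55123 "$sample"    # prints: Tel: +55-123/ 4567 890
```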

Re: ignoring non [0-9] chars when searching in a file

On 24.11.2007 01:07:07 by bmynars

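In ksh93 or bash:

var='Tel: +99-999/ 9999 999'    # e.g. one entry from the file
var=${var//[^[:digit:]]/}

This deletes every character that is not a digit (spaces and punctuation,
but also letters such as the "Tel:" label), squeezing the remaining digits
together into the format you want. It is simple and runs entirely inside
the shell, so there is no need to start external tools like sed, awk, tr
and others.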
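The expansion above normalizes one string; to search the whole file it can be applied to each line in a read loop. A sketch assuming bash (`local` and `[[ ]]` are not plain POSIX sh); the function name `search_file` and the sample entries are made up. For large files a shell read loop is typically much slower than a single awk or sed process, so avoiding external tools is a convenience rather than a speed guarantee.

```sh
#!/usr/bin/env bash
# search_file NUMBER FILE  (hypothetical helper, bash)
# Prints each line whose digits, with everything else removed, contain NUMBER.
search_file() {
    local number=$1 line digits
    while IFS= read -r line || [[ -n $line ]]; do   # also handle a last line without newline
        digits=${line//[^[:digit:]]/}              # the expansion from the post
        if [[ -n $digits && $digits == *"$number"* ]]; then
            printf '%s\n' "$line"
        fi
    done < "$2"
}

# Made-up sample entries in the two formats from the original post
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Tel: 12 34 567 8901 234' 'Tel: +55-123/ 4567 890' 'Name: Eric' > "$sample"

search_file 55123 "$sample"    # prints: Tel: +55-123/ 4567 890
```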

Re: ignoring non [0-9] chars when searching in a file

On 25.11.2007 15:10:16 by Ed Morton

This may or may not be more efficient than the other solutions posted, but it
uses just one command and creates a solid foundation that will be easy to
enhance and maintain in the future in case your requirements change:

awk '{s=$0;gsub(/[^[:digit:]]/,"",s)}s~p' p="99999999999" file

Ed.
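As an example of the kind of enhancement Ed mentions, the awk filter can be restricted to lines starting with "Tel:" (so digits elsewhere cannot produce false hits), take the number via `-v`, and report where each hit was found. This is a sketch under those assumptions: the function name `find_tel`, the "Tel:" prefix rule and the sample entries are illustrative, and `index()` is used so the number is treated as a literal string rather than a regular expression.

```sh
#!/bin/sh
# find_tel NUMBER FILE...  (hypothetical helper)
# For every line starting with "Tel:" whose digits contain NUMBER,
# prints FILENAME:LINENUMBER:original line.
find_tel() {
    number=$1
    shift
    awk -v p="$number" '
    /^Tel:/ {
        s = $0
        gsub(/[^0-9]/, "", s)                       # digits only
        if (index(s, p)) print FILENAME ":" FNR ":" $0
    }' "$@"
}

# Made-up sample entries; the Fax: line has the same digits as line 2
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Tel: 12 34 567 8901 234' 'Tel: +55-123/ 4567 890' 'Fax: 55 123 4567 890' > "$sample"

find_tel 55123 "$sample"    # only line 2 is reported; the Fax: line is skipped
```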