ignoring non [0-9] chars when searching in a file
on 23.11.2007 13:52:37 by John Bagins
I want to search for telephone numbers in my text database where the
entries are in the form:
Tel: 99 99 999 9999 999
or
Tel: +99-999/ 9999 999
or any combination of spaces and punctuation chars.
I want to feed as input numbers in the form of
99999999999
What is the most efficient way to do this without first creating
an index of space and punctuation removed numbers?
Thanks
Eric
Re: ignoring non [0-9] chars when searching in a file
on 23.11.2007 17:53:39 by Janis Papanagnou
John Bagins wrote:
> I want to search for telephone numbers in my text database where the
> entries are in the form:
> Tel: 99 99 999 9999 999
> or
> Tel: +99-999/ 9999 999
> or any combination of spaces and punctuation chars.
So is the number of digits in the file arbitrary, or are there just
the two possibilities of 14 or 12 digits, respectively?
>
> I want to feed as input numbers in the form of
> 99999999999
And you want to search for 11-digit numbers? How are the semantics of
a match defined? (Match a prefix, a suffix, or allow arbitrary elisions
in the middle of the number?)
What shall the output be: all closely matching numbers, or only exact
matches? Just found/not found, or the number as stored in
the file?
>
> What is the most efficient way to do this without first creating
> an index of space and punctuation removed numbers?
Depends on what you need. Clarify your requirements first. Provide
a few samples of real (or at least meaningful) data and the desired
output.
Janis
>
> Thanks
>
> Eric
Re: ignoring non [0-9] chars when searching in a file
on 23.11.2007 20:31:24 by Cyrus Kriticos
John Bagins wrote:
> I want to search for telephone numbers in my text database where the
> entries are in the form:
> Tel: 99 99 999 9999 999
> or
> Tel: +99-999/ 9999 999
> or any combination of spaces and punctuation chars.
>
> I want to feed as input numbers in the form of
> 99999999999
>
> What is the most efficient way to do this without first creating
> an index of space and punctuation removed numbers?
sed 's/[^0-9]//g' FILENAME | grep NUMBER
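Note that stripping the file first means grep only sees the squeezed digits, so the original entry is lost in the output. As a rough sketch of one alternative (not something posted in this thread), the digits of the search number can be turned into a pattern that tolerates any non-digit characters between them, so grep prints the line as stored. The function name `find_number` is hypothetical, and the match is a substring match on the digit sequence:

```shell
#!/bin/sh
# Hypothetical helper: find_number NUMBER FILE
# Prints the lines of FILE that contain the digits of NUMBER in order,
# with any amount of non-digit characters allowed between them.
find_number() {
    num=$1
    file=$2
    case $num in
        ''|*[!0-9]*) echo "find_number: NUMBER must be digits only" >&2
                     return 2 ;;
    esac
    # Turn "123" into "1[^0-9]*2[^0-9]*3[^0-9]*".
    # The trailing [^0-9]* is harmless: it may match zero characters.
    pattern=$(printf '%s\n' "$num" | sed 's/[0-9]/&[^0-9]*/g')
    grep -e "$pattern" -- "$file"
}

# Usage with made-up sample data:
tmp=$(mktemp)
printf '%s\n' 'Tel: 12 34 567 8901 234' 'Tel: +49-555/ 1234 567' > "$tmp"
find_number 495551234567 "$tmp"    # prints: Tel: +49-555/ 1234 567
rm -f "$tmp"
```

Because the pattern is not anchored, a shorter number also matches inside a longer one; whether that is wanted depends on the match semantics asked about earlier in the thread.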
--
Best regards | Be nice to America or they'll bring democracy to
Cyrus | your country.
Re: ignoring non [0-9] chars when searching in a file
on 24.11.2007 01:07:07 by bmynars
On Nov 23, 7:52 am, John Bagins wrote:
> I want to search for telephone numbers in my text database where the
> entries are in the form:
> Tel: 99 99 999 9999 999
> or
> Tel: +99-999/ 9999 999
> or any combination of spaces and punctuation chars.
>
> I want to feed as input numbers in the form of
> 99999999999
>
> What is the most efficient way to do this without first creating
> an index of space and punctuation removed numbers?
>
> Thanks
>
> Eric
In ksh93 or bash:
var='Tel: +99-999/ 9999 999'   # e.g. one entry from your file
var=${var//[^[:digit:]]/}
This will replace spaces and punctuation with nothing, squeezing
everything together into the format you want. Simple and very
efficient. No need to use external tools like 'sed', 'awk', 'tr', and
others.
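The expansion above normalises a single string, so searching a file still needs a loop around it. A minimal bash sketch of that loop follows; the function name `match_digits` is hypothetical, and the match is a substring test on the squeezed digits. A pure-shell read loop avoids external tools but is typically much slower than sed or awk on large files, so treat this as an illustration rather than a recommendation:

```shell
#!/bin/bash
# Hypothetical helper: match_digits NUMBER FILE
# Prints each line of FILE whose digits, squeezed together, contain NUMBER.
match_digits() {
    local number=$1 file=$2 line digits
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        digits=${line//[^[:digit:]]/}      # drop everything that is not a digit
        if [[ $digits == *"$number"* ]]; then
            printf '%s\n' "$line"
        fi
    done < "$file"
}

# Usage with made-up sample data:
tmp=$(mktemp)
printf '%s\n' 'Tel: 99 99 999 9999 999' 'Tel: +49-555/ 1234 567' > "$tmp"
match_digits 5551234567 "$tmp"     # prints: Tel: +49-555/ 1234 567
rm -f "$tmp"
```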
Re: ignoring non [0-9] chars when searching in a file
on 25.11.2007 15:10:16 by Ed Morton
On 11/23/2007 6:52 AM, John Bagins wrote:
> I want to search for telephone numbers in my text database where the
> entries are in the form:
> Tel: 99 99 999 9999 999
> or
> Tel: +99-999/ 9999 999
> or any combination of spaces and punctuation chars.
>
> I want to feed as input numbers in the form of
> 99999999999
>
> What is the most efficient way to do this without first creating
> an index of space and punctuation removed numbers?
>
> Thanks
>
> Eric
This may or may not be more efficient than other solutions posted, but it just
uses one command and creates a solid foundation that'll be easy to enhance and
maintain in future in case your requirements change:
awk '{s=$0; gsub(/[^[:digit:]]/,"",s)} index(s,p)' p="99999999999" file
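To illustrate how such a script might be extended if the requirements change, here is a hedged sketch that wraps the same idea in a shell function with an optional exact-match switch and prints the line number along with the original entry. The names `tel_grep`, `want`, and `exact` are hypothetical and not part of the original one-liner:

```shell
#!/bin/sh
# Hypothetical helper: tel_grep NUMBER FILE [EXACT]
# EXACT=1 requires the digits of the line to equal NUMBER;
# otherwise NUMBER only has to occur somewhere in the digits.
tel_grep() {
    awk -v want="$1" -v exact="${3:-0}" '
    {
        digits = $0
        gsub(/[^0-9]/, "", digits)              # keep only the digits of this line
        if (exact)
            hit = ((digits "") == (want ""))    # force a string comparison
        else
            hit = (index(digits, want) > 0)     # substring match
        if (hit)
            print FNR ": " $0
    }' "$2"
}

# Usage with made-up sample data:
tmp=$(mktemp)
printf '%s\n' 'Tel: 99 99 999 9999 999' 'Tel: +49-555/ 1234 567' > "$tmp"
tel_grep 5551234 "$tmp"         # prints: 2: Tel: +49-555/ 1234 567
tel_grep 5551234 "$tmp" 1       # exact match requested: prints nothing
rm -f "$tmp"
```

The string concatenation in the exact branch keeps numbers with leading zeros from being compared numerically.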
Ed.