Extracting Abbreviations In A File

on 28.09.2007 14:02:19 by derya.susman

Hi,

I want to extract abbreviations in a file. An abbreviation may consist
of capital letters and digits. How can I accomplish this? Since grep
returns lines, it does not help much. I could not make use of sed
either.

Thanks in advance.

Re: Extracting Abbreviations In A File

on 28.09.2007 14:05:34 by dozzie

On 28.09.2007, D. Susman wrote:
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much.

GNU grep has the "-o" option, which prints only the matched part of each line.

> I could not make use of sed
> either.

Perl? AWK?

--
Secunia non olet.
Stanislaw Klekot

Re: Extracting Abbreviations In A File

on 28.09.2007 14:14:00 by derya.susman

On Sep 28, 3:05 pm, "Stachu 'Dozzie' K." wrote:
> On 28.09.2007, D. Susman wrote:
>
> > I want to extract abbreviations in a file. An abbreviation may consist
> > of capital letters and digits. How can I accomplish this? Since grep
> > returns lines, it does not help much.
>
> GNU grep has "-o" option.
>
> > I could not make use of sed
> > either.
>
> Perl? AWK?
>
> --
> Secunia non olet.
> Stanislaw Klekot

I am using the standard grep. Is there a way for plain grep to return
words?

Solutions based on awk would also be appreciated.

Re: Extracting Abbreviations In A File

on 28.09.2007 14:34:02 by William James

On Sep 28, 7:02 am, "D. Susman" wrote:
> Hi,
>
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much. I could not make use of sed
> either.
>
> Thanks in advance.

Assuming only one abbreviation per line (match() reports just the first one):

awk 'match($0, /[A-Z][A-Z0-9]+/) {
    print substr($0, RSTART, RLENGTH)
}' myfile
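
If gawk is not available, the same match() idea can be stretched to cover several abbreviations per line by looping over the remainder of the line. This is only a sketch: extract_abbrevs is a made-up helper name, it reads standard input, and LC_ALL=C is set on the assumption that [A-Z] should mean ASCII capitals only.

```shell
# Sketch: print every abbreviation on every line with POSIX awk features only.
# extract_abbrevs is a hypothetical helper; it reads standard input.
extract_abbrevs() {
    LC_ALL=C awk '{
        rest = $0
        # match() sets RSTART and RLENGTH for the leftmost match
        while (match(rest, /[A-Z][A-Z0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)
            # drop everything up to the end of this match, then search again
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

# Example call on two lines of sample text
printf 'IBM adasd CMMI UML\nRUP A3 xxxSQLyyy\n' | extract_abbrevs
```

Note that this pattern has no word boundaries, so the SQL inside xxxSQLyyy is reported too.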

Re: Extracting Abbreviations In A File

on 28.09.2007 14:50:11 by William James

On Sep 28, 7:02 am, "D. Susman" wrote:
> Hi,
>
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much. I could not make use of sed
> either.
>
> Thanks in advance.

Any number of abbreviations in a line (this needs gawk, since RT is a gawk extension):

gawk 'BEGIN{RS="[A-Z][A-Z0-9]+"}RT!=""{print RT}' file

Re: Extracting Abbreviations In A File

on 28.09.2007 14:57:41 by Tiago Peczenyj

My two cents:

$ cat -A filename # ^I is a tab
IBM adasd ^ICMMI UML RUP A3 xxxSQLyyy$

$ awk -v RS='[ \t]' '/\<[A-Z][A-Z0-9]+\>/' filename
IBM
CMMI
UML
RUP
A3


$ grep -oE '\b[A-Z][A-Z0-9]+\b' filename
IBM
CMMI
UML
RUP
A3

Best Regards
Tiago

On Sep 28, 9:02 am, "D. Susman" wrote:
> Hi,
>
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much. I could not make use of sed
> either.
>
> Thanks in advance.

Re: Extracting Abbreviations In A File

on 28.09.2007 15:49:54 by Maxwell Lol

"D. Susman" writes:

> Hi,
>
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much. I could not make use of sed
> either.
>
> Thanks in advance.

Here's another option:

tr -dcs 'A-Z 0-9\n' ' '

Note that it will also output standalone numbers. You may need to use grep
to add extra conditions, e.g.:

tr -dcs 'A-Z 0-9\n' ' '
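
As a sketch of what such an extra condition could look like, using only POSIX tr and grep: turn every run of other characters into a newline so that each token sits on its own line, then keep only whole lines of two or more characters that start with a capital. The function name abbrevs_tr is made up for illustration, and LC_ALL=C is an assumption that only ASCII capitals count.

```shell
# Sketch: one token per line via tr, then a whole-line filter via grep -x.
# abbrevs_tr is a hypothetical helper; it reads standard input.
abbrevs_tr() {
    # -c complements the set, -s squeezes the resulting run of newlines
    LC_ALL=C tr -cs 'A-Z0-9' '[\n*]' |
        # -x: the pattern must match the entire line
        LC_ALL=C grep -x '[A-Z][A-Z0-9]\{1,\}'
}

# Example call: "2007" and the lone "I" are filtered out
printf 'IBM adasd CMMI xxxSQLyyy 2007 A3 I am\n' | abbrevs_tr
```

Like the plain tr command, this ignores word boundaries, so capitals embedded in a longer word (SQL in xxxSQLyyy) still come through.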

Re: Extracting Abbreviations In A File

on 28.09.2007 15:54:52 by Ed Morton

D. Susman wrote:
> Hi,
>
> I want to extract abbreviations in a file. An abbreviation may consist
> of capital letters and digits. How can I accomplish this? Since grep
> returns lines, it does not help much. I could not make use of sed
> either.
>
> Thanks in advance.
>

In the text:

abcDEFghi

is "DEF" an abbreviation? If not, why not (i.e. what delimiters are
required around an "abbreviation")?
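
If the answer is "no", one possible reading is that an abbreviation must be a whole token, delimited by non-alphanumeric characters or by the start or end of the line. A minimal POSIX awk sketch under that assumption follows; abbrevs_strict is a made-up name, and treating every non-alphanumeric character (including "_") as a delimiter is a guess, not something stated in the question.

```shell
# Sketch: accept a token only if it consists entirely of a capital letter
# followed by one or more capitals or digits.
# abbrevs_strict is a hypothetical helper; it reads standard input.
abbrevs_strict() {
    LC_ALL=C awk '{
        # split the line on runs of non-alphanumeric characters
        n = split($0, tok, /[^A-Za-z0-9]+/)
        for (i = 1; i <= n; i++)
            if (tok[i] ~ /^[A-Z][A-Z0-9]+$/)
                print tok[i]
    }'
}

# Example call: DEF inside abcDEFghi is rejected
printf 'abcDEFghi IBM (UML), A3.\n' | abbrevs_strict
```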

Ed.

Re: Extracting Abbreviations In A File

on 29.09.2007 19:11:07 by dozzie

On 28.09.2007, D. Susman wrote:
> On Sep 28, 3:05 pm, "Stachu 'Dozzie' K." wrote:
>> On 28.09.2007, D. Susman wrote:
>>
>> > I want to extract abbreviations in a file. An abbreviation may consist
>> > of capital letters and digits. How can I accomplish this? Since grep
>> > returns lines, it does not help much.
>>
>> GNU grep has "-o" option.
>>
>> > I could not make use of sed
>> > either.
>>
>> Perl? AWK?
>>
>> --
>> Secunia non olet.
>> Stanislaw Klekot
>
> I am using the standard grep. Is there a way for plain grep to return
> words?

I don't think so. I don't even know what "plain grep" is, since systems
implementing SUSv3 add their own functionality on top of the grep
specification.

--
Secunia non olet.
Stanislaw Klekot