Help extracting strings via awk.

on 07.12.2007 02:20:28 by oleg.rakhmanchik

Hi,

I need help extracting urls from a large text file. I don't have
control over the format of the file so it is always different, but the
urls are always in <url>..</url> tags. The text is always on the same
line without line breaks.

asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd
adfsdf sd sdgdfg...

The surrounding text is always different and I need the quickest and
most efficient way to extract just the text between the 2 tags and
output it somewhere. Right now I do this with several commands and it
takes a while for a large file, but I know there is probably a quicker
and better way to do this. Please help.

Re: Help extracting strings via awk.

on 07.12.2007 03:59:01 by Ed Morton

On 12/6/2007 7:20 PM, oleg.rakhmanchik@gmail.com wrote:
> Hi,
>
> I need help extracting urls from a large text file. I don't have
> control over the format of the file so it is always different, but the
> urls are always in <url>..</url> tags. The text is always on the same
> line without line breaks.
>
> asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd
> adfsdf sd sdgdfg...
>
> The surrounding text is always different and I need the quickest and
> most efficient way to extract just the text between the 2 tags and
> output it somewhere. Right now I do this with several commands and it
> takes a while for a large file, but I know there is probably a quicker
> and better way to do this. Please help.

With GNU awk:

gawk -F'<url>' -v RS='</url>' 'RT{print $NF}' file

Ed.
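
Note: RT and a multi-character RS are GNU awk extensions, so the one-liner above needs gawk. Where only a POSIX awk is available, a minimal sketch along the following lines should give the same result. It assumes the tags are literally <url> and </url> and that a URL never contains "<"; the function name extract_urls is made up for illustration.

```shell
# Hypothetical helper: print the text between each <url>...</url> pair, one per line.
# Assumes the tags are literally <url> and </url> and that a URL contains no "<".
extract_urls() {
    awk '{
        s = $0
        # match() sets RSTART and RLENGTH for the leftmost <url>...</url> pair
        while (match(s, /<url>[^<]*<\/url>/)) {
            # skip the 5-character "<url>" and drop the 6-character "</url>"
            print substr(s, RSTART + 5, RLENGTH - 11)
            # continue scanning after the pair just handled
            s = substr(s, RSTART + RLENGTH)
        }
    }' "$@"
}

# Usage with the sample line from the original post:
printf '%s\n' 'asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd' | extract_urls
```

For the sample line this prints www.google.com and www.yahoo.com on separate lines. It makes a single pass over the file, like the gawk version, though it has not been timed against it.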

Re: Help extracting strings via awk.

on 07.12.2007 06:59:45 by Steffen Schuler

On Thu, 06 Dec 2007 20:59:01 -0600, Ed Morton wrote:

> On 12/6/2007 7:20 PM, oleg.rakhmanchik@gmail.com wrote:
>> Hi,
>>
>> I need help extracting urls from a large text file. I don't have
>> control over the format of the file so it is always different, but the
>> urls are always in <url>..</url> tags. The text is always on the same
>> line without line breaks.
>>
>> asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd
>> adfsdf sd sdgdfg...
>>
>> The surrounding text is always different and I need the quickest and
>> most efficient way to extract just the text between the 2 tags and
>> output it somewhere. Right now I do this with several commands and it
>> takes a while for a large file, but I know there is probably a quicker
>> and better way to do this. Please help.
>
> With GNU awk:
>
> gawk -F'<url>' -v RS='</url>' 'RT{print $NF}' file
>
> Ed.

With Perl:

perl -ne 'for (/<url>(.*?)<\/url>/g) {print "$_\n"}' file

Regards,

Steffen "goedel" Schuler

Re: Help extracting strings via awk.

on 07.12.2007 16:35:34 by oleg.rakhmanchik

On Dec 6, 9:59 pm, Ed Morton wrote:
> On 12/6/2007 7:20 PM, oleg.rakhmanc...@gmail.com wrote:
>
> > Hi,
>
> > I need help extracting urls from a large text file. I don't have
> > control over the format of the file so it is always different, but the
> > urls are always in <url>..</url> tags. The text is always on the same
> > line without line breaks.
>
> > asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd
> > adfsdf sd sdgdfg...
>
> > The surrounding text is always different and I need the quickest and
> > most efficient way to extract just the text between the 2 tags and
> > output it somewhere. Right now I do this with several commands and it
> > takes a while for a large file, but I know there is probably a quicker
> > and better way to do this. Please help.
>
> With GNU awk:
>
> gawk -F'<url>' -v RS='</url>' 'RT{print $NF}' file
>
> Ed.

These work perfectly, thank you.

Re: Help extracting strings via awk.

on 07.12.2007 22:25:32 by krahnj

Steffen Schuler wrote:
>
> On Thu, 06 Dec 2007 20:59:01 -0600, Ed Morton wrote:
>
> > On 12/6/2007 7:20 PM, oleg.rakhmanchik@gmail.com wrote:
> >>
> >> I need help extracting urls from a large text file. I don't have
> >> control over the format of the file so it is always different, but the
> >> urls are always in <url>..</url> tags. The text is always on the same
> >> line without line breaks.
> >>
> >> asdfsdf<url>www.google.com</url>dfgdgdg<url>www.yahoo.com</url>asd
> >> adfsdf sd sdgdfg...
> >>
> >> The surrounding text is always different and I need the quickest and
> >> most efficient way to extract just the text between the 2 tags and
> >> output it somewhere. Right now I do this with several commands and it
> >> takes a while for a large file, but I know there is probably a quicker
> >> and better way to do this. Please help.
> >
> > With GNU awk:
> >
> > gawk -F'<url>' -v RS='</url>' 'RT{print $NF}' file
>
> With Perl:
>
> perl -ne 'for (/<url>(.*?)<\/url>/g) {print "$_\n"}' file

The Perl version of that gawk program would be:

perl -F'<url>' -lane'BEGIN{$/="</url>"} print $F[-1]' file


John
--
use Perl;
program
fulfillment