selecting text between two patterns

on 02.09.2007 17:54:58 by ginger.m.griffin

Hey... I'm trying to extract text between two patterns. Ideally,
if I had two patterns:
PATTERN1
PATTERN2
I'd like to get back the text that is between the first occurrence of
those patterns, not including the patterns.
For instance, if I had the text:

this is one line
this is PATTERN1 another line
hello
this is somethingPATTERN2 else
this is the last line

I would want this result:

another line
hello
this is something

Or if I had this text:
this PATTERN1is one big line. But it isPATTERN2 not very long.

I would want this result:
is one big line. But it is


Thanks!

Re: selecting text between two patterns

on 02.09.2007 19:47:26 by cfajohnson

On 2007-09-02, gin_g wrote:
> Hey... I'm trying to extract text between two patterns. Ideally,
> if I had two patterns:
> PATTERN1
> PATTERN2
> I'd like to get back the text that is between the first occurrence of
> those patterns, not including the patterns
> For instance, if I had the text:
>
> this is one line
> this is PATTERN1 another line
> hello
> this is somethingPATTERN2 else
> this is the last line
>
> I would want this result
>
> another line
> hello
> this is something
>
> Or if I had this text:
> this PATTERN1is one big line. But it isPATTERN2 not very long.
>
> I would want this result:
> is one big line. But it is

This reads the whole file into a shell variable, so it may be slow or fail with very large files:

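file=$( cat "$FILENAME" )            # read the whole file into one variable
temp=${file#*PATTERN1}               # remove everything through the first PATTERN1
printf "%s\n" "${temp%%PATTERN2*}"   # remove everything from the first PATTERN2 on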
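
One caveat applies to the parameter-expansion approach above. If PATTERN1 does not occur, ${file#*PATTERN1} leaves the string unchanged, and the whole file is printed. The same thing happens with PATTERN2 and the rest of the file. The following is a minimal sketch of a guard, assuming a POSIX shell. The function name extract_between is hypothetical and not part of the original post. Quoting the markers inside the expansions makes them match as literal strings rather than glob patterns.

```shell
# Hypothetical helper: extract_between STRING START END
# Prints the text between the first START and the first END after it.
# START and END are quoted below, so they match literally, not as globs.
extract_between() {
    case $1 in
        *"$2"*"$3"*) ;;                  # START occurs, with END somewhere after it
        *) return 1 ;;                   # markers missing: fail instead of echoing input
    esac
    rest=${1#*"$2"}                      # drop everything through the first START
    printf '%s\n' "${rest%%"$3"*}"       # drop everything from the first END on
}
```

Usage might look like: extract_between "$(cat "$FILENAME")" PATTERN1 PATTERN2 || echo 'markers not found' >&2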


--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: selecting text between two patterns

on 03.09.2007 00:46:38 by Dummy

gin_g wrote:
> Hey... I'm trying to extract text between two patterns. Ideally,
> if I had two patterns:
> PATTERN1
> PATTERN2
> I'd like to get back the text that is between the first occurrence of
> those patterns, not including the patterns
> For instance, if I had the text:
>
> this is one line
> this is PATTERN1 another line
> hello
> this is somethingPATTERN2 else
> this is the last line
>
> I would want this result
>
> another line
> hello
> this is something

$ echo "this is one line
this is PATTERN1 another line
hello
this is somethingPATTERN2 else
this is the last line
" | perl -lne'print if s/.*?PATTERN1// .. s/(.*)PATTERN2.*/$1/'
another line
hello
this is something


> Or if I had this text:
> this PATTERN1is one big line. But it isPATTERN2 not very long.
>
> I would want this result:
> is one big line. But it is

$ echo "this PATTERN1is one big line. But it isPATTERN2 not very long.
" | perl -lne'print if s/.*?PATTERN1// .. s/(.*)PATTERN2.*/$1/'
is one big line. But it is




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: selecting text between two patterns

on 03.09.2007 01:46:22 by William James

On Sep 2, 10:54 am, gin_g wrote:
> Hey... I'm trying to extract text between two patterns. Ideally,
> if I had two patterns:
> PATTERN1
> PATTERN2
> I'd like to get back the text that is between the first occurrence of
> those patterns, not including the patterns
> For instance, if I had the text:
>
> this is one line
> this is PATTERN1 another line
> hello
> this is somethingPATTERN2 else
> this is the last line
>
> I would want this result
>
> another line
> hello
> this is something
>
> Or if I had this text:
> this PATTERN1is one big line. But it isPATTERN2 not very long.
>
> I would want this result:
> is one big line. But it is
>
> Thanks!

ruby -e 'puts gets(nil)[ /PATTERN1(.*)PATTERN2/m,1]' my_file

This uses the first occurrence of PATTERN1 and, because ".*" is
greedy, the last occurrence of PATTERN2.

==== input ====
one
twoPATTERN1
three PATTERN1
four
fivePATTERN2
sixPATTERN2
seven

==== output ====

three PATTERN1
four
fivePATTERN2
six
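
The OP asked for the first PATTERN2. In the Ruby regex that would mean the non-greedy ".*?" in place of ".*". The same first-versus-last choice exists in POSIX shell parameter expansion. "%%" removes the longest matching suffix, which cuts at the first PATTERN2. "%" removes the shortest, which cuts at the last. The following is a sketch for comparison only. The two function names are hypothetical.

```shell
# Hypothetical helpers. Both start after the first PATTERN1;
# they differ only in which PATTERN2 ends the extracted text.
to_first_end() {
    rest=${1#*PATTERN1}                  # remove through the first PATTERN1
    printf '%s\n' "${rest%%PATTERN2*}"   # %%: longest suffix removed, cut at first PATTERN2
}
to_last_end() {
    rest=${1#*PATTERN1}                  # remove through the first PATTERN1
    printf '%s\n' "${rest%PATTERN2*}"    # %: shortest suffix removed, cut at last PATTERN2
}
```

With the input above, to_last_end "$(cat my_file)" gives the same text as the Ruby one-liner. to_first_end stops after "five".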

Re: selecting text between two patterns

on 03.09.2007 03:28:31 by Ed Morton

gin_g wrote:
> Hey... I'm trying to extract text between two patterns. Ideally,
> if I had two patterns:
> PATTERN1
> PATTERN2
> I'd like to get back the text that is between the first occurrence of
> those patterns, not including the patterns
> For instance, if I had the text:
>
> this is one line
> this is PATTERN1 another line
> hello
> this is somethingPATTERN2 else
> this is the last line
>
> I would want this result
>
> another line
> hello
> this is something
>
> Or if I had this text:
> this PATTERN1is one big line. But it isPATTERN2 not very long.
>
> I would want this result:
> is one big line. But it is
>
>
> Thanks!
>

With an awk that supports REs as Record Separators (e.g. gawk), it might
be just:

awk -v RS="PATTERN1|PATTERN2" 'NR==2' file

depending on whether or not PATTERN1 or PATTERN2 can occur before
PATTERN1 in your real input.

Regards,

Ed.
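
An awk with regular-expression RS, such as gawk, is not always available. A portable alternative, sketched here, scans line by line with index(), which treats the markers as literal strings. It starts only after the first PATTERN1 and stops at the first PATTERN2 that follows it, so a PATTERN2 appearing earlier in the input is ignored. The function name first_between is hypothetical.

```shell
# Hypothetical helper: first_between START END [file ...]
# Prints the text between the first START and the first END after it.
# START and END are literal strings (index() does no regex matching).
first_between() {
    start=$1 end=$2
    shift 2
    awk -v start="$start" -v end="$end" '
    !found {
        i = index($0, start)
        if (i == 0) next                     # no START on this line yet
        $0 = substr($0, i + length(start))   # keep only what follows START
        found = 1
    }
    {
        j = index($0, end)
        if (j > 0) {                         # END found: print up to it and stop
            print substr($0, 1, j - 1)
            exit
        }
        print                                # whole line lies between the markers
    }' "$@"
}
```

If START never appears, nothing is printed. If END never follows it, everything after START is printed. Values passed with -v have backslash escapes interpreted, so markers containing backslashes need extra care.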

Re: selecting text between two patterns

on 06.09.2007 07:46:35 by onkelheinz

"Ed Morton" wrote in message
news:eoCdnXjGT6jc_kbbnZ2dnUVZ_oGjnZ2d@comcast.com...
>
> awk -v RS="PATTERN1|PATTERN2" 'NR==2' file
>
> depending on whether or not PATTERN1 or PATTERN2 can occur before PATTERN1
> in your real input.
>
> Regards,
>
> Ed.

And what is the meaning of 'NR==2' in that awk expression?

Regards,
Heinz

Re: selecting text between two patterns

on 06.09.2007 10:46:50 by Ed Morton

Heinz Müller wrote:
> "Ed Morton" wrote in message
> news:eoCdnXjGT6jc_kbbnZ2dnUVZ_oGjnZ2d@comcast.com...
>
>>awk -v RS="PATTERN1|PATTERN2" 'NR==2' file
>>
>>depending on whether or not PATTERN1 or PATTERN2 can occur before PATTERN1
>>in your real input.
>>
>>Regards,
>>
>>Ed.
>
>
> And what is the meaning of 'NR==2' in that awk expression?

It's a test for the second record in the file, so if you have a file like:

abc
PATTERN1
def
ghi
PATTERN2
klm

then since I'm specifying that the Record Separator (RS) is the RE
"PATTERN1 or PATTERN2", the first record will be:

abc

and the second will be:

def
ghi

and the third will be:

klm

So, by testing whether NR (the Number of Records read so far) is equal
to 2, I'm selecting that second record. Since I don't specify any action
to take when that condition (NR==2) is true, awk uses the default action,
which is to just print the record for which that condition is true.

Ed.
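
The record splitting described above can be tried in any POSIX awk by using a single-character RS. POSIX leaves a multi-character RS unspecified, which is why the regex form needs gawk or a similar awk. In this sketch a semicolon stands in for the two patterns.

```shell
# ';' plays the role of PATTERN1/PATTERN2. Record 2 is the text between
# the first and second ';', including its surrounding newlines.
printf 'abc\n;\ndef\nghi\n;\nklm\n' | awk -v RS=';' 'NR==2'

# A pattern with no action gets the default action, so this is equivalent:
printf 'abc\n;\ndef\nghi\n;\nklm\n' | awk -v RS=';' 'NR==2 { print $0 }'
```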

Re: selecting text between two patterns

on 06.09.2007 21:56:18 by onkelheinz

"Ed Morton" wrote in message
news:_YudnXwzhO_nI0LbnZ2dnUVZ_oSnnZ2d@comcast.com...
>
> It's a test for the second record in the file, so if you have a file like:
>
> abc
> PATTERN1
> def
> ghi
> PATTERN2
> klm
>
> then since I'm specifying that the Record Separator (RS) is the RE
> "PATTERN1 or PATTERN2" the first record will be
>
> abc
>
> and the second will be:
>
> def
> ghi
>
> and the third will be:
>
> klm
>
> So, by testing for the Number of Records (NR) equal to 2, I'm selecting
> that second record. Since I don't specify any action to take when that
> condition (NR==2) is true, awk uses the default action which is to just
> print the record for which that condition is true.
>
> Ed.

Thank you for the detailed explanation!!

Re: selecting text between two patterns

on 07.09.2007 22:10:05 by William Park

gin_g wrote:
> Hey... I'm trying to extract text between two patterns. Ideally,
> if I had two patterns:
> PATTERN1
> PATTERN2
> I'd like to get back the text that is between the first occurrence of
> those patterns, not including the patterns
> For instance, if I had the text:
>
> this is one line
> this is PATTERN1 another line
> hello
> this is somethingPATTERN2 else
> this is the last line
>
> I would want this result
>
> another line
> hello
> this is something
>
> Or if I had this text:
> this PATTERN1is one big line. But it isPATTERN2 not very long.
>
> I would want this result:
> is one big line. But it is

If you're brave, you can try my Bash extension:
http://freshmeat.net/projects/bashdiff/
http://home.eol.ca/~parkw/index.html#strinterval

strinterval [-r] string begin end [submatch]
Extract substring, delimited by non-overlapping BEGIN and END patterns. By
default, the patterns are simple strings, but regex(7) patterns can be used
with the -r option. Returns success (0) if patterns are found, or failure (1)
if patterns are not found. When patterns are found, there are 5 segments
to consider:
string --> submatch=( prefix BEGIN middle END suffix )
All 5 segments are returned in array variable SUBMATCH (if given) which is
always flushed first.

E.g.
a='this PATTERN1is one big line. But it isPATTERN2 not very long'
strinterval "$a" "PATTERN1" "PATTERN2" submatch
declare -p submatch

==> submatch='([0]="this " [1]="PATTERN1" [2]="is one big line. But it is"
[3]="PATTERN2" [4]=" not very long")'

So, you want
echo "${submatch[2]}"
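
Readers who would rather not patch their shell can get a rough equivalent of the prefix, middle and suffix segments with POSIX parameter expansion alone. This is a sketch and not part of BashDiff. The function split_interval and the variables it sets are hypothetical. The markers are matched as literal strings, with no equivalent of the -r regex mode.

```shell
# Hypothetical helper: split_interval STRING START END
# Sets prefix, middle and suffix around the first START and the first END
# after it. Returns 1 (like strinterval) if the markers are not found.
split_interval() {
    case $1 in
        *"$2"*"$3"*) ;;
        *) return 1 ;;
    esac
    prefix=${1%%"$2"*}       # text before the first START
    rest=${1#*"$2"}          # text after the first START
    middle=${rest%%"$3"*}    # text before the first END that follows START
    suffix=${rest#*"$3"}     # text after that END
}
```

With the example string above, split_interval "$a" PATTERN1 PATTERN2 && echo "$middle" prints the same text as "${submatch[2]}".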

--
William Park , Toronto, Canada
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/