Search for and display multiple groups of entries

on 14.09.2007 12:07:35 by cdmonline

I have a config file that contains multiple blocks of config
information and I want to split each block out into its own file.
Here's an example of what I have to work with:

some-text-hereXXX::
{ bla bla bla
bla bla bla
bla bla bla
}
some-more-text-here::
{ yada yada yada
bla bla bla bla
}

some-text-here-again XXX::
{ text text text
text here
could be anything here
}

Each unique block starts with a line that terminates with a double
colon (::).
Each unique block ends with a line that terminates with a close brace
(}).
There may or may not be spaces or tabs preceding each line.
The information within each block is always different.
There may or may not be blank lines separating each block.

Additionally, I'd like to be able to display all blocks that match a
given regexp on the first line of the block. In other words, given the
above output, if I wanted to show all blocks that matched XXX, then I
would want the result to be as follows:

some-text-hereXXX::
{ bla bla bla
bla bla bla
bla bla bla
}

some-text-here-again XXX::
{ text text text
text here
could be anything here
}

Ideally, I want to be able to do all of this in shell (ksh is my
preference).

- CDM

Re: Search for and display multiple groups of entries

on 14.09.2007 13:41:25 by Ed Morton

cdmonline@mac.com wrote:
> I have a config file that contains multiple blocks of config
> information and I want to split each block out into its own file.
> Here's an example of what I have to work with:
>
> some-text-hereXXX::
> { bla bla bla
> bla bla bla
> bla bla bla
> }
> some-more-text-here::
> { yada yada yada
> bla bla bla bla
> }
>
> some-text-here-again XXX::
> { text text text
> text here
> could be anything here
> }
>
> Each unique block starts with a line that terminates with a double
> colon (::).
> Each unique block ends with a line that terminates with a close brace
> (}).
> There may or may not be spaces or tabs preceding each line.
> The information within each block is always different.
> There may or may not be blank lines separating each block.

awk '/::$/{++i;f=1} f{print > "file"i} /}$/{f=0}' file
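As a rough illustration of how the one-liner above behaves, here is a minimal sketch run against a cut-down copy of the sample. The scratch directory, the file name sample.conf and the variable names are made up for the example; the output name is parenthesized because some awk implementations parse an unparenthesized concatenation after ">" differently.

```shell
work=$(mktemp -d)                      # hypothetical scratch directory
cat > "$work/sample.conf" <<'EOF'
some-text-hereXXX::
{ bla bla bla
}
some-more-text-here::
{ yada yada yada
}

  some-text-here-again XXX::
  { text text text
  }
EOF

# f is switched on by a header line (ends in "::") and switched off
# after a line ending in "}" has been written, so blank lines between
# blocks are not copied anywhere.
awk -v dir="$work" '
    /::$/ { ++i; f = 1 }
    f     { print > (dir "/file" i) }
    /}$/  { f = 0 }
' "$work/sample.conf"

split_count=$(ls "$work" | grep -c '^file[0-9]')
first_line_of_third=$(head -n 1 "$work/file3")
echo "blocks written: $split_count"
```

Each block should end up in its own file (file1, file2, ...), with leading whitespace preserved.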

> Additionally, I'd like to be able to display all blocks that match a
> given regexp on the first line of the block. In other words, given the
> above output, if I wanted to show all blocks that matched XXX, then I
> would want the result to be as follows:
>
> some-text-hereXXX::
> { bla bla bla
> bla bla bla
> bla bla bla
> }
>
> some-text-here-again XXX::
> { text text text
> text here
> could be anything here
> }

awk '/::$/&&/XXX/,/}$/' file

> Ideally, I want to be able to do all of this in shell (ksh is my
> preference).

shell is not a text-processing tool, awk is.

Ed.

Re: Search for and display multiple groups of entries

on 14.09.2007 13:42:10 by Janis Papanagnou

On 14 Sep., 12:07, cdmonl...@mac.com wrote:
> I have a config file that contains multiple blocks of config
> information and I want to split each block out into its own file.
> Here's an example of what I have to work with:
>
> some-text-hereXXX::
> { bla bla bla
> bla bla bla
> bla bla bla}
>
> some-more-text-here::
> { yada yada yada
> bla bla bla bla
>
> }
>
> some-text-here-again XXX::
> { text text text
> text here
> could be anything here
> }

To split the blocks in files f1, f2, ...

awk '/::$/{c++} /::$/,/}$/{print >"f"c}'

> Each unique block starts with a line that terminates with a double
> colon (::).
> Each unique block ends with a line that terminates with a close brace
> (}).
> There may or may not be spaces or tabs preceding each line.
> The information within each block is always different.
> There may or may not be blank lines separating each block.
>
> Additionally, I'd like to be able to display all blocks that match a
> given regexp on the first line of the block. In other words, given the
> above output, if I wanted to show all blocks that matched XXX, then I
> would want the result to be as follows:

awk -v pat="XXX" 'BEGIN{FS="\n";RS="";m=pat"::"} $1~m'

If you have multiple block matches and want them separated by an empty
line then add ORS="\n\n"; to the BEGIN block.
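A small sketch of the paragraph-mode approach, assuming every block is separated from the next by a blank line and contains no blank line itself (a blank line inside a block would split it into two records). The sample data and variable names are invented for the example, and the sketch sets ORS, the output record separator, to get the empty line between printed blocks.

```shell
sample=$(mktemp)                       # hypothetical sample file
cat > "$sample" <<'EOF'
some-text-hereXXX::
{ bla bla bla
}

some-more-text-here::
{ yada yada yada
}

some-text-here-again XXX::
{ text text text
}
EOF

# RS="" turns each blank-line-separated paragraph into one record and
# FS="\n" turns each line into a field, so $1 is the header line.
# ORS="\n\n" writes an empty line after every printed block.
para_out=$(awk -v pat="XXX" '
    BEGIN { FS = "\n"; RS = ""; ORS = "\n\n"; m = pat "::" }
    $1 ~ m
' "$sample")
printf '%s\n' "$para_out"
```

Note that m is built as pat followed by "::", so the pattern has to sit directly in front of the double colon on the header line.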

Janis

> some-text-hereXXX::
> { bla bla bla
> bla bla bla
> bla bla bla
>
> }
>
> some-text-here-again XXX::
> { text text text
> text here
> could be anything here
> }
>
> Ideally, I want to be able to do all of this in shell (ksh is my
> preference).
>
> - CDM

Re: Search for and display multiple groups of entries

on 14.09.2007 13:59:45 by cdmonline

Some nice tricks there. Thanks!

Just to move the goal posts a tad, it turns out that my config file
ALSO contains blocks that are NOT encapsulated with the braces. So a
better example of my input might look like this:

some-text-hereXXX::
{ bla bla bla
bla bla bla
bla bla bla
}

some-more-text-here::
{ yada yada yada
bla bla bla bla

}

here-is-some-text::
zvsdvb sdfbsdfbsdf sfbfsb bfs
sgbsfb gsbfgb fgbf bfg fgfb

some-text-here-again XXX::
{ text text text
text here
could be anything here
}

here-is-(!XXX)more-text::
sdfsd sfdbsfdbsb sbfgsfdb sfdbfbfs

Given this additional complication, how do I now go about displaying
all the blocks that match the regexp I'm looking for?

Re: Search for and display multiple groups of entries

on 14.09.2007 14:11:29 by cdmonline

> awk '/::$/&&/XXX/,/}$/' file

This only prints the heading line and not the contents of the block
itself.

> > Ideally, I want to be able to do all of this in shell (ksh is my
> > preference).
>
> shell is not a text-processing tool, awk is.

Good point. By shell I was trying to imply sed, awk, grep, etc., as
opposed to a perl solution.

Re: Search for and display multiple groups of entries

on 14.09.2007 14:15:17 by Ed Morton

cdmonline@mac.com wrote:
> Some nice tricks there. Thanks!

This is usenet, not a web forum. Please provide enough quoted context in
future so your post stands alone and we can see what you're responding to.

> Just to move the goal posts a tad, it turns out that my config file
> ALSO contains blocks that are NOT encapsulated with the braces. So a
> better example of my input might look like this:
>
> some-text-hereXXX::
> { bla bla bla
> bla bla bla
> bla bla bla
> }
>
> some-more-text-here::
> { yada yada yada
> bla bla bla bla
>
> }
>
> here-is-some-text::
> zvsdvb sdfbsdfbsdf sfbfsb bfs
> sgbsfb gsbfgb fgbf bfg fgfb
>
> some-text-here-again XXX::
> { text text text
> text here
> could be anything here
> }
>
> here-is-(!XXX)more-text::
> sdfsd sfdbsfdbsb sbfgsfdb sfdbfbfs
>
> Given this additional complication, how do I now go about displaying
> all the blocks that match the regexp I'm looking for?
>

So really, you're saying a block ends when the next block starts? If so,
then this is how you split it into separate files:

awk '/::$/{++i;f=1} f{print > "file"i}' file

and this is how you select specific blocks:

awk '/::$/{f=(/XXX/?1:0)}f' file
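To see what the flag-based selection does with blocks that have no braces, here is a minimal sketch on a shortened version of the new sample. The temporary file and variable names are assumptions made for the example.

```shell
sample=$(mktemp)                       # hypothetical sample file
cat > "$sample" <<'EOF'
some-text-hereXXX::
{ bla bla bla
}

here-is-some-text::
zvsdvb sdfbsdfbsdf sfbfsb bfs

some-text-here-again XXX::
{ text text text
}

here-is-(!XXX)more-text::
sdfsd sfdbsfdbsb sbfgsfdb sfdbfbfs
EOF

# On every header line the flag f is recomputed: 1 if the header
# contains XXX, 0 otherwise. The bare pattern "f" prints each line
# for as long as the flag is set, i.e. until the next header.
flag_out=$(awk '/::$/ { f = (/XXX/ ? 1 : 0) } f' "$sample")
printf '%s\n' "$flag_out"
```

Two things are worth noticing in the output: blank lines that follow a selected block are printed along with it, and the header here-is-(!XXX)more-text:: is selected too, because /XXX/ is a plain substring match.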

Regards,

Ed.

Re: Search for and display multiple groups of entries

on 14.09.2007 14:16:19 by Ed Morton

cdmonline@mac.com wrote:

>>awk '/::$/&&/XXX/,/}$/' file
>
>
> This only prints the heading line and not the contents of the block
> itself.

No, it prints the whole block.
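A quick way to check this is to run the range pattern on a small copy of the original sample. This is only a sketch; the temporary file and variable names are made up for the test.

```shell
sample=$(mktemp)                       # hypothetical sample file
cat > "$sample" <<'EOF'
some-text-hereXXX::
{ bla bla bla
bla bla bla
}
some-more-text-here::
{ yada yada yada
}

some-text-here-again XXX::
{ text text text
}
EOF

# Range pattern: printing starts on a header line that contains XXX
# and continues up to and including the next line ending in "}".
range_out=$(awk '/::$/ && /XXX/, /}$/' "$sample")
printf '%s\n' "$range_out"
range_lines=$(printf '%s\n' "$range_out" | awk 'END { print NR }')
```

With this input both XXX blocks come out complete, header and body, and the block without XXX is skipped.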

Ed.

Re: Search for and display multiple groups of entries

on 14.09.2007 15:11:28 by cdmonline

On Sep 14, 1:15 pm, Ed Morton wrote:
> So really, you're saying a block ends when the next block starts? If so,
> then this is how you split it into separate files:
>
> awk '/::$/{++i;f=1} f{print > "file"i}' file
>
> and this is how you select specific blocks:
>
> awk '/::$/{f=(/XXX/?1:0)}f' file
>
> Regards,
>
> Ed.

That's exactly what I was looking for. Thanks!

OK, one last move of the goal post. I want my regexp to be narrowed
down to being a word. In this context, a word is bound by a white
space, end of line, colon, period, exclamation point or parenthesis.
In other words, if my search string is XXX, then I want to match the
following:

XXX
(XXX|123)
.XXX
XXX!abc
XXX:
abc.!XXX:

... but NOT any of the following:

123XXX
XXXabc
qweXXXrty

I know that the awk regexp follows slightly different rules to the
shell regexp and this level of awk is above my station.

Re: Search for and display multiple groups of entries

on 14.09.2007 15:38:20 by Ed Morton

cdmonline@mac.com wrote:
> On Sep 14, 1:15 pm, Ed Morton wrote:
>
>>So really, you're saying a block ends when the next block starts? If so,
>>then this is how you split it into separate files:
>>
>> awk '/::$/{++i;f=1} f{print > "file"i}' file
>>
>>and this is how you select specific blocks:
>>
>> awk '/::$/{f=(/XXX/?1:0)}f' file
>>
>>Regards,
>>
>> Ed.
>
>
> That's exactly what I was looking for. Thanks!
>
> OK, one last move of the goal post. I want my regexp to be narrowed
> down to being a word. In this context, a word is bound by a white
> space, end of line, colon, period, exclamation point or parenthesis.

If you're using GNU awk, you can use "\<" and "\>" to delimit words, e.g.:

/\<word\>/

For other awks you need to explicitly list the possibilities, e.g.:

/(^|[[:space:]:.!()])word($|[[:space:]:.!()])/

There MAY be a specific character class to make things simpler, see:

http://www.gnu.org/software/gawk/manual/gawk.html#table_002dchar_002dclasses
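For awks without word-boundary operators, the explicit-list idea can be tried against the sample strings from the question. This is a hedged sketch, not a definitive pattern: it assumes the search word contains no regexp metacharacters, it spells whitespace as space and tab so that it does not rely on character-class support, and it adds "|" to the delimiter set (an assumption beyond the stated list) so that (XXX|123) matches.

```shell
matched=$(printf '%s\n' 'XXX' '(XXX|123)' '.XXX' 'XXX!abc' 'XXX:' 'abc.!XXX:' \
                        '123XXX' 'XXXabc' 'qweXXXrty' |
  awk -v word="XXX" '
    BEGIN {
        # Delimiters: space, tab, colon, period, exclamation point,
        # both parentheses, plus the assumed "|".
        delim = "[ \t:.!()|]"
        # The word must sit between start-of-line or a delimiter
        # and end-of-line or a delimiter.
        re = "(^|" delim ")" word "($|" delim ")"
    }
    $0 ~ re
  ')
printf '%s\n' "$matched"
```

The first six strings are printed and the last three are not. To combine this with block selection, the same dynamic regexp could be tested on the header line in place of /XXX/.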

> In other words, if my search string is XXX, then I want to match the
> following:
>
> XXX
> (XXX|123)
> .XXX
> XXX!abc
> XXX:
> abc.!XXX:
>
> ... but NOT any of the following:
>
> 123XXX
> XXXabc
> qweXXXrty
>
> I know that the awk regexp follows slightly different rules to the
> shell regexp and this level of awk is above my station.
>

awk uses Extended REs, some other tools, like sed, use Basic REs.
"shell" uses different pattern matching methods depending on your shell
and context.
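The difference between the two regexp flavours shows up as soon as grouping or alternation is involved. A small sketch using grep, which takes Basic REs by default and Extended REs with -E (the test strings and variable names are made up):

```shell
lines=$(printf '%s\n' 'XXX:' 'XXX!abc' 'XXX(:|!)')

# Basic RE: unescaped ( | ) are ordinary characters, so only the line
# that literally contains "XXX(:|!)" matches.
bre=$(printf '%s\n' "$lines" | grep 'XXX(:|!)')

# Extended RE (what awk uses): ( ) group and | alternates, so this
# matches XXX followed by a colon or an exclamation point.
ere=$(printf '%s\n' "$lines" | grep -E 'XXX(:|!)')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```

The same pattern text therefore selects different lines depending on which flavour the tool uses.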

Ed.

Re: Search for and display multiple groups of entries

on 15.09.2007 03:02:37 by William James

On Sep 14, 8:11 am, cdmonl...@mac.com wrote:
> On Sep 14, 1:15 pm, Ed Morton wrote:
>
> > So really, you're saying a block ends when the next block starts? If so,
> > then this is how you split it into separate files:
>
> > awk '/::$/{++i;f=1} f{print > "file"i}' file
>
> > and this is how you select specific blocks:
>
> > awk '/::$/{f=(/XXX/?1:0)}f' file
>
> > Regards,
>
> > Ed.
>
> That's exactly what I was looking for. Thanks!
>
> OK, one last move of the goal post. I want my regexp to be narrowed
> down to being a word. In this context, a word is bound by a white
> space, end of line, colon, period, exclamation point or parenthesis.
> In other words, if my search string is XXX, then I want to match the
> following:
>
> XXX
> (XXX|123)
> .XXX
> XXX!abc
> XXX:
> abc.!XXX:
>
> ... but NOT any of the following:
>
> 123XXX
> XXXabc
> qweXXXrty
>
> I know that the awk regexp follows slightly different rules to the
> shell regexp and this level of awk is above my station.

awk '/::$/ {f=/(^|[^a-zA-Z0-9])XXX[^a-zA-Z0-9]/} f' myfile

gawk '/::$/ {f=/\<XXX\>/} f' myfile

gawk 'ort ~ /\<XXX\>/{print ort,$0}{ort=RT}' RS='[^\n]*::' file