Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 30.10.2007 22:31:08 by dba user

Dear Group,

I have a file with the content in the following format:

Junk...
Junk...

Heading P01
column1 column2 multiline text
CA1001 10 This is a multiline
text spanning two lines

CA1005 12 This is a multiline
text spanning three
lines

CA1008 11 This is a single line text

Heading P02
column1 column2
CA2001 10
CA2003 11
CA2005 12

Heading P03
Junk..
Junk..

I would like to list all the values under "Heading P01" for the same
column1 in a single line

CA1001 10 This is a multiline text spanning two lines
CA1005 12 This is a multiline text spanning three lines
CA1008 11 This is a single line text

Note: The column1 values will always have "CA" as the starting
character.

Appreciate your help in finding a solution using awk or perl or
sed ...

Thank you!!!!

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 30.10.2007 22:54:04 by Cyrus Kriticos

da. Ram wrote:
>
> I have a file with the content in the following format:
>
> Junk...
> Junk...
>
> Heading P01
> column1 column2 multiline text
> CA1001 10 This is a multiline
> text spanning two lines
>
> CA1005 12 This is a multiline
> text spanning three
> lines
>
> CA1008 11 This is a single line text
>
> Heading P02
> column1 column2
> CA2001 10
> CA2003 11
> CA2005 12
>
> Heading P03
> Junk..
> Junk..
>
> I would like to list all the values under "Heading P01" for the same
> column1 in a single line
>
> CA1001 10 This is a multiline text spanning two lines
> CA1005 12 This is a multiline text spanning three lines
> CA1008 11 This is a single line text
>
> Note: The column1 values will always have "CA" as the starting
> character.
>
> Appreciate your help in finding a solution using awk or perl or
> sed ...

[GNU sed]

Something like this?

$ sed -n "/^CA/{:X;N;s/\n//;/^$/bX;p}" file.txt
CA1001 10 This is a multiline text spanning two lines
CA1005 12 This is a multiline text spanning three
CA1008 11 This is a single line text
CA2001 10CA2003 11
CA2005 12

--
Best regards | Be nice to America or they'll bring democracy to
Cyrus | your country.

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 30.10.2007 23:19:12 by Michael Tosch

da. Ram wrote:
> Dear Group,
>
> I have a file with the content in the following format:
>
> Junk...
> Junk...
>
> Heading P01
> column1 column2 multiline text
> CA1001 10 This is a multiline
> text spanning two lines
>
> CA1005 12 This is a multiline
> text spanning three
> lines
>
> CA1008 11 This is a single line text
>
> Heading P02
> column1 column2
> CA2001 10
> CA2003 11
> CA2005 12
>
> Heading P03
> Junk..
> Junk..
>
> I would like to list all the values under "Heading P01" for the same
> column1 in a single line
>
> CA1001 10 This is a multiline text spanning two lines
> CA1005 12 This is a multiline text spanning three lines
> CA1008 11 This is a single line text
>
> Note: The column1 values will always have "CA" as the starting
> character.
>
> Appreciate your help in finding a solution using awk or perl or
> sed ...
>
> Thank you!!!!
>

awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}
/^CA/,/^$/{printf "%s",$0}/^$/{print}' file
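For readers following along, the same idea can be spelled out as a commented script. This is only a sketch, not the command posted above: it collects each record in a variable instead of using a range pattern, it puts a single space between the joined pieces, and it treats any heading other than P01 as the end of the section. Those choices, and the names join_p01, in_p01, rec and emit, are assumptions made for illustration. It needs an awk with user-defined functions (nawk, gawk or /usr/xpg4/bin/awk on Solaris).

```shell
# join_p01 is a hypothetical wrapper name; it reads the report on stdin.
join_p01() {
    awk '
        # emit prints the record collected so far, if any, and resets it
        function emit() { if (rec != "") print rec; rec = "" }

        /^Heading / { emit(); in_p01 = ($2 == "P01") }   # remember the current section
        !in_p01     { next }                             # skip everything outside P01
        /^CA/       { emit(); rec = $0; next }           # a new record starts here
        /^$/        { emit(); next }                     # a blank line ends the record
        rec != ""   { sub(/^[ \t]+/, ""); rec = rec " " $0 }   # continuation line
        END         { emit() }
    '
}
```

Typical use would be `join_p01 < file`. Lines that appear before the first CA line of the section, such as the column titles, are ignored because no record is open yet.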


--
Michael Tosch @ hp : com

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 30.10.2007 23:59:05 by dba user

On Oct 30, 3:19 pm, Michael Tosch wrote:
> da. Ram wrote:
> > Dear Group,
>
> > I have a file with the content in the following format:
>
> > Junk...
> > Junk...
>
> > Heading P01
> > column1 column2 multiline text
> > CA1001 10 This is a multiline
> > text spanning two lines
>
> > CA1005 12 This is a multiline
> > text spanning three
> > lines
>
> > CA1008 11 This is a single line text
>
> > Heading P02
> > column1 column2
> > CA2001 10
> > CA2003 11
> > CA2005 12
>
> > Heading P03
> > Junk..
> > Junk..
>
> > I would like to list all the values under "Heading P01" for the same
> > column1 in a single line
>
> > CA1001 10 This is a multiline text spanning two lines
> > CA1005 12 This is a multiline text spanning three lines
> > CA1008 11 This is a single line text
>
> > Note: The column1 values will always have "CA" as the starting
> > character.
>
> > Appreciate your help in finding a solution using awk or perl or
> > sed ...
>
> > Thank you!!!!
>
> awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}
> /^CA/,/^$/{printf "%s",$0}/^$/{print}' file
>
> --
> Michael Tosch @ hp : com


Thanks so much for the neat solution. Would it be possible to add the
heading ID to the combined line?

I tried the following, but the heading is getting added not just at
the beginning but for every section of the broken line.

I am trying to figure out a way to get the heading id added once per
combined line

awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
/^CA/,/^$/{printf " %s %s",p,$0}/^$/{print}' file

P01 CA1001 10 This is a multiline P01 text spanning two lines P01
P01 CA1005 12 This is a multiline P01 text spanning three P01 lines P01
P01 CA1008 11 This is a single line text P01

Desired output:

P01 CA1001 10 This is a multiline text spanning two lines
P01 CA1005 12 This is a multiline text spanning three lines
P01 CA1008 11 This is a single line text

BTW, what does the "print" at the end of the command do?

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 00:42:35 by Michael Tosch

da. Ram wrote:
> On Oct 30, 3:19 pm, Michael Tosch
> wrote:
>> da. Ram wrote:
>>> Dear Group,
>>> I have a file with the content in the following format:
>>> Junk...
>>> Junk...
>>> Heading P01
>>> column1 column2 multiline text
>>> CA1001 10 This is a multiline
>>> text spanning two lines
>>> CA1005 12 This is a multiline
>>> text spanning three
>>> lines
>>> CA1008 11 This is a single line text
>>> Heading P02
>>> column1 column2
>>> CA2001 10
>>> CA2003 11
>>> CA2005 12
>>> Heading P03
>>> Junk..
>>> Junk..
>>> I would like to list all the values under "Heading P01" for the same
>>> column1 in a single line
>>> CA1001 10 This is a multiline text spanning two lines
>>> CA1005 12 This is a multiline text spanning three lines
>>> CA1008 11 This is a single line text
>>> Note: The column1 values will always have "CA" as the starting
>>> character.
>>> Appreciate your help in finding a solution using awk or perl or
>>> sed ...
>>> Thank you!!!!
>> awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}
>> /^CA/,/^$/{printf "%s",$0}/^$/{print}' file
>>
>> --
>> Michael Tosch @ hp : com
>
>
> Thanks so much for the neat solution. Would it be possible to add the
> heading ID to the combined line?
>
> I tried the following, but the heading is getting added not just at
> the begining but for every section of the broken line.
>
> I am trying to figure out a way to get the heading id added once per
> combined line
>
> awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
> /^CA/,/^$/{printf " %s %s",p,$0}/^$/{print}' file
>
> P01 CA1001 10 This is a multiline P01 text spanning
> two lines P01
> P01 CA1005 12 This is a multiline P01 text spanning
> three P01 lines P01
> P01 CA1008 11 This is a single line text P01
>
> Desired output
>
> P01 CA1001 10 This is a multiline text spanning two
> lines
> P01 CA1005 12 This is a multiline text spanning
> three lines
> P01 CA1008 11 This is a single line text
>
> BTW, what does the "print" at the end of the command do?
>

awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
/^CA/{printf " %s ",p} /^CA/,/^$/{printf "%s",$0} /^$/{print}' file

The print at the end prints a newline character.
(More precise: it prints the current line with a newline, but the
current line is empty).

printf "%s" prints without a newline.
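To see the difference in isolation, here is a minimal sketch with made-up input: the letters a, b, c and d stand in for the pieces of two records, each followed by an empty line. The variable name glued is hypothetical.

```shell
# printf "%s" writes each input line without a newline, so the pieces run
# together. When the empty line arrives, print outputs that empty line plus
# a newline, which is what terminates the combined record.
glued=$(printf 'a\nb\n\nc\nd\n\n' | awk '{ printf "%s", $0 } /^$/ { print }')
printf '%s\n' "$glued"
# prints:
# ab
# cd
```

This also shows why the pieces are joined with nothing between them: any space in the combined line has to come from the input itself or from the printf format.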

--
Michael Tosch @ hp : com

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 01:19:45 by dba user

On Oct 30, 4:42 pm, Michael Tosch wrote:
> da. Ram wrote:
> > On Oct 30, 3:19 pm, Michael Tosch
> > wrote:
> >> da. Ram wrote:
> >>> Dear Group,
> >>> I have a file with the content in the following format:
> >>> Junk...
> >>> Junk...
> >>> Heading P01
> >>> column1 column2 multiline text
> >>> CA1001 10 This is a multiline
> >>> text spanning two lines
> >>> CA1005 12 This is a multiline
> >>> text spanning three
> >>> lines
> >>> CA1008 11 This is a single line text
> >>> Heading P02
> >>> column1 column2
> >>> CA2001 10
> >>> CA2003 11
> >>> CA2005 12
> >>> Heading P03
> >>> Junk..
> >>> Junk..
> >>> I would like to list all the values under "Heading P01" for the same
> >>> column1 in a single line
> >>> CA1001 10 This is a multiline text spanning two lines
> >>> CA1005 12 This is a multiline text spanning three lines
> >>> CA1008 11 This is a single line text
> >>> Note: The column1 values will always have "CA" as the starting
> >>> character.
> >>> Appreciate your help in finding a solution using awk or perl or
> >>> sed ...
> >>> Thank you!!!!
> >> awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}
> >> /^CA/,/^$/{printf "%s",$0}/^$/{print}' file
>
> >> --
> >> Michael Tosch @ hp : com
>
> > Thanks so much for the neat solution. Would it be possible to add the
> > heading ID to the combined line?
>
> > I tried the following, but the heading is getting added not just at
> > the begining but for every section of the broken line.
>
> > I am trying to figure out a way to get the heading id added once per
> > combined line
>
> > awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
> > /^CA/,/^$/{printf " %s %s",p,$0}/^$/{print}' file
>
> > P01 CA1001 10 This is a multiline P01 text spanning
> > two lines P01
> > P01 CA1005 12 This is a multiline P01 text spanning
> > three P01 lines P01
> > P01 CA1008 11 This is a single line text P01
>
> > Desired output
>
> > P01 CA1001 10 This is a multiline text spanning two
> > lines
> > P01 CA1005 12 This is a multiline text spanning
> > three lines
> > P01 CA1008 11 This is a single line text
>
> > BTW, what does the "print" at the end of the command do?
>
> awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
> /^CA/{printf " %s ",p} /^CA/,/^$/{printf "%s",$0} /^$/{print}' file
>
> The print at the end prints a newline character.
> (More precise: it prints the current line with a newline, but the
> current line is empty).
>
> printf "%s" prints without a newline.
>
> --
> Michael Tosch @ hp : com

Thanks so much! The solution works great.

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 02:39:18 by krahnj

"da. Ram" wrote:
>
> Dear Group,
>
> I have a file with the content in the following format:
>
> Junk...
> Junk...
>
> Heading P01
> column1 column2 multiline text
> CA1001 10 This is a multiline
> text spanning two lines
>
> CA1005 12 This is a multiline
> text spanning three
> lines
>
> CA1008 11 This is a single line text
>
> Heading P02
> column1 column2
> CA2001 10
> CA2003 11
> CA2005 12
>
> Heading P03
> Junk..
> Junk..
>
> I would like to list all the values under "Heading P01" for the same
> column1 in a single line
>
> CA1001 10 This is a multiline text spanning two lines
> CA1005 12 This is a multiline text spanning three lines
> CA1008 11 This is a single line text
>
> Note: The column1 values will always have "CA" as the starting
> character.
>
> Appreciate your help in finding a solution using awk or perl or
> sed ...

$ echo "Junk...
Junk...

Heading P01
column1 column2 multiline text
CA1001 10 This is a multiline
text spanning two lines

CA1005 12 This is a multiline
text spanning three
lines

CA1008 11 This is a single line text

Heading P02
column1 column2
CA2001 10
CA2003 11
CA2005 12

Heading P03
Junk..
Junk..
" | perl -ln00e's/^.+\n(?=CA)//s,y/\n//d,print,if/Heading P01/../Heading P02/and!/Heading P02/'
CA1001 10 This is a multiline text spanning two lines
CA1005 12 This is a multiline text spanning three lines
CA1008 11 This is a single line text
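The paragraph-mode idea (reading one blank-line-separated block at a time) can also be sketched in awk by setting RS to the empty string. This is an illustrative variant, not a translation of the Perl above: the name p01_paragraphs is made up, a single space is inserted between joined lines, and it assumes each paragraph holds at most one CA record.

```shell
# p01_paragraphs is a hypothetical name; input is read from stdin.
p01_paragraphs() {
    awk '
        # paragraph mode: blank lines separate records, newlines separate fields
        BEGIN { RS = ""; FS = "\n" }
        {
            out = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Heading /) {            # remember the section ID
                    split($i, h, " ")
                    sect = h[2]
                    continue
                }
                if ($i ~ /^CA/)
                    out = $i                       # the record starts on this line
                else if (out != "")
                    out = out " " $i               # continuation line
            }
            if (sect == "P01" && out != "")
                print out
        }
    '
}
```

Because the section ID is kept in a variable between paragraphs, the records that follow the one sharing a paragraph with "Heading P01" are still recognised as belonging to P01.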




John
--
use Perl;
program
fulfillment

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 14:53:42 by Miguel Lobos

I'm looking to do something similar, but I'm not up to snuff enough on
awk or perl to figure it out on my own yet.

Anyway, I'm looking to match a particular string in a line, then grab
this line and the next 5 after it so I can parse out some other
parameters. Here's a sample of what I need to pull out of the file
(matching on 'GHLR665', then pulling this line plus the next 5):

1193616363 XXXXXX46D00 CM GHLR665 OCT28 18:59:54 7090 INFO Table GHLRVLR Resource Limitation
1193616363 Operation: Update Location
1193616363 VLR number: 551178313920
1193616363 Description: Table GHLRVLR is about to reach its 6000 Maximum Capacity.
1193616363 Space Left: 0 (6000)
1193616363 Action: Use QVLRACT in HLRADMIN to identify the inactive VLRs.

Every line in this log file starts with a 10-digit number (e.g.
1193616363), which may or may not be the same value. The lines before
and after the block I'm trying to capture and write into a single
line / record will be just the 10-digit number, followed by some
whitespace and a line ending (UNIX style, I think).

Any suggestions would be very much appreciated!

Mike

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 15:34:25 by Ed Morton

On 10/31/2007 8:53 AM, Miguel Lobos wrote:
> I'm looking to do something similar, but I'm not up to snuff enough on
> awk or perl to figure it out on my own yet.
>
> Anyway, I'm looking to match a particular string in a line, then grab
> this line
> and the next 5 after it so I can parse out some other parameters.
> Here's a sample of what
> I need to pull out of the file (matching on 'GHLR665', then pulling
> this line plus the next 5):
>
> 1193616363 XXXXXX46D00 CM GHLR665 OCT28 18:59:54 7090 INFO
> Table GHLRVLR Resource Limitation
> 1193616363 Operation: Update Location
> 1193616363 VLR number: 551178313920
> 1193616363 Description: Table GHLRVLR is about to
> reach its 6000 Maximum Capacity.
> 1193616363 Space Left: 0 (6000)
> 1193616363 Action: Use QVLRACT in HLRADMIN to
> identify the inactive VLRs.
>
> Every line in this log file starts with a 10 digit number (i.e.
> 1193616363), which may or may not be the same value.
> The line before and after what I'm trying to capture and write into a
> single line / record will be just the 10 digit number,
> followed by some white space character and a carriage return (UNIX
> style, I think).
>
> Any suggestions would be very much appreciated!
>
> Mike
>

The general mechanism to pull out N lines starting at a pattern is:

awk '/pattern/{c=N}c&&c--' file

so, if you want the line containing GHLR665 plus the subsequent 5 lines, you'd do:

awk '/GHLR665/{c=6}c&&c--' file

If you want to look for your pattern in a specific field (e.g. it's in the 4th
field in your sample input) then you'd do:

awk '$4 ~ /GHLR665/{c=6}c&&c--' file
or
awk '$4 == "GHLR665"{c=6}c&&c--' file

if you want an exact string comparison rather than an RE comparison.
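In the idiom above, c is a countdown: a match sets it to N, and the condition c&&c-- is true (so the line is printed) for as long as c is non-zero, decrementing it each time. Since the stated goal was to end up with a single line per block, here is a hedged sketch that keeps the same countdown but joins the lines with spaces instead of printing them one by one. The helper name grab_block and the variables pat, n and rec are made up for this example, and -v requires a POSIX awk such as nawk or gawk.

```shell
# grab_block PATTERN N   (hypothetical helper; input comes from stdin)
# PATTERN is an awk regular expression, N the number of lines to take,
# counting the matching line itself.
grab_block() {
    awk -v pat="$1" -v n="$2" '
        $0 ~ pat {
            if (c > 0) print rec      # a new match cuts the previous block short
            c = n                     # start the countdown
            rec = ""
        }
        c > 0 {
            rec = (rec == "" ? $0 : rec " " $0)   # append the current line
            if (--c == 0) print rec               # countdown done: emit one record
        }
        END { if (c > 0) print rec }              # input ended inside a block
    '
}
```

For the sample in the question this would be called as `grab_block GHLR665 6 < logfile`, where logfile is a placeholder name.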

Regards,

Ed.

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 16:46:08 by Miguel Lobos

Ed,

Thanks! I'm running into some silly syntax errors, but thanks for the
explanation of the logic, that should get me going the right
direction.

Thanks Again,

Mike

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 31.10.2007 18:28:54 by Michael Tosch

Miguel Lobos wrote:
> Ed,
>
> Thanks! I'm running into some silly syntax errors, but thanks for the
> explanation of the logic, that should get me going the right
> direction.
>
> Thanks Again,
>
> Mike
>

old awks need

awk '/pattern/{c=N} c>0&&c-->0' file
or
awk '/pattern/{c=N} c>0{c--;print}' file


--
Michael Tosch @ hp : com

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 01.11.2007 02:54:27 by Miguel Lobos

Michael,

Thanks, you saved me some time and a little banging my head against
the cubicle wall. Apparently the version of awk in Solaris 10 is an
'old' awk -- the last of your examples is the one that did the trick.

Regards and Thank You Again,

Mike

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 01.11.2007 07:59:21 by Ed Morton

On 10/31/2007 8:54 PM, Miguel Lobos wrote:
> Michael,
>
> Thanks, you saved me some time and a little banging my head against
> the cubicle wall. Apparently the version of awk in Solaris 10 is an
> 'old' awk -- the last of you examples is the one that did the trick.
>
> Regards and Thank You Again,
>
> Mike
>

Absolutely do not do anything to accommodate old, broken awk on Solaris. Use GNU
awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.

By the way, this is netnews not a web forum so you should leave enough context
in each post so it stands alone.

Ed

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 02.11.2007 01:07:18 by Miguel Lobos

On Nov 1, 2:59 am, Ed Morton wrote:
> On 10/31/2007 8:54 PM, Miguel Lobos wrote:
>
> > Michael,
>
> > Thanks, you saved me some time and a little banging my head against
> > the cubicle wall. Apparently the version of awk in Solaris 10 is an
> > 'old' awk -- the last of you examples is the one that did the trick.
>
> > Regards and Thank You Again,
>
> > Mike
>
> Absolutely do not do anything to accomodate old, broken awk on Solaris. Use GNU
> awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.
>
> By the way, this is netnews not a web forum so you should leave enough context
> in each post so it stands alone.
>
> Ed

Ed,

Thank you again for the advice, and all points taken! Now that I've
managed to finish my report, I'll work on getting a more modern awk on
my Ultra 45.

Mike

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 02.11.2007 15:41:47 by Michael Tosch

Miguel Lobos wrote:
> On Nov 1, 2:59 am, Ed Morton wrote:
>> On 10/31/2007 8:54 PM, Miguel Lobos wrote:
>>
>>> Michael,
>>> Thanks, you saved me some time and a little banging my head against
>>> the cubicle wall. Apparently the version of awk in Solaris 10 is an
>>> 'old' awk -- the last of you examples is the one that did the trick.
>>> Regards and Thank You Again,
>>> Mike
>> Absolutely do not do anything to accomodate old, broken awk on Solaris. Use GNU
>> awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.
>>
>> By the way, this is netnews not a web forum so you should leave enough context
>> in each post so it stands alone.
>>
>> Ed
>
> Ed,
>
> Thank you again for the advice, and all points taken! Now that I've
> managed to finish my report, I'll work on getting a more modern awk on
> my Ultra 45.
>
> Mike
>

cd /usr/bin
ls -li awk nawk oawk

shows that awk is linked to oawk.

rm awk
ln nawk awk

and it will be linked to nawk.
This is the path that AT&T had prepared
but Sun has never dared to go.

We all should open service cases with Sun and urge them for an RFE.


--
Michael Tosch @ hp : com

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 04.11.2007 16:26:23 by Miguel Lobos

On Nov 2, 10:41 am, Michael Tosch wrote:
> Miguel Lobos wrote:
> > On Nov 1, 2:59 am, Ed Morton wrote:
> >> On 10/31/2007 8:54 PM, Miguel Lobos wrote:
>
> >>> Michael,
> >>> Thanks, you saved me some time and a little banging my head against
> >>> the cubicle wall. Apparently the version of awk in Solaris 10 is an
> >>> 'old' awk -- the last of you examples is the one that did the trick.
> >>> Regards and Thank You Again,
> >>> Mike
> >> Absolutely do not do anything to accomodate old, broken awk on Solaris. Use GNU
> >> awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.
>
> >> By the way, this is netnews not a web forum so you should leave enough context
> >> in each post so it stands alone.
>
> >> Ed
>
> > Ed,
>
> > Thank you again for the advice, and all points taken! Now that I've
> > managed to finish my report, I'll work on getting a more modern awk on
> > my Ultra 45.
>
> > Mike
>
> cd /usr/bin
> ls -li awk nawk oawk
>
> shows that awk linked to oawk.
>
> rm awk
> ln nawk awk
>
> and it will be linked to nawk.
> This is the path that AT&T had prepared
> but Sun has never dared to go.
>
> We all should open service cases with Sun and urge them for an RFE.
>
> --
> Michael Tosch @ hp : com

Michael,

Excellent! I'll be updating my system on Monday morning, though I've
considered going and grabbing gawk off of sunfreeware.com. Just to
have something to fall back on, I'm probably going to rename the
original awk to something else rather than remove it. It's not that
I'm afraid, but I want a safety net in case something else I was doing
with the original awk breaks, until I get time to figure out how to
make it work with nawk or gawk.

Thanks again for all the wonderful suggestions, and helping me get on
the right track with this.

Regards,

Mike

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 05.11.2007 09:28:20 by Michael Tosch

Miguel Lobos wrote:
> On Nov 2, 10:41 am, Michael Tosch
> wrote:
>> Miguel Lobos wrote:
>>> On Nov 1, 2:59 am, Ed Morton wrote:
>>>> On 10/31/2007 8:54 PM, Miguel Lobos wrote:
>>>>> Michael,
>>>>> Thanks, you saved me some time and a little banging my head against
>>>>> the cubicle wall. Apparently the version of awk in Solaris 10 is an
>>>>> 'old' awk -- the last of you examples is the one that did the trick.
>>>>> Regards and Thank You Again,
>>>>> Mike
>>>> Absolutely do not do anything to accomodate old, broken awk on Solaris. Use GNU
>>>> awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.
>>>> By the way, this is netnews not a web forum so you should leave enough context
>>>> in each post so it stands alone.
>>>> Ed
>>> Ed,
>>> Thank you again for the advice, and all points taken! Now that I've
>>> managed to finish my report, I'll work on getting a more modern awk on
>>> my Ultra 45.
>>> Mike
>> cd /usr/bin
>> ls -li awk nawk oawk
>>
>> shows that awk linked to oawk.
>>
>> rm awk
>> ln nawk awk
>>
>> and it will be linked to nawk.
>> This is the path that AT&T had prepared
>> but Sun has never dared to go.
>>
>> We all should open service cases with Sun and urge them for an RFE.
>>
>> --
>> Michael Tosch @ hp : com
>
> Michael,
>
> Excellent! I'll be updating my system on Monday morning, though I've
> considered going and grabbing gawk off of sunfreeware.com. Just to
> have something to fall back on, I'm probably going to rename rather
> than remove the original awk to something else. Its not that I'm
> afraid, but want to have a safety net if something else I was doing
> with the original awk breaks, until I get time to figure out how to
> make it work with nawk or gawk.
>
> Thanks again for all the wonderful suggestions, and helping me get on
> the right track with this.
>
> Regards,
>
> Mike
>

Hmm, oawk should be backup enough.
But it is maybe wise to symlink awk to nawk, so an awk patch would replace
the awk symlink but not spoil nawk.
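A more cautious variation, for anyone who wants to try the switch before touching /usr/bin, is to create the symlink in a private directory and put that directory first in PATH. This is a different approach from the one described above, offered only as a sketch; the helper name prefer_awk and the directory names are hypothetical.

```shell
# prefer_awk TARGET BINDIR   (hypothetical helper)
# Creates BINDIR/awk as a symlink to TARGET and leaves system directories alone.
prefer_awk() {
    target=$1    # full path of the awk to prefer
    bindir=$2    # private directory that will be searched first
    [ -x "$target" ] || { echo "not executable: $target" >&2; return 1; }
    mkdir -p "$bindir" && ln -sf "$target" "$bindir/awk"
}

# Example (paths are assumptions; adjust them to the local system):
#   prefer_awk /usr/bin/nawk "$HOME/bin"
#   PATH=$HOME/bin:$PATH; export PATH
```

Scripts that call /usr/bin/awk by its full path are not affected by this, which is both the limitation and the safety net.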

--
Michael Tosch @ hp : com

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 05.11.2007 20:16:55 by dba user

On Oct 30, 4:42 pm, Michael Tosch wrote:
> da. Ram wrote:
> > On Oct 30, 3:19 pm, Michael Tosch
> > wrote:
> >> da. Ram wrote:
> >>> Dear Group,
> >>> I have a file with the content in the following format:
> >>> Junk...
> >>> Junk...
> >>> Heading P01
> >>> column1 column2 multiline text
> >>> CA1001 10 This is a multiline
> >>> text spanning two lines
> >>> CA1005 12 This is a multiline
> >>> text spanning three
> >>> lines
> >>> CA1008 11 This is a single line text
> >>> Heading P02
> >>> column1 column2
> >>> CA2001 10
> >>> CA2003 11
> >>> CA2005 12
> >>> Heading P03
> >>> Junk..
> >>> Junk..
> >>> I would like to list all the values under "Heading P01" for the same
> >>> column1 in a single line
> >>> CA1001 10 This is a multiline text spanning two lines
> >>> CA1005 12 This is a multiline text spanning three lines
> >>> CA1008 11 This is a single line text
> >>> Note: The column1 values will always have "CA" as the starting
> >>> character.
> >>> Appreciate your help in finding a solution using awk or perl or
> >>> sed ...
> >>> Thank you!!!!
> >> awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}
> >> /^CA/,/^$/{printf "%s",$0}/^$/{print}' file
>
> >> --
> >> Michael Tosch @ hp : com
>
> > Thanks so much for the neat solution. Would it be possible to add the
> > heading ID to the combined line?
>
> > I tried the following, but the heading is getting added not just at
> > the begining but for every section of the broken line.
>
> > I am trying to figure out a way to get the heading id added once per
> > combined line
>
> > awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
> > /^CA/,/^$/{printf " %s %s",p,$0}/^$/{print}' file
>
> > P01 CA1001 10 This is a multiline P01 text spanning
> > two lines P01
> > P01 CA1005 12 This is a multiline P01 text spanning
> > three P01 lines P01
> > P01 CA1008 11 This is a single line text P01
>
> > Desired output
>
> > P01 CA1001 10 This is a multiline text spanning two
> > lines
> > P01 CA1005 12 This is a multiline text spanning
> > three lines
> > P01 CA1008 11 This is a single line text
>
> > BTW, what does the "print" at the end of the command do?
>
> awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}
> /^CA/{printf " %s ",p} /^CA/,/^$/{printf "%s",$0} /^$/{print}' file
>
> The print at the end prints a newline character.
> (More precise: it prints the current line with a newline, but the
> current line is empty).
>
> printf "%s" prints without a newline.
>
> --
> Michael Tosch @ hp : com


Thanks for all your help. I have an additional requirement now. Is it
possible to print only the 1st and last (text field) columns in
addition to the heading ID?

P01 CA1001 This is a multiline text spanning two lines
P01 CA1005 This is a multiline text spanning three lines P01
P01 CA1008 This is a single line text P01

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 06.11.2007 15:57:50 by Janis Papanagnou

da. Ram wrote:
[snip]
>
> Thanks for all your help. I have an additional requirement now. Is it
> possible to print only the 1st and last (text field) columns in
> addition to the heading ID.

awk '{print $1, $NF}'

will print the first and last field.

>
> P01 CA1001 This is a multiline text spanning two lines
> P01 CA1005 This is a multiline text spanning three lines P01
> P01 CA1008 This is a single line text P01
>

But how is the last field defined? From your example you seem to want
to extract multiple fields. Is the larger space a delimiter? Does the
last field start at a certain column number? In the latter case use

awk '{print $1,substr($0,54)}'
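Applied to the already combined, heading-prefixed lines, the fixed-position idea could look like the sketch below. It assumes the input lines have the form "P01 CA1001 10 text..." and that the text starts at a known character position, which is passed in as a parameter. The helper name pick_cols and the variable names start and text are made up, and the position 15 used in the usage line fits only the short sample there.

```shell
# pick_cols START   (hypothetical helper; input comes from stdin)
# Prints the first two fields (heading ID and column1) followed by the text
# that begins at character position START of each input line.
pick_cols() {
    awk -v start="$1" '{
        text = substr($0, start)       # everything from position START onward
        sub(/^[ \t]+/, "", text)       # drop any padding in front of the text
        print $1, $2, text
    }'
}
```

For example, `echo 'P01 CA1001 10 This is a single line text' | pick_cols 15` prints "P01 CA1001 This is a single line text". Spaces and tabs inside the text are preserved because the text is cut out by position rather than by field.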


Janis

Re: Combine multiple line segment into one, when certain pattern is found - awk/sed/perl

on 08.11.2007 20:58:39 by dba user

On Nov 6, 6:57 am, Janis Papanagnou wrote:
> da. Ram wrote:
>
> [snip]
>
>
>
> > Thanks for all your help. I have an additional requirement now. Is it
> > possible to print only the 1st and last (text field) columns in
> > addition to the heading ID.
>
> awk '{print $1, $NF}'
>
> will print the first and last field.
>
>
>
> > P01 CA1001 This is a multiline text spanning two lines
> > P01 CA1005 This is a multiline text spanning three lines P01
> > P01 CA1008 This is a single line text P01
>
> But how is the last field defined? From your example you seem to want
> multiple fields that you want to extract. Is the larger space a
> delimiter? Do the last fields start from a certain column number? In
> the latter case use
>
> awk '{print $1,substr($0,54)}'
>
> Janis


Sorry, I missed the post earlier. Thanks for the suggestion; I will
try the substr option.

The last field is in fact a large text with spaces and tabs. It
always starts at a certain position and could span multiple lines
until the next record is found. The initial challenge was to get the
multiple lines combined into one.

Kind regards