How to manipulate newline with mks sed

on 13.04.2008 09:09:01 by jpco94340

Sorry for my very bad English.
I want to extract text from an HTML page, and I am trying to use sed with
a multi-line pattern.
Here is a sample. The text is the following:



Here is the text I want to get!!!!!


I thought it would be possible to do something very simple like this:
cat file.htm | sed 's|\n\n\n^\(.*\)$\n|\1|'

It doesn't work with the sed bundled with MKS Toolkit 8.5.1 on Windows
2000 Pro SP4. So, to understand how to work with special characters like
newline, I wrote a very simple test file containing:
A
B
B
I am simply trying to replace the first two lines with the single line A,
using something like this:
cat test.txt | sed 's|A\n\B|A\n|'
It doesn't work!
I can run echo "A\n\B\nC" and get a file with three lines, but I can't
use \n in sed's search/replace command, and I can't find how to tell sed
that I want to work with newlines.
I also tried \r\n, but it doesn't work either.
I tried using hex values, \0x0D and \0x0A, but that doesn't seem to work.
For example, if I only do this to try to turn A into B:
echo "A" | sed 's/A/\0x42/'
I only get: Ax42. I can't find how to work with hex values.

I have been searching for a long time. I would appreciate some help.

Re: How to manipulate newline with mks sed

on 13.04.2008 11:16:33 by PK

jpco94340@hotmail.com wrote:

> 2000 Pro SP4. So, to understand how to work with special characters like
> newline, I wrote a very simple test file containing:
> A
> B
> B
> I am simply trying to replace the first two lines with the single line A,
> using something like this:
> cat test.txt | sed 's|A\n\B|A\n|'
> It doesn't work!
> I can run echo "A\n\B\nC" and get a file with three lines, but I can't
> use \n in sed's search/replace command, and I can't find how to tell sed
> that I want to work with newlines.
> I also tried \r\n, but it doesn't work either.
> I tried using hex values, \0x0D and \0x0A, but that doesn't seem to work.
> For example, if I only do this to try to turn A into B:
> echo "A" | sed 's/A/\0x42/'
> I only get: Ax42. I can't find how to work with hex values.
>
> I have been searching for a long time. I would appreciate some help.

See the SED FAQ:

http://student.northpark.edu/pemente/sed/sedfaq5.html

section 5.10: Why can't I match or delete a newline using the \n escape
sequence? Why can't I match 2 or more lines using \n?

Hope this helps.
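
As a minimal sketch of the idea behind that FAQ section: sed reads one line at a time, so \n only has something to match after the N command has appended the next line to the pattern space. The example below assumes a POSIX-style sed and a POSIX shell and uses the three-line A/B/B sample from the question; it has not been tried with MKS sed 8.5.1, so treat it as a starting point only.

```shell
# Sample input from the question: three lines A, B, B.
# On a line that is exactly "A", append the next line (N) so the pattern
# space holds "A<newline>B"; only then can \n in the regex match.
# $!N skips N on the last line, where there is no next line to read.
result=$(printf 'A\nB\nB\n' | sed '/^A$/{
$!N
s/^A\nB$/A/
}')
printf '%s\n' "$result"
```

The multi-line script form is used because some sed implementations are strict about what may follow a command inside braces.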

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.

Re: How to manipulate newline with mks sed

on 14.04.2008 01:55:59 by Ed Morton

On 4/13/2008 2:09 AM, jpco94340@hotmail.com wrote:
> Sorry for my very bad English.
> I want to extract text from an HTML page, and I am trying to use sed with
> a multi-line pattern.

Don't. Only use sed for simple substitutions on a single line. For anything
else, use awk, perl, ruby, etc.

> Here is a sample. The text is the following:
>
>
>
> Here is the text I want to get!!!!!
>
>
> I thought it would be possible to do something very simple like this:
> cat file.htm | sed 's|\n\n\n^\(.*\)$\n|\1|'

Yes, but not with sed. Just escape the backslashes with GNU awk:

$ cat file



Here is the text I want to get!!!!!

$ gawk -v RS= '{print gensub(/<\/CENTER>\n<\/td>\n\n(.*)\n<\/td>/,"\\1","")}' file
Here is the text I want to get!!!!!

There may be better solutions depending on your requirements and a more complete
sample input file.

> It doesn't work with the sed bundled with MKS Toolkit 8.5.1 on Windows
> 2000 Pro SP4. So, to understand how to work with special characters like
> newline, I wrote a very simple test file containing:
> A
> B
> B
> I am simply trying to replace the first two lines with the single line A,
> using something like this:
> cat test.txt | sed 's|A\n\B|A\n|'
> It doesn't work!

$ cat file
A
B
B
$ gawk -v RS= '{print gensub(/A\nB/,"A","")}' file
A
B
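
Since gensub() is a GNU awk extension, here is a rough equivalent that sticks to sub(), which POSIX awk provides. It is only a sketch under the same assumptions as above: the A/B/B sample, and RS set to the empty string so that awk reads the whole blank-line-delimited block as one record. Whether the awk shipped with MKS Toolkit behaves the same way has not been checked.

```shell
# With RS empty, awk uses paragraph mode: the three lines form one record,
# so the regex can see the newline between A and B.
# sub() rewrites $0 in place (first match only), then print emits it.
result=$(printf 'A\nB\nB\n' | awk -v RS= '{ sub(/A\nB/, "A"); print }')
printf '%s\n' "$result"
```

Unlike gensub(), sub() modifies the record rather than returning a new string, so the print has to come after it.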

Regards,

Ed.
