Omitting data from a single line..

on 27.11.2007 06:36:13 by Cranky

I have files that are 1 liners that have data that I would like to
sed, grep, or awk out. Here is an example:


Book of Love0574638694274Love
has no fury and therefore this is no different than any other
insipidity that we come across. Read this and
weep.
12.95Jane
Russo
Chastising Daisy095736485284Not the usual driving
miss daisy that we are used to. This will set the mood for
demeanor.
8.95Jack Palance

Basically I would like to omit everything between the <description>
and </description> tags. As far as the "description" tags themselves,
it really doesn't matter whether those go or not so long as everything
in between them does.

Thanks,
Cranky

Re: Omitting data from a single line..

on 27.11.2007 06:49:50 by xhoster

Cranky wrote:
> I have files that are 1 liners that have data that I would like to
> sed, grep, or awk out. Here is an example:

Are you aware that this is a Perl newsgroup, not an sed, awk, or grep one?

....
> Basically I would like to omit everything between the <description>
> and </description> tags. As far as the "description" tags themselves,
> it really doesn't matter whether those go or not so long as everything
> in between them does.

In Perl:

$line =~ s/<description>.*?<\/description>//g;

There are various gotchas that can occur with this. In general it is a good
idea to use an XML processor to process XML, but I do do things like this
occasionally and generally get away with it.
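
For illustration, here is a minimal, self-contained sketch of that
substitution. Only the description element is known from the original
post; the other element names (book, title, isbn, price, author) and the
shortened description text are assumptions made up for this example. The
/s modifier is an addition so that . also matches a newline, in case a
description is ever wrapped across lines.

```perl
use strict;
use warnings;

# Hypothetical one-line record; only <description> is known from the post,
# the other tag names are assumed for the sake of the example.
my $line = '<book><title>Book of Love</title><isbn>0574638694274</isbn>'
         . '<description>Love has no fury. Read this and weep.</description>'
         . '<price>12.95</price><author>Jane Russo</author></book>';

# Non-greedy .*? stops at the first closing tag, so each description is
# removed separately; /g repeats for every record on the line, and /s lets
# . match newlines too.
(my $stripped = $line) =~ s/<description>.*?<\/description>//gs;

print "$stripped\n";
```

For one-line files, the same substitution should also work from the
command line as perl -pe 's/<description>.*?<\/description>//gs' books.txt,
where books.txt is a placeholder file name.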

Xho


Re: Omitting data from a single line..

on 27.11.2007 14:15:57 by Ben Morrow

Quoth Cranky:
> I have files that are 1 liners that have data that I would like to
> sed, grep, or awk out.

Then why are you asking in a Perl group? :)

> Here is an example:
>
> Book of Love0574638694274Love
> has no fury and therefore this is no different than any other
> insipidity that we come across. Read this and
> weep.
> 12.95Jane
> Russo
> Chastising Daisy095736485284Not the usual driving
> miss daisy that we are used to. This will set the mood for
> demeanor.
> 8.95Jack Palance
>
> Basically I would like to omit everything between the <description>
> and </description> tags. As far as the "description" tags themselves,
> it really doesn't matter whether those go or not so long as everything
> in between them does.

You need to be clearer about what can appear between the tags. If this
is a fragment of real XML, then the right answer is to temporarily add a
fake outermost element and parse it with a real XML parser, then
extract the bits you want (using SAX or DOM/XPath as takes your fancy)
and remove the outer tags again. If it isn't, you will need to answer
questions like 'can a description ever contain a '<'?' and 'how is it
encoded if it does?'.
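
A rough sketch of the wrap-and-parse approach, assuming the CPAN module
XML::LibXML is installed and that the data really is well-formed XML apart
from the missing root. The element names other than description, and the
wrapper name itself, are made up for the example, since the posted sample
does not show them.

```perl
use strict;
use warnings;
use XML::LibXML;   # CPAN module, not part of core Perl

# Hypothetical fragment: two sibling records and no single root element.
my $fragment = '<book><title>Book of Love</title>'
             . '<description>Read this and weep.</description>'
             . '<price>12.95</price></book>'
             . '<book><title>Chastising Daisy</title>'
             . '<description>Not the usual driving miss daisy.</description>'
             . '<price>8.95</price></book>';

# Wrap the fragment in a temporary root so that it parses as one document.
my $doc = XML::LibXML->load_xml(string => "<wrapper>$fragment</wrapper>");

# Detach every description element, at whatever depth it occurs.
$_->unbindNode for $doc->findnodes('//description');

# Serialize the children of the temporary root, which drops the wrapper again.
my $cleaned = join '', map { $_->toString } $doc->documentElement->childNodes;

print "$cleaned\n";
```

Because the parser tracks element boundaries itself, this version does not
depend on what characters or nested elements a description contains, as
long as the input is well-formed.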

Ben

Re: Omitting data from a single line..

on 27.11.2007 16:05:34 by it_says_BALLS_on_your forehead

On Nov 27, 12:49 am, xhos...@gmail.com wrote:
> Cranky wrote:
> > I have files that are 1 liners that have data that I would like to
> > sed, grep, or awk out. Here is an example:
>
> Are you aware that this is a Perl newsgroup, not an sed, awk, or grep one?
>
> ...
>
> > Basically I would like to omit everything between the <description>
> > and </description> tags. As far as the "description" tags themselves,
> > it really doesn't matter whether those go or not so long as everything
> > in between them does.
>
> In Perl:
>
> $line =~ s/<description>.*?<\/description>//g;
>
> There are various gotchas that can occur with this. In general it is a good
> idea to use an XML processor to process XML, but I do do things like this
> occasionally and generally get away with it.


This is probably the simplest method of removing the description,
although, as you hint at, there are issues--nested descriptions for
instance. However, based on the OP's example, that doesn't appear to
be a concern. I'm not absolutely certain about that, but am reasonably
confident.
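
To make the nesting issue concrete, here is a small sketch with a made-up
input in which one description sits inside another. Nothing in the OP's
sample suggests this actually happens; it only shows what the non-greedy
substitution would do if it did.

```perl
use strict;
use warnings;

# Hypothetical input with one description nested inside another.
my $nested = '<description>outer <description>inner</description>'
           . ' tail</description>12.95';

(my $result = $nested) =~ s/<description>.*?<\/description>//gs;

# The match ends at the first </description>, so the outer element's tail
# and its closing tag are left behind.
print "$result\n";   # prints " tail</description>12.95"
```

The leftover text and stray closing tag are the kind of damage a real XML
parser would avoid.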