Sed: removing XML headers

on 28.03.2007 16:25:02 by bruce_phipps

I am trying to concatenate several XML files (test01.xml, test02.xml,
test03.xml) into a single XML file.

cat test*.xml > out.xml

concatenates the files into one big file.

But the resulting XML file is invalid due to having several XML header
and DOCTYPE tags within the document:

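<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://
www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">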

How can I use sed to remove the XML headers within the output file?

Thanks
Bruce

Re: Sed: removing XML headers

on 28.03.2007 17:20:13 by Stephan Grein

Does the job, but this is maybe complete bullsh**:

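# Delete every header line, then put a single header back on top.
cat test*.xml |
sed -e '/ encoding="UTF-8"?>/d' \
    -e '/ Frameset\/\/EN" "http:\/\//d' \
    -e '/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-frameset.dtd">/d' |
sed '1i\
<?xml version="1.0" encoding="UTF-8"?>\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"\
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">' > out.xml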

Re: Sed: removing XML headers

on 28.03.2007 18:53:26 by Michael Tosch

bruce_phipps@my-deja.com wrote:
> I am trying to concatenate several XML files (test01.xml, test02.xml,
> test03.xml) into a single XML file.
>
> cat test*.xml > out.xml
>
> concatenates the files into one big file.
>
> But the resulting XML file is invalid due to having several XML header
> and DOCTYPE tags within the document:
>
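> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://
> www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">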
>
> How can I use sed to remove the XML headers within the output file?
>

cat test*.xml |
sed '1n;2n;/^<?xml/d;/^<!DOCTYPE/d' > out.xml


--
Michael Tosch @ hp : com

Re: Sed: removing XML headers

on 28.03.2007 20:08:26 by Janis Papanagnou

bruce_phipps@my-deja.com wrote:
> I am trying to concatenate several XML files (test01.xml, test02.xml,
> test03.xml) into a single XML file.
>
> cat test*.xml > out.xml
>
> concatenates the files into one big file.
>
> But the resulting XML file is invalid due to having several XML header
> and DOCTYPE tags within the document:
>
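> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://
> www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">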

If it is always the first two lines that you have to skip you may use

awk 'FNR==NR||FNR>2' test*.xml

If you want to match against the specific patterns (assuming that
the XML header patterns don't span multiple lines)

awk 'FNR==NR||!(/^<\?xml/||/^<!DOCTYPE/)' test*.xml

The FNR==NR part ensures that one (the first) header remains included.
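
To make the effect of FNR==NR concrete, here is a minimal sketch built on the first command above. The function name merge_skip_two is hypothetical, and it assumes that the header occupies exactly the first two lines of every file and that the first file is not empty:

```shell
# Hypothetical helper (not from the thread): merge files, keeping the
# header of the first file only.
# Assumption: the header is exactly the first two lines of each file.
merge_skip_two() {
    # FNR==NR is true only while awk reads the first file, so that file
    # is printed whole. For every later file only lines 3 and up print.
    awk 'FNR==NR || FNR>2' "$@"
}
```

It would be called as: merge_skip_two test*.xml > out.xml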

Janis

>
> How can I use sed to remove the XML headers within the output file?

Why sed?

>
> Thanks
> Bruce
>

Re: Sed: removing XML headers

on 29.03.2007 11:31:39 by bruce_phipps

On 28 Mar, 19:08, Janis Papanagnou wrote:
> bruce_phi...@my-deja.com wrote:
> > I am trying to concatenate several XML files (test01.xml, test02.xml,
> > test03.xml) into a single XML file.
>
> > cat test*.xml > out.xml
>
> > concatenates the files into one big file.
>
> > But the resulting XML file is invalid due to having several XML header
> > and DOCTYPE tags within the document:
> >
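> > <?xml version="1.0" encoding="UTF-8"?>
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://
> > www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">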
>
> If it is always the first two lines that you have to skip you may use
>
> awk 'FNR==NR||FNR>2' test*.xml
>
> If you want to match against the specific patterns (assuming that
> the XML header patterns don't span multiple lines)
>
> awk 'FNR==NR||!(/^<\?xml/||/^<!DOCTYPE/)' test*.xml
>
> The FNR==NR part ensures that one (the first) header remains included.
>
> Janis
>
>
>
> > How can I use sed to remove the XML headers within the output file?
>
> Why sed?
>
>

Thanks for all the replies.
The problem seems to be that the XML is not line-based. It all wraps
into one continuous stream.
I think sed is line-based.
So maybe I should consider other alternatives...
Bruce

Re: Sed: removing XML headers

on 29.03.2007 14:04:24 by Janis Papanagnou

On 29 Mar, 11:31, bruce_phi...@my-deja.com wrote:
> On 28 Mar, 19:08, Janis Papanagnou wrote:
>
>
>
> > bruce_phi...@my-deja.com wrote:
> > > I am trying to concatenate several XML files (test01.xml, test02.xml,
> > > test03.xml) into a single XML file.
>
> > > cat test*.xml > out.xml
>
> > > concatenates the files into one big file.
>
> > > But the resulting XML file is invalid due to having several XML header
> > > and DOCTYPE tags within the document:
> > >
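> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://
> > > www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">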
>
> > If it is always the first two lines that you have to skip you may use
>
> > awk 'FNR==NR||FNR>2' test*.xml
>
> > If you want to match against the specific patterns (assuming that
> > the XML header patterns don't span multiple lines)
>
> > awk 'FNR==NR||!(/^<\?xml/||/^<!DOCTYPE/)' test*.xml
>
> > The FNR==NR part ensures that one (the first) header remains included.
>
> > Janis
>
> > > How can I use sed to remove the XML headers within the output file?
>
> > Why sed?
>
> Thanks for all the replies.
> The problem seems to be that the XML is not line-based. It all wraps
> into one continuous stream.

You need an XML parser in the general case.

> I think sed is line-based.

Basically yes, and similar with awk.

> So maybe I should consider other alternatives...

Have a look at xgawk; it makes the XML parsing task really easy.

Janis

> Bruce
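
For input that is one continuous stream rather than one header per line, deleting whole lines would throw away the document body as well. A minimal sketch of an alternative is to cut the declarations out of the line by substitution. The function name merge_strip_inline is hypothetical, and the sketch assumes that each declaration sits entirely within one line and that the DOCTYPE contains no ">" before its closing one (so no internal subset). It only removes the repeated declarations and is not a substitute for a real XML parser:

```shell
# Hypothetical helper (not from the thread): print the first file
# unchanged, then every other file with its XML declaration and DOCTYPE
# cut out of the line instead of deleting the whole line.
# Assumption: each declaration is complete within a single line.
merge_strip_inline() {
    cat "$1"
    shift
    for f in "$@"; do
        # In a basic regular expression "?" is a literal character, so
        # <?xml ... ?> can be matched without escaping it.
        sed -e 's/<?xml[^>]*?>//' -e 's/<!DOCTYPE[^>]*>//' "$f"
    done
}
```

It would be called as: merge_strip_inline test*.xml > out.xml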