Grep RE for extracting XML attribute value

on 17.01.2008 14:19:49 by hemant.gaur

If I have an XML file like, say,



......


What grep regular expression should I use to get the value of the
attribute test of tagA?
Note that there could be whitespace variations in the XML formatting.



tab or spaces}"hello"{one or more tab or spaces} test1="world">

Thanks,
Hemant

Re: Grep RE for extracting XML attribute value

on 17.01.2008 14:34:34 by hemant.gaur

egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml does the test attribute
extraction. I think there is no more robust way, as the tag itself
might be on a separate line.
Can we achieve the above with plain grep (not the egrep form)?

Re: Grep RE for extracting XML attribute value

on 17.01.2008 15:18:33 by Bill Marcum

On 2008-01-17, hemant.gaur@gmail.com wrote:
>
>
> If I have XML file like say
>
>
>
> .....
>

>
> What grep regular expression should I use to get the attribute test of
> the tagA.
> Note that their could be white space variations in the XML formatting.
>
>
>
> > tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>
awk -F'>' '/

Re: Grep RE for extracting XML attribute value

on 17.01.2008 15:23:06 by PK

hemant.gaur@gmail.com wrote:

> egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml does the test attribute
> extraction.

This extracts *the whole line* where the test attribute is present, which is
not exactly what you asked in the first place (at least as I understood
it).
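
A minimal sketch of printing only the value rather than the whole line, assuming a grep that supports -o (a GNU/BSD extension, not POSIX) and assuming the tag name and the attribute sit on the same line. The function name and the sample file are hypothetical, made up for illustration:

```shell
# Hypothetical helper: print the value of the test attribute of tagA.
# Assumes <tagA and test="..." are on one line and values are double-quoted.
extract_test_value() {
    # 1st grep: keep only lines that open a tagA element.
    # 2nd grep: -o prints just the matched 'test = "..."' part; the leading
    #           whitespace class keeps names such as latest= from matching.
    # cut: keep the text between the first pair of double quotes.
    grep '<tagA[[:space:]]' "$1" |
        grep -o '[[:space:]]test[[:space:]]*=[[:space:]]*"[^"]*"' |
        cut -d'"' -f2
}

# Illustrative input (not the original poster's file).
sample="${TMPDIR:-/tmp}/sample_tagA.xml"
printf '<root>\n  <tagA test =\t"hello"   test1="world">\n  </tagA>\n</root>\n' > "$sample"
extract_test_value "$sample"
```

Because grep works line by line, this does not cope with a tag whose attributes are spread over several lines.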

Re: Grep RE for extracting XML attribute value

on 17.01.2008 18:02:22 by wayne

hemant.gaur@gmail.com wrote:
> If I have XML file like say
>
>
>
> .....
>

>
> What grep regular expression should I use to get the attribute test of
> the tagA.
> Note that their could be white space variations in the XML formatting.
>
>
>
> > tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>

If you have a recent version of Perl installed, you may also
have "xml_grep" (a Perl script) which "greps" for XPath
expressions in XML files. Maybe that will do what you want.

-Wayne

Re: Grep RE for extracting XML attribute value

on 17.01.2008 22:09:45 by Ed Morton

On 1/17/2008 7:19 AM, hemant.gaur@gmail.com wrote:
> If I have XML file like say
>
>
>
> .....
>

>
> What grep regular expression should I use to get the attribute test of
> the tagA.
> Note that their could be white space variations in the XML formatting.
>
>
>
> > tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>

You might want to take a look at XMLgawk
(http://home.vrweb.de/~juergen.kahrs/gawk/XML/), especially if you foresee
yourself doing other XML processing in the future.

Ed.

Re: Grep RE for extracting XML attribute value

on 18.01.2008 08:10:06 by hemant.gaur

Actually, we don't require a lot of XML parsing; this is the only
parse required.
I think we need an awk/grep combination to get "hello" extracted from
the "test" attribute of "tagA".


......


Any pointers on this?
--hemant

Re: Grep RE for extracting XML attribute value

on 18.01.2008 15:40:18 by Ed Morton

On 1/18/2008 1:10 AM, hemant.gaur@gmail.com wrote:
> Actually, we don't require lot of XML parsing this is only the one
> parse required.
> I think we need awk, grep combination to get "hello" extracted from
> the "test" attribute of the "tagA"
>
>
> .....
>

>
> Any pointers on this ?
> --hemant

You rarely need grep if you're using awk, since awk can do anything grep can do
(albeit more slowly on very large files). If your awk supports REs as RSs (e.g.
GNU awk), neither "<" nor ">" appears within tagged areas, and you don't have
spaces in the values inside double quotes (e.g. test="hello world" would fail),
then you could try this:

$ cat file






=

hello test1="world">

$ awk -v RS='[<>]' '/^tagA/{ gsub(/[[:space:]]*=[[:space:]]*/,"=");
    for (i=1;i<=NF;i++) if (sub(/^test=/,"",$i)) print $i }' file
"hello"
"hello"
"hello"
hello
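
If the quoted values may contain spaces (the case the command above does not handle), a variant along these lines might cope. It is only a sketch under stated assumptions: values are double-quoted, no ">" appears inside a value, and the function name and sample input are hypothetical. It uses a single-character RS, so it should not need GNU awk, and it spells out the whitespace class by hand in case the awk lacks [[:space:]]:

```shell
# Hypothetical helper: print the value of the test attribute of every tagA,
# keeping spaces inside the quotes. Assumes double-quoted values and no ">"
# inside a value.
extract_test_value() {
    awk -v RS='>' '
    {
        # Each record is the text up to the next ">", so an opening tag and
        # its attributes land in one record even if they span several lines.
        if (!match($0, /<tagA[ \t\r\n]/)) next
        rec = substr($0, RSTART + 5)          # text after "<tagA"
        # Look for the test attribute itself; the leading whitespace keeps
        # names such as latest= from matching, and test1= has no "=" here.
        if (match(rec, /[ \t\r\n]test[ \t\r\n]*=[ \t\r\n]*"[^"]*"/)) {
            val = substr(rec, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", val)           # drop everything up to the opening quote
            sub(/"$/, "", val)                # drop the closing quote
            print val
        }
    }' "$1"
}

# Illustrative input (not the file shown above).
sample="${TMPDIR:-/tmp}/sample_tagA_awk.xml"
printf '<root>\n<tagA test = "hello world"\n  test1="world">\n</tagA>\n<tagA\n\ttest="hello">\n</root>\n' > "$sample"
extract_test_value "$sample"
```

Unlike the command above, this prints the value without the surrounding double quotes, and it ignores unquoted values.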

Regards,

Ed.