Grep RE for extracting XML attribute value
on 17.01.2008 14:19:49 by hemant.gaur
If I have an XML file like, say,
......
what grep regular expression should I use to get the test attribute of
tagA?
Note that there could be whitespace variations in the XML formatting.
tab or spaces}"hello"{one or more tab or spaces} test1="world">
Thanks,
Hemant
Re: Grep RE for extracting XML attribute value
on 17.01.2008 14:34:34 by hemant.gaur
egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml does the test attribute
extraction. I don't think there is a more robust way, as the tag itself
might be on a separate line.
Can we achieve the above with plain grep (not the egrep form)?
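One possible plain-grep form, as a sketch: basic regular expressions have no `+` or `|`, but `[[:space:]]` together with `*` covers the same whitespace variations. The sample my.xml below is made up for illustration, since the original file is not shown. Like the egrep version, this still prints the whole matching line and assumes the attribute sits on one line.

```shell
# Hypothetical sample input (not from the original post).
printf '%s\n' '<root>' '<tagA   test = "hello"   test1="world">' '</root>' > my.xml

# Plain grep (basic RE): one whitespace character, "test", optional
# whitespace, then "=". The "1" in test1 prevents a false match there.
matched=$(grep '[[:space:]]test[[:space:]]*=' my.xml)
printf '%s\n' "$matched"
```

Only the tagA line should be printed; the root lines contain no test attribute.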
Re: Grep RE for extracting XML attribute value
on 17.01.2008 15:18:33 by Bill Marcum
awk -F'>' '/
Re: Grep RE for extracting XML attribute value
on 17.01.2008 15:23:06 by PK
hemant.gaur@gmail.com wrote:
> egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml does the test attribute
> extraction.
This extracts *the whole line* where the test attribute is present, which is
not exactly what you asked in the first place (at least as I understood
it).
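If only the value is wanted rather than the whole line, sed can print just a captured group. A minimal sketch, assuming a made-up my.xml and a double-quoted attribute that sits on the same line as its tag (a tag split across lines would not match):

```shell
# Hypothetical sample input (not from the original post).
printf '%s\n' '<root>' '<tagA   test = "hello"   test1="world">' '</root>' > my.xml

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. \( \) captures the text between the double quotes.
value=$(sed -n 's/.*<tagA[^>]*[[:space:]]test[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' my.xml)
printf '%s\n' "$value"
```

The `[^>]*` lets other attributes come before test, and the required whitespace before `test` keeps test1 from matching.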
Re: Grep RE for extracting XML attribute value
on 17.01.2008 18:02:22 by wayne
If you have a recent version of Perl installed, you may also
have "xml_grep" (a Perl script) which "greps" for XPath
expressions in XML files. Maybe that will do what you want.
-Wayne
Re: Grep RE for extracting XML attribute value
on 17.01.2008 22:09:45 by Ed Morton
You might want to take a look at XMLgawk
(http://home.vrweb.de/~juergen.kahrs/gawk/XML/), especially if you foresee
yourself doing other XML processing in the future.
Ed.
Re: Grep RE for extracting XML attribute value
on 18.01.2008 08:10:06 by hemant.gaur
Actually, we don't require a lot of XML parsing; this is the only
parsing required.
I think we need an awk and grep combination to get "hello" extracted from
the "test" attribute of "tagA".
......
Any pointers on this?
--hemant
Re: Grep RE for extracting XML attribute value
on 18.01.2008 15:40:18 by Ed Morton
You rarely need grep if you're using awk since awk can do anything grep can do
(albeit slower on very large files). If your awk supports REs as RSs (e.g. GNU
awk) and neither "<" nor ">" appears within tagged areas and you don't have spaces
in the values inside double quotes (e.g. test="hello world" would fail), then
you could try this:
$ cat file
=
hello test1="world">
$ awk -v RS='[<>]' '/^tagA/{ gsub(/[[:space:]]*=[[:space:]]*/,"="); for (i=1;i<=NF;i++) if (sub(/^test=/,"",$i)) print $i }' file
"hello"
"hello"
"hello"
hello
Regards,
Ed.
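For values that do contain spaces (the test="hello world" case noted above), one possible variation is to locate the attribute with match() on the whole record instead of relying on field splitting. This is only a sketch under assumptions: the sample file is invented, the value must be double-quoted, and ">" must not occur inside attribute values. A single-character RS is used so a regex RS is not required.

```shell
# Hypothetical sample input: tag split over lines, value contains a space.
printf '%s\n' '<root>' '<tagA' '   test = "hello world"' '   test1="world">' '</root>' > file

value=$(awk -v RS='>' '
/<tagA[ \t\n]/ {
    # Locate the whole test="..." attribute inside the record.
    if (match($0, /[ \t\n]test[ \t\n]*=[ \t\n]*"[^"]*"/)) {
        attr = substr($0, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", attr)   # drop everything up to the opening quote
        sub(/"$/, "", attr)        # drop the closing quote
        print attr
    }
}' file)
printf '%s\n' "$value"
```

Because each record runs up to the next ">", the tag may span several lines, and the whitespace required before "test" keeps test1 from matching.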