Search/Replace text in XML file

on 09.01.2008 20:21:41 by Lax

Hello all,
I'm trying to search and replace the value of a tag in an xml file.
I'm not in a position to use the usual XML parsers, as the version of
Perl I'm required to use doesn't contain any of the XML libraries. I can
use Text::Balanced, but I want to deal with the xml file on a line-by-line
basis, as the value of my tag could stretch over multiple lines.

Perl Version:
This is perl, v5.8.7 built for sun4-solaris

Sample xml file:
-------------------------



1.0.0


invalid version






invalid version




stand-alone, but not valid either



-------------------------

I only want the version tag when it's not enclosed in any other
tags.
I want to replace the 1.0.0 (an example value) with 2.0.0 in the first
occurrence of a stand-alone "version".
I came up with the following:

--------------------

#!/usr/local/bin/perl

use strict ;
use File::Copy ;

die "Usage: replace.pl !\n" unless ( $#ARGV == 0 ) ;
my $file = shift ;

open(IN,"$file") or die "Cant open file: $!\n" ;
chomp(my @arr = <IN>) ;
close(IN) ;

open(OUT,"> bak") or die "Cant open file: $!\n" ;

# Two flags,
# $tag_flag -- to check if we're inside a tag
# $version_flag -- to check if we've replaced version tag already.

my $tag_flag = "off" ;
my $version_flag = "off" ;

foreach my $line ( @arr )
{
    # Dont consider the open and close of top-level tag.
    if ( $line =~ /^\s*\<(\/)?project/ )
    {
        print OUT "$line\n" ;
        next ;
    }

    # Found <version>, replace version string if tag_flag is on and version_flag is off.
    elsif ( ($line =~ /^\s*\<version\>/) && ( $tag_flag eq "off" ) &&
            ( $version_flag eq "off" ) )
    {
        # print "Flag: $flag\n" ;
        print OUT "<version>2.0.0</version>\n" ;
        $tag_flag = "on" ;
        $version_flag = "on" ;
    }

    # Inside an open tag "<", tag_flag on.
    elsif ( ( $line =~ /^\s*\<.*\>/ ) && ( $line !~ /^\s*\<\/.*\>/ ) )
    {
        print OUT "$line\n" ;
        $tag_flag = "on" ;
    }

    # Inside a close tag "</"
    elsif ( $line =~ /^\s*\<\/.*\>/ )
    {
        print OUT "$line\n" ;
        $tag_flag = "off" ;
    } else {
        print OUT "$line\n" ;
    }
}
close(OUT) ;

# Move bak file to original

------------------------------------------

The above script works, and a "diff bak " gives me the
expected result when the stand-alone <version> is all on one line, but I
can't get this working when it's extended over multiple lines.

Could anyone give me some pointers, please?

Thanks,
Lax

Re: Search/Replace text in XML file

on 09.01.2008 20:27:37 by Lax

On Jan 9, 2:21 pm, Lax wrote:
>         # Found <version>, replace version string if tag_flag is on and
> version_flag is off.
>         # Inside an open tag "<", tag_flag on.
>         # Inside a close tag "</"

Please ignore the inaccurate values for off/on in the comments, the
code has proper values for the flags, sorry.

Thanks,
Lax

Re: Search/Replace text in XML file

on 09.01.2008 22:20:11 by Jim Gibson

In article
<5d60d81b-4be0-48b6-8e57-f8c840597e78@p69g2000hsa.googlegroups.com>,
Lax wrote:

> Hello all,
> I'm trying to search and replace the value of a tag in an xml file.
> I'm not in a position to use the usual XML parsers, as the version of
> Perl I'm required to use doesn't contain any of the XML libraries. I can
> use Text::Balanced, but I want to deal with the xml file on a line-by-line
> basis, as the value of my tag could stretch over multiple lines.

[data, program snipped]

> ------------------------------------------
>
> The above script works, and a "diff bak " gives me the
> expected result when the stand-alone <version> is all on one line, but I
> can't get this working when it's extended over multiple lines.
>
> Could anyone give me some pointers, please?

Read the entire file into a single scalar:

my $contents = do { local $/; <IN> };

Then add the /s modifier to your regular expression so that the '.'
special character will match the newlines embedded in your string.

See 'perldoc 'q entire' and 'perldoc perlre'.

Also read the documentation on the File::Slurp module, although that is
not a core module and may not be loaded with your Perl.
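
A minimal sketch of the slurp-plus-/s approach described above, for illustration only. The sample document, the variable names, and the in-memory filehandle standing in for the real file are assumptions. Note that this replaces the first <version> element anywhere in the input; it does not check what that element is nested in.

```perl
use strict;
use warnings;

# Hypothetical sample input; the element layout is an assumption.
my $xml = "<project>\n  <version>\n    1.0.0\n  </version>\n</project>\n";

# Slurp: with $/ undefined, a single read returns the whole file.
# An in-memory filehandle stands in for the real file here.
open(my $in, '<', \$xml) or die "Can't open in-memory file: $!\n";
my $contents = do { local $/; <$in> };
close($in);

# /s lets '.' match newlines; '.*?' stops at the first closing tag.
# Without /g, only the first match is replaced.
(my $updated = $contents) =~ s{(<version>).*?(</version>)}{${1}2.0.0$2}s;

print $updated;
```

The ${1} form keeps the capture variable from running into the digits of the new version number.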

--
Jim Gibson


Re: Search/Replace text in XML file

on 09.01.2008 22:23:58 by someone

Jim Gibson wrote:
> In article
> <5d60d81b-4be0-48b6-8e57-f8c840597e78@p69g2000hsa.googlegroups.com>,
> Lax wrote:
>
>> Hello all,
>> I'm trying to search and replace the value of a tag in an xml file.
>> I'm not in a position to use the usual XML parsers, as the version of
>> Perl I'm required to use doesn't contain any of the XML libraries. I can
>> use Text::Balanced, but I want to deal with the xml file on a line-by-line
>> basis, as the value of my tag could stretch over multiple lines.
>
> [data, program snipped]
>
>> ------------------------------------------
>>
>> The above script works, and a "diff bak " gives me the
>> expected result when the stand-alone <version> is all on one line, but I
>> can't get this working when it's extended over multiple lines.
>>
>> Could anyone give me some pointers, please?
>
> Read the entire file into a single scalar:
>
> my $contents = do { local $/; <IN> };
>
> Then add the /s modifier to your regular expression so that the '.'
> special character will match the newlines embedded in your string.
>
> See 'perldoc 'q entire' and 'perldoc perlre'.
ITYM: perldoc -q entire



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: Search/Replace text in XML file

on 10.01.2008 03:35:34 by Tad J McClellan

Lax wrote:

> I'm trying to search and replace the value of a tag in an xml file.


No you're not.

You are trying to search and replace the value of an element in an xml file.

See the XML FAQ:

http://xml.silmaril.ie/authors/makeup/


> Sample xml file:
> -------------------------
>
>
>
> 1.0.0
>
>
> invalid version
>

>
>
>
>
>
> invalid version
>

>

>

>
> stand-alone, but not valid either
>
>

>
> -------------------------
>
> I only want the version tag when it's not enclosed in any other
> tags.


It is not legal in XML for a tag to enclose any other tag.

(tags start with a '<' and end with a '>')


You must have meant "element" where you said "tag".

In that case, there ARE NO version elements that are not enclosed
in any other elements!


> I want to replace the 1.0.0 (an example value) with 2.0.0


That element is enclosed in the project element.


> in the first occurrence of a stand-alone "version".


You want to replace the 1.0.0 with 2.0.0 on the first version element
that is a child of the document element (the project element in this case).

(in which case you have a poor example input, as a solution that
operates on the first <version> anywhere in the file will work
for that input...)

> The above script works, and a "diff bak " gives me the
> expected result when the stand-alone <version> is all on one line, but I
> can't get this working when it's extended over multiple lines.


Extended over multiple lines in what manner? Like this:

>1.0.0


or like


1.0.0


or like


1.0.0



??

Those are all legal XML, but no two of them are equivalent: they each
have different content.


> Could anyone give me some pointers, please?


If I could unambiguously figure out what you really want I probably could...


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Search/Replace text in XML file

on 10.01.2008 03:47:54 by Tad J McClellan

Lax wrote:

> #!/usr/local/bin/perl
>
> use strict ;


You should always enable warnings when developing Perl code:

use warnings;


> die "Usage: replace.pl !\n" unless ( $#ARGV == 0 ) ;


That is more clearly written as:

die "Usage: replace.pl !\n" unless @ARGV == 1;


> my $file = shift ;
>
> open(IN,"$file") or die "Cant open file: $!\n" ;


perldoc -q vars

What's wrong with always quoting "$vars"?

open(IN, $file) or die "Cant open file: $!\n" ;

(and nowadays you should use the 3-argument form of open() instead.)


> chomp(my @arr = <IN>) ;


Here you remove the newline from every line, and below you add a
newline to every line.

Why remove them only to put them back?


> foreach my $line ( @arr )


If you are going to process the file line-by-line anyway, then why
bother reading the entire file into memory when one line at a time
in memory will work?

while ( my $line = <IN> )
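
A minimal sketch of that loop, combined with the three-argument open() mentioned above and lexical filehandles instead of the bareword IN/OUT. The sub name copy_lines and the file names are placeholders for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Copy $src to $dst one line at a time. Only one line is held in
# memory, and the newline is never removed, so none has to be re-added.
sub copy_lines {
    my ($src, $dst) = @_;

    # Three-argument open: the mode is separate from the file name.
    open(my $in,  '<', $src) or die "Can't open '$src' for reading: $!\n";
    open(my $out, '>', $dst) or die "Can't open '$dst' for writing: $!\n";

    while ( my $line = <$in> ) {
        print {$out} $line;    # per-line processing would go here
    }

    close($in);
    close($out) or die "Can't close '$dst': $!\n";
    return;
}
```

It could be called as copy_lines($file, 'bak') in place of the read-everything-then-loop structure.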


> if ( $line =~ /^\s*\<(\/)?project/ )


The parentheses in that pattern serve no purpose, so why include them?

Angle brackets are not special in regular expressions, so they
do not need backslashing.

If you choose some other delimiter for your match operator, then
the slash will not need backslashing either:

if ( $line =~ m#^\s*</?project# )

> I
> can't get this working when it's extended over multiple lines.


Then don't process the file line-by-line.


> Could anyone give me some pointers, please?


perldoc -q match

I'm having trouble matching over more than one line. What's wrong?


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: Search/Replace text in XML file

on 26.01.2008 04:31:12 by sln

On Wed, 9 Jan 2008 11:21:41 -0800 (PST), Lax wrote:

[quoted original post snipped]

I'm working on a modification of a module I wrote to do this type of thing (I started last week).
But what you stated at the top is that you first want to "find" a version element that is "not"
inside another element. That is hard to do.

You don't want to set up a search of tags with conditionals (in the general sense). It's not like
regular expressions for XML. In the limited sense, as a basis, a search is set up for a single "tag".
If it encounters another identical "tag", does the search start over, even if it has already found "sub-tags"
in the tree, thereby invalidating and resetting the search? Should it be anchored or unanchored (outer or inner, if you will)?

In parsing XML it's easy to push/pop tags to determine validity. So it's easy to say: find tag1->tag2->...
Inner/outer may be selectable. What about attribute names as conditionals? tag1->tag2->attr.
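
The push/pop idea can be sketched roughly as follows. This is only an illustration, not the module mentioned above: the sub name replace_top_level_version, the <dep> element in the usage example, and the naive tokenizer (no comments, CDATA, or '>' inside attribute values) are all assumptions.

```perl
use strict;
use warnings;

# Replace the text of the first <version> element that is a direct
# child of the document element, using a stack of open element names.
sub replace_top_level_version {
    my ($xml, $new) = @_;
    my @stack;        # names of the currently open elements
    my $done = 0;     # set once the replacement has been made
    my $out  = '';

    # The capturing group makes split() return the tags as tokens too.
    for my $tok ( split /(<[^>]*>)/, $xml ) {
        if ( $tok =~ m{^</} ) {                           # end tag
            pop @stack;
        }
        elsif ( $tok =~ m{^<[!?]} or $tok =~ m{^<.*/>$}s ) {
            # declaration, processing instruction, or empty element:
            # no effect on nesting
        }
        elsif ( $tok =~ m{^<([^\s>]+)} ) {                # start tag
            push @stack, $1;
        }
        elsif ( !$done && @stack == 2 && $stack[-1] eq 'version' ) {
            # text directly inside document-element -> version
            $tok  = $new;
            $done = 1;
        }
        $out .= $tok;
    }
    return $out;
}

# Usage: the nested <version> inside the hypothetical <dep> is skipped.
my $sample = "<project>\n<dep>\n<version>9.9</version>\n</dep>\n"
           . "<version>\n1.0.0\n</version>\n<version>3.0</version>\n</project>\n";
print replace_top_level_version($sample, '2.0.0');
```

Because the whole text token between the start and end tag is replaced, it makes no difference whether the old value sat on one line or was spread over several.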

What about other items? What about content? Should content be a condition?
What should happen when all the conditions are met? What should be replaced: the tag, the attribute name/value,
or the content? What if the content is spread over other items before closure?

Integrating a search-and-replace engine into a stream parser is a difficult task indeed. I am going to start slow,
with tags and attributes first, and move up from there.