Learning Reg expressions.

on 06.01.2005 22:13:48 by 127.0.0.1

I'm just learning regular expressions. Can anyone tell me how to do
something like this:

Search for subjectpage1.htm

&

change it to titlepageX.htm

Where X is an integer 1-100

It's the changing part I don't know how to do - is it possible to specify it
as a regular expression?

TIA

Re: Learning Reg expressions.

on 06.01.2005 23:39:48 by James Keasley

On 2005-01-06, <127.0.0.1@127.0.0.1> wrote:
> I'm just learning regular expressions. Can anyone tell me how to do
> something like this:
>
> Search for subjectpage1.htm
>
> &
>
> change it to titlepageX.htm
>
> Where X is an integer 1-100
>
> It's the changing part I don't know how to do - is it possible to specify it
> as a regular expression?

Easy, as long as the subjectpage and .htm parts don't change.

$var = "this has subjectpage1.htm in it, and subjectpage99.htm";
$var =~ s/subjectpage(\d+).htm/titlepage$1.htm/g;
print $var . "\n";

The brackets capture that part of the match into a special variable; the
first bracketed section of the regex goes into $1, the second into $2, and
so on.

The g modifier applies the substitution globally throughout the input, rather
than only to the first match.
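
A minimal sketch of what the capture and the g modifier do, using the same sample string. The names $input, $first_only and $all are illustrative only, and the period is written as \. so that it matches a literal dot.

```perl
use strict;
use warnings;

my $input = "this has subjectpage1.htm in it, and subjectpage99.htm";

# Without /g only the first match is replaced.
(my $first_only = $input) =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/;

# With /g every match is replaced; $1 holds the digits captured by (\d+).
(my $all = $input) =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/g;

print "$first_only\n";   # this has titlepage1.htm in it, and subjectpage99.htm
print "$all\n";          # this has titlepage1.htm in it, and titlepage99.htm
```

Copying the string first with (my $copy = $input) leaves the original untouched, which makes the two results easy to compare.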

--
James jamesk[at]homeric[dot]co[dot]uk

I can read your mind, and you should be ashamed of yourself.

Re: Learning Reg expressions.

on 07.01.2005 01:34:48 by Matt Garrish

"James Keasley" wrote in message
news:slrnctrfhk.fjp.james.keasley@athena.homeric.co.uk...
> On 2005-01-06, <127.0.0.1@127.0.0.1> wrote:
>> I'm just learning regular expressions. Can anyone tell me how to do
>> something like this:
>>
>> Search for subjectpage1.htm
>>
>> &
>>
>> change it to titlepageX.htm
>>
>> Where X is an integer 1-100
>>
>> It's the changing part I don't know how to do - is it possible to specify
>> it as a regular expression?
>
> Easy, as long as the subjectpage and .htm parts don't change.
>
> $var = "this has subjectpage1.htm in it, and subjectpage99.htm";
> $var =~ s/subjectpage(\d+).htm/titlepage$1.htm/g;
> print $var . "\n";
>
> The brackets capture that part of the match into a special variable; the
> first bracketed section of the regex goes into $1, the second into $2, and
> so on.
>
> The g modifier applies the substitution globally throughout the input, rather
> than only to the first match.
>

While that will work, it may not be the most effective regular expression.
Assuming that "subjectpage" has no other meaning in the files, there's no
reason to capture the number following it just to reinsert it:

$var =~ s/subjectpage/titlepage/g;
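
A small sketch of what that assumption means in practice. The sample text, including the file name subjectpage_style.css, is hypothetical and only there to show a case where "subjectpage" does have another meaning.

```perl
use strict;
use warnings;

# Hypothetical text in which "subjectpage" also appears in an unrelated name.
my $text = "subjectpage3.htm uses subjectpage_style.css";

# The simple substitution renames every occurrence of the word.
(my $simple = $text) =~ s/subjectpage/titlepage/g;

# The stricter pattern only touches names of the form subjectpage<digits>.htm.
(my $strict = $text) =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/g;

print "$simple\n";   # titlepage3.htm uses titlepage_style.css
print "$strict\n";   # titlepage3.htm uses subjectpage_style.css
```

If the word never appears anywhere else, both give the same result and the simple form is easier to read.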

No need to make your regular expressions more complicated than they need to
be (even in a relatively simple case like this). As a case in point, periods
have a special meaning in regular expressions, so the .htm on the left-hand
side of the substitution may not always do what you expect.

$var =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/g;
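
To illustrate the difference: an unescaped period matches any single character except a newline, while \. matches only a literal period. The input string below is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical input: there is an "x", not a period, between the number and "htm".
my $tricky = "see subjectpage7xhtm for details";

# Unescaped "." accepts the "x", so this text is rewritten by accident.
(my $unescaped = $tricky) =~ s/subjectpage(\d+).htm/titlepage$1.htm/g;

# Escaped "\." requires a real period, so nothing matches here.
(my $escaped = $tricky) =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/g;

print "$unescaped\n";   # see titlepage7.htm for details
print "$escaped\n";     # see subjectpage7xhtm for details
```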

The lack of clarity in the original post makes it hard to know exactly what
is needed in this case, though (does X = 1 or some other number, for
example), so neither may be what the OP is after.

Matt

Re: Learning Reg expressions.

on 07.01.2005 01:55:21 by James Keasley

On 2005-01-07, Matt Garrish wrote:

> While that will work, it may not be the most effective regular expression.
> Assuming that "subjectpage" has no other meaning in the files, there's no
> reason to capture the number following it just to reinsert it:
>
> $var =~ s/subjectpage/titlepage/g;

Good point, always worth keeping it simple: less to go wrong.

> No need to make your regular expressions more complicated than they need to
> be (even in a relatively simple case like this).

Surely you jest, what's the point of using Perl if you don't use
hopelessly opaque regexes whenever possible? It's good as "job-security"
coding. ;)

> As a case in point, periods
> have a special meaning in regular expressions, so the .htm on the left-hand
> side of the substitution may not always do what you expect.
>
> $var =~ s/subjectpage(\d+)\.htm/titlepage$1.htm/g;

Bah, I _knew_ I had forgotten something. That type of brainfart always
seems to be the hardest to nail down: it is doing what is expected, it
is just matching some extra stuff as well. BTDTGTT.

--
James jamesk[at]homeric[dot]co[dot]uk

"Luge strategy? Lie flat and try not to die." -Carmen Boyle
(Olympic Luge Gold Medal winner 1996)

Re: Learning Reg expressions.

on 07.01.2005 02:37:55 by Joe Smith

James Keasley wrote:

> $var = "this has subjectpage1.htm in it, and subjectpage99.htm";
> $var =~ s/subjectpage(\d+).htm/titlepage$1.htm/g;
> print $var . "\n";

That's the way to do it if the number for titlepageX.htm is the
same as the number for subjectpageX.htm. But if the titlepageX
numbers need to be consecutive even when the subjectpageX numbers
are out of order, use ++ and s///e. The /e modifier causes the
replacement part of s/// to be evaluated as a Perl expression.

---------------------------
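use strict;
use warnings;

my @lines = ('First line has subjectpage1.htm',
             'subjectpage9.htm has been replaced the second',
             'Former fourth line move up one subjectpage4.htm');
my $currentpage = 81;    # first number to hand out
foreach (@lines) {
    # /e evaluates the replacement as Perl code; $currentpage++ yields the
    # current value and then increments it for the next match
    s/subjectpage\d+\.htm/'titlepage' . $currentpage++ . '.htm'/eg;
}
print join("\n", @lines), "\n";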
---------------------------
First line has titlepage81.htm
titlepage82.htm has been replaced the second
Former fourth line move up one titlepage83.htm
---------------------------
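
The /e modifier can also compute the new number from the captured one. A minimal sketch, assuming (hypothetically) that each title page number should be the subject page number plus 80; the offset, the sample line and the variable names are illustrative and not from the thread.

```perl
use strict;
use warnings;

my $offset = 80;   # hypothetical offset, chosen only for this example
my $line   = "subjectpage1.htm links to subjectpage4.htm";

# With /e the replacement is a Perl expression, so $1 can be used in arithmetic.
$line =~ s/subjectpage(\d+)\.htm/'titlepage' . ($1 + $offset) . '.htm'/eg;

print "$line\n";   # titlepage81.htm links to titlepage84.htm
```

Unlike the ++ counter, this keeps the relative order and gaps of the original numbers.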

-Joe
