Replace without back reference

on 06.11.2007 12:18:44 by howa

Consider the code below:

############
$str = "ABCD";

$str =~ s/(C)D/$1F/g;

print $str;

############


It replaces `ABCD` with `ABCF`, using a back reference.

I need to do this kind of operation a lot, and using a back
reference seems slow.

Is there a better or faster approach for the above example?

Thanks.

Re: Replace without back reference

on 06.11.2007 14:15:52 by krahnj

howa wrote:
>
> Consider the code below:
>
> ############
> $str = "ABCD";
>
> $str =~ s/(C)D/$1F/g;
>
> print $str;
>
> ############
>
> It replace `ABCD` to `ABCF`, using back reference.
>
> I need to do these kind of operations a lot, seems using back
> reference is slow
>
> Any better or faster approach for the above example?

I don't know if this is better or faster (use Benchmark to verify):

$str =~ s/(?<=C)D/F/g;
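
For what it's worth, here is a minimal sketch showing that the lookbehind form gives the same result as the capture form. The sample string and the variable names are made up for illustration only:

```perl
use strict;
use warnings;

# Sample input, made up for illustration.
my $input = 'ABCD CD XD';

# Capture form: matches "CD" and puts the captured "C" back.
( my $with_capture = $input ) =~ s/(C)D/$1F/g;

# Lookbehind form: matches only a "D" that directly follows a "C".
( my $with_lookbehind = $input ) =~ s/(?<=C)D/F/g;

print "$with_capture\n";
print "$with_lookbehind\n";
print $with_capture eq $with_lookbehind ? "same\n" : "different\n";
```

The lookbehind is zero-width, so the "C" never becomes part of the match and does not need to be put back.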


John
--
use Perl;
program
fulfillment

Re: Replace without back reference

on 06.11.2007 14:45:41 by nobull67

On Nov 6, 11:18 am, howa wrote:

> $str =~ s/(C)D/$1F/g;

> I need to do these kind of operations a lot, seems using back
> reference is slow
>
> Any better or faster approach for the above example?

Not yet, but as of 5.10 [1] you'll be able to use \K.

$str =~ s/C\KD/F/g;

[1] http://www.regex-engineer.org/slides/perl510_regex.html
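
Once 5.10 is available, a minimal sketch would look roughly like this. The `use 5.010;` line is my own addition, so that an older perl stops with a version error instead of just warning about the escape:

```perl
use strict;
use warnings;
use 5.010;    # \K needs perl 5.10 or later

my $str = 'ABCD';

# \K drops everything matched so far from the match,
# so only the "D" is replaced and the "C" stays in place.
$str =~ s/C\KD/F/g;

print "$str\n";    # ABCF
```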

Re: Replace without back reference

on 06.11.2007 19:34:45 by Jim Gibson

In article <1194347924.889686.170950@t8g2000prg.googlegroups.com>, howa
wrote:

> Consider the code below:
>
> ############
> $str = "ABCD";
>
> $str =~ s/(C)D/$1F/g;
>
> print $str;
>
> ############
>
>
> It replace `ABCD` to `ABCF`, using back reference.
>
> I need to do these kind of operations a lot, seems using back
> reference is slow
>
> Any better or faster approach for the above example?

Here is another way:

while( (my $pos = index($str,'CD') ) > -1 ) {
    substr($str,($pos+1),1,'F');
}
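
A possible variant, as a sketch only: index() accepts an optional third argument giving the position to start searching from, so each search can resume after the previous hit instead of rescanning from the start of the string. I have not timed this version, and the sample string is just an illustration:

```perl
use strict;
use warnings;

my $str = 'ABCDEFGHCDIJCXKLCDMNOPC';

my $pos = 0;
# Resume each search where the previous one ended.
while ( ( $pos = index( $str, 'CD', $pos ) ) > -1 ) {
    substr( $str, $pos + 1, 1, 'F' );    # replace the "D" in place
    $pos += 2;                           # skip past the rewritten "CF"
}

print "$str\n";
```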

I don't know if that is faster, so I have to benchmark it:

#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = 'ABCDEFGHCDIJCXKLCDMNOPC';
my $s;

cmpthese( 1_000_000, {

    'Backreference' => sub{ ($s = $str) =~ s/(C)D/$1F/g; },
    'Lookbehind'    => sub{ ($s = $str) =~ s/(?<=C)D/F/g; },
    'Index/Substr'  => sub{
        $s = $str;
        while( (my $pos = index($s,'CD') ) > -1 ) {
            substr($s,($pos+1),1,'F');
        }
    }
});

                  Rate Backreference Lookbehind Index/Substr
Backreference 262467/s            --       -47%         -58%
Lookbehind    495050/s           89%         --         -20%
Index/Substr  621118/s          137%        25%           --

Platform: perl v5.8.8 on Mac OS 10.5.
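
In case the percentages are not obvious: each cell says how much faster (positive) or slower (negative) the row entry is than the column entry. A small sketch that recomputes a few cells from the rates in the table above; the helper name percent_vs is made up for this example:

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the table above.
my %rate = (
    'Backreference' => 262467,
    'Lookbehind'    => 495050,
    'Index/Substr'  => 621118,
);

# Percentage by which the row entry is faster (+) or slower (-)
# than the column entry.
sub percent_vs {
    my ( $row, $col ) = @_;
    return sprintf '%.0f%%', 100 * ( $rate{$row} / $rate{$col} - 1 );
}

print percent_vs( 'Lookbehind',    'Backreference' ), "\n";    # 89%
print percent_vs( 'Index/Substr',  'Backreference' ), "\n";    # 137%
print percent_vs( 'Backreference', 'Index/Substr' ),  "\n";    # -58%
```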

--
Jim Gibson


Re: Replace without back reference

on 07.11.2007 04:50:27 by howa

On Nov 6, 9:45, Brian McCauley wrote:
> On Nov 6, 11:18 am, howa wrote:
>
> > $str =~ s/(C)D/$1F/g;
> > I need to do these kind of operations a lot, seems using back
> > reference is slow
>
> > Any better or faster approach for the above example?
>
> Not yet, but as of 5.10 [1] you'll be able to use \K.
>
> $str =~ s/C\KD/F/g;
>
> [1]http://www.regex-engineer.org/slides/perl510_regex.html


Hello,

I am using Perl 5.8.7 on Windows. When I use \K, it shows:

Unrecognized escape \K passed through at C:\test.pl line 6.

Any idea?

Re: Replace without back reference

on 07.11.2007 12:10:32 by Tzy-Jye Daniel Lin

On Tue, 06 Nov 2007 19:50:27 -0800, howa wrote:

> On Nov 6, 9:45, Brian McCauley wrote:
>>
>> Not yet, but as of 5.10 [1] you'll be able to use \K.
>>
>> $str =~ s/C\KD/F/g;
>>
>> [1]http://www.regex-engineer.org/slides/perl510_regex.html
>
> I am using Perl 5.8.7 on Windows, when using \K, it shows:
>
> Unrecognized escape \K passed through at C:\test.pl line 6.
>
> any idea?

As Brian said,
\K is not in any stable Perl at the moment.
\K will be in Perl 5.10 when it is released.
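
If the same script has to run on both old and new perls, one option is to pick the pattern at run time. This is only a sketch: it assumes the lookbehind form suggested earlier in the thread is an acceptable fallback, and the variable names are made up:

```perl
use strict;
use warnings;

# $] holds the running perl's version as a number, e.g. 5.008007 for 5.8.7.
# The pattern is built from a string so that older perls never see \K.
my $pattern = $] >= 5.010 ? 'C\KD' : '(?<=C)D';
my $re      = qr/$pattern/;

my $str = 'ABCD';
$str =~ s/$re/F/g;

print "$str\n";    # ABCF on either branch
```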

Re: Replace without back reference

on 07.11.2007 18:12:47 by Jim Gibson

In article <1194407427.449770.6890@v29g2000prd.googlegroups.com>, howa
wrote:

> On Nov 6, 9:45, Brian McCauley wrote:
> > On Nov 6, 11:18 am, howa wrote:
> >
> > > $str =~ s/(C)D/$1F/g;
> > > I need to do these kind of operations a lot, seems using back
> > > reference is slow
> >
> > > Any better or faster approach for the above example?

I thought of another way:

$str =~ s/CD/CF/g;

It's probably about the same as look-behind, but let's throw it into
the benchmark program and see:

#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = 'ABCDEFGHCDIJCXKLCDMNOPC';
my $s;

cmpthese( 1_000_000, {

    'BckRef' => sub{ ($s = $str) =~ s/(C)D/$1F/g; },
    'Subst'  => sub{ ($s = $str) =~ s/CD/CF/g; },
    'LkBnd'  => sub{ ($s = $str) =~ s/(?<=C)D/F/g; },
    'Index'  => sub{
        $s = $str;
        while( (my $pos = index($s,'CD') ) > -1 ) {
            substr($s,($pos+1),1,'F');
        }
    }
});

           Rate BckRef  LkBnd  Index  Subst
BckRef 263158/s     --   -47%   -61%   -73%
LkBnd  500000/s    90%     --   -25%   -48%
Index  666667/s   153%    33%     --   -31%
Subst  970874/s   269%    94%    46%     --

Surprising, no?

--
Jim Gibson


Re: Replace without back reference

on 08.11.2007 03:31:26 by Ben Morrow

Quoth Jim Gibson :
>
> I thought of another way:
>
> $str =~ s/CD/CF/g;
>
> It's probably about the same as look-behind, but let's throw it into
> the benchmark program and see:
>
> #!/usr/local/bin/perl
> use strict;
> use warnings;
> use Benchmark qw(cmpthese);
>
> my $str = 'ABCDEFGHCDIJCXKLCDMNOPC';
> my $s;
>
> cmpthese( 1_000_000, {
>
>     'BckRef' => sub{ ($s = $str) =~ s/(C)D/$1F/g; },
>     'Subst'  => sub{ ($s = $str) =~ s/CD/CF/g; },
>     'LkBnd'  => sub{ ($s = $str) =~ s/(?<=C)D/F/g; },
>     'Index'  => sub{
>         $s = $str;
>         while( (my $pos = index($s,'CD') ) > -1 ) {
>             substr($s,($pos+1),1,'F');
>         }
>     }
> });
>
>            Rate BckRef  LkBnd  Index  Subst
> BckRef 263158/s     --   -47%   -61%   -73%
> LkBnd  500000/s    90%     --   -25%   -48%
> Index  666667/s   153%    33%     --   -31%
> Subst  970874/s   269%    94%    46%     --
>
> Surprising, no?

No. Capturing is sloooooooow. Apart from that, I would expect the regex
engine to be much faster than your index/substr construction, mostly
because it uses fewer Perl ops. With such a simple pattern the regex
will be optimised into the equivalent of index anyway, and the rx engine
proper won't even be invoked.

If you're really interested, you'll need to benchmark this all again
with 5.10. A lot of work has gone into making the optimiser catch more
'simple' cases.
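
A rough sketch of what such a re-run might look like, with the \K form added as a fourth case. The case name KeepOut and the sanity check are my own additions, and I make no prediction about the numbers:

```perl
use strict;
use warnings;
use 5.010;    # the KeepOut case needs \K
use Benchmark qw(cmpthese);

my $str = 'ABCDEFGHCDIJCXKLCDMNOPC';

# Each case copies the input, substitutes, and returns the result.
my %case = (
    'BckRef'  => sub { ( my $s = $str ) =~ s/(C)D/$1F/g;  $s },
    'Subst'   => sub { ( my $s = $str ) =~ s/CD/CF/g;     $s },
    'LkBnd'   => sub { ( my $s = $str ) =~ s/(?<=C)D/F/g; $s },
    'KeepOut' => sub { ( my $s = $str ) =~ s/C\KD/F/g;    $s },
);

# Sanity check: every variant must produce the same string.
my %seen;
$seen{ $_->() } = 1 for values %case;
die "variants disagree\n" if scalar( keys %seen ) != 1;

# A negative count means "run each case for at least that many CPU seconds".
cmpthese( -1, \%case );
```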

Ben