Pesky bug in Perl 5.8.6's regexps

on 07.09.2007 23:04:19 by kj

I've come across what I'm almost certain is a bug in the Perl
internals, for version 5.8.6. Unfortunately, it is a very skittish
Heisenbug, and I have not been able to reproduce it in a small
script.

In fact, the bug is so quirky that I post this only in the hope
that something will ring a bell with someone who may have seen
something remotely like this before, and who could point me in the
direction of more info.

The bug shows up during a particular execution of the following
code in URI.pm (the comment is mine):

sub _scheme
{
    my $self = shift;

    unless (@_) {
        return unless $$self =~ /^($scheme_re):/o;
        return $1;  # <---- junk in $1
    }

    # ...

and manifests itself in the form of occasional junk in $1, as I've
indicated with the added comment. By "junk" I mean stuff that is
not in $$self at all, usually non-ASCII bytes.

For reference, $scheme_re is set in URI.pm as:

$scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';
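A minimal standalone sketch of the same extraction may help when comparing behaviour across perls. The helper name `scheme_of` and its sanity check are hypothetical troubleshooting additions, not part of URI.pm. The check asserts the invariant that the junk violates: a correct capture is always a prefix of the input string.

```perl
use strict;
use warnings;

# Same scheme pattern that URI.pm uses.
my $scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';

# Hypothetical helper mirroring the getter branch of URI::_scheme,
# taking a plain string instead of a blessed scalar ref.
sub scheme_of {
    my ($str) = @_;
    return unless $str =~ /^($scheme_re):/o;
    my $scheme = $1;
    # Troubleshooting check: a real capture is always a prefix of the
    # input, so anything else means $1 holds junk.
    die "bogus capture '$scheme' from '$str'\n"
        unless substr($str, 0, length $scheme) eq $scheme;
    return $scheme;
}

for my $uri ('http://example.com/', 'svn+ssh://host/repo', 'no-scheme-here') {
    my $s = scheme_of($uri);
    print defined $s ? "$uri -> $s\n" : "$uri -> (no scheme)\n";
}
```

Wrapping the real call site in the same check would at least turn the silent junk into a die with the offending string, which can then be fed to Devel::Peek or logged.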

But I see this behavior only when I run my code with perl 5.8.6.
When I run the same code under 5.8.8 everything works fine.

I can't rule out that the difference between 5.8.6 and 5.8.8 lies
in one of the various modules whose versions differ between my
5.8.6 and 5.8.8 installations, but, FWIW, I did at least confirm
that the error occurs with 5.8.6 but not with 5.8.8 even when I
ensure that all the modules mentioned in the stack trace at the
point of failure match exactly between the two versions.
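One way to do that comparison is to dump the perl version and every loaded module's version under each installation, then diff the two outputs. This is a sketch that assumes you swap the placeholder `use` lines for the modules in your own stack trace. The helper name `loaded_module_versions` is hypothetical.

```perl
use strict;
use warnings;
# Placeholders: replace with the modules named in your stack trace.
use List::Util ();
use File::Spec ();

# Hypothetical helper: returns [package, version] pairs for every
# loaded .pm file, in %INC key order sorted by file path.
sub loaded_module_versions {
    my @rows;
    for my $file (sort keys %INC) {
        next unless $file =~ /\.pm\z/;
        (my $pkg = $file) =~ s{/}{::}g;
        $pkg =~ s{\.pm\z}{};
        # UNIVERSAL::VERSION returns undef when a package has no $VERSION.
        my $ver = eval { $pkg->VERSION };
        push @rows, [ $pkg, defined $ver ? "$ver" : 'undef' ];
    }
    return @rows;
}

print "perl $]\n";
printf "%-30s %s\n", @$_ for loaded_module_versions();
```

Running it under both perls and diffing the two outputs shows at a glance whether anything on the failing code path still differs.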

In case it matters, this is all running under Linux:

jones@luna:~> uname -ar
Linux luna 2.6.11.4-21.17-smp #1 SMP Fri Apr 6 08:42:34 UTC 2007 i686 i686 i386 GNU/Linux

Any troubleshooting ideas you may send my way would be much
appreciated!

TIA,

kj

--
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.

Re: Pesky bug in Perl 5.8.6's regexps

on 08.09.2007 00:01:14 by Abigail

kj (socyl@987jk.com.invalid) wrote on VCXX September MCMXCIII:
[]
[] I've come across what I'm almost certain is a bug in the Perl
[] internals, for version 5.8.6. Unfortunately, it is a very skittish
[] Heisenbug, and I have not been able to reproduce it in a small
[] script.
[]
[] In fact, the bug is so quirky that I post this only in the hope
[] that something will ring a bell with someone who may have seen
[] something remotely like this before, and who could point me in the
[] direction of more info.
[]
[] The bug shows up during a particular execution of the following
[] code in URI.pm (the comment is mine):
[]
[] sub _scheme
[] {
[]     my $self = shift;
[]
[]     unless (@_) {
[]         return unless $$self =~ /^($scheme_re):/o;
[]         return $1; # <---- junk in $1
[]     }
[]
[]     # ...
[]
[] and manifests itself in the form of occasional junk in $1, as I've
[] indicated with the added comment. By "junk" I mean stuff that is
[] not in $$self at all, usually non-ASCII bytes.
[]
[] $scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';
[]
[] But I see this behavior only when I run my code with perl 5.8.6.
[] When I run the same code under 5.8.8 everything works fine.
[]
[] I can't rule out that the difference between 5.8.6 and 5.8.8 lies
[] in one of the various modules whose versions differ between my
[] 5.8.6 and 5.8.8 installations, but, FWIW, I did at least confirm
[] that the error occurs with 5.8.6 but not with 5.8.8 even when I
[] ensure that all the modules mentioned in the stack trace at the
[] point of failure match exactly between the two versions.

If the error is in 5.8.6 but fixed in 5.8.8, what is it that you
want? The time machine that goes back and lets the pumpking fix
bugs before the release won't be ready until Christmas.

[] In case it matters, this is all running under Linux:
[]
[] jones@luna:~> uname -ar
[] Linux luna 2.6.11.4-21.17-smp #1 SMP Fri Apr 6 08:42:34 UTC 2007 i686 i686 i386 GNU/Linux
[]
[] Any troubleshooting ideas you may send my way would be much
[] appreciated!


Uhm, upgrade to 5.8.8?


Abigail
--
my $qr = qr/^.+?(;).+?\1|;Just another Perl Hacker;|;.+$/;
$qr =~ s/$qr//g;
print $qr, "\n";

Re: Pesky bug in Perl 5.8.6's regexps

on 08.09.2007 01:41:36 by Ben Morrow

Quoth kj:
>
> I've come across what I'm almost certain is a bug in the Perl
> internals, for version 5.8.6. Unfortunately, it is a very skittish
> Heisenbug, and I have not been able to reproduce it in a small
> script.

>
> The bug shows up during a particular execution of the following
> code in URI.pm (the comment is mine):
>
> sub _scheme
> {
>     my $self = shift;
>
>     unless (@_) {
>         return unless $$self =~ /^($scheme_re):/o;
>         return $1; # <---- junk in $1
>
> and manifests itself in the form of occasional junk in $1, as I've
> indicated with the added comment. By "junk" I mean stuff that is
> not in $$self at all, usually non-ASCII bytes.

That sounds like a bug in perl's Unicode handling.

> But I see this behavior only when I run my code with perl 5.8.6.
> When I run the same code under 5.8.8 everything works fine.

There were fixes to Unicode's interaction with regexes between 5.8.6 and
5.8.8. See e.g. perldoc perl587delta. If you can't reproduce with 5.8.8
it's likely the bug has been fixed.
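If you want to test the Unicode theory before upgrading, one hypothetical diagnostic is to run the same match on a plain byte string and on a copy carrying perl's internal UTF-8 flag, then compare the captures. It relies only on the built-in `utf8::upgrade` and `utf8::is_utf8` functions. The helper name `describe_match` is made up. It is not a known reproducer of the 5.8.6 bug. On a healthy perl both lines should show the same capture.

```perl
use strict;
use warnings;

my $scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';

# Hypothetical helper: reports whether the string is UTF-8 flagged
# and what the scheme capture comes out as.
sub describe_match {
    my ($str) = @_;
    my $flag = utf8::is_utf8($str) ? 'UTF-8 flagged' : 'byte string';
    my $got  = $str =~ /^($scheme_re):/ ? $1 : '(no match)';
    return "$flag: $got";
}

my $bytes = 'http://example.com/';
my $chars = $bytes;
utf8::upgrade($chars);    # same text, stored internally as UTF-8

print describe_match($bytes), "\n";
print describe_match($chars), "\n";
```

If the flagged copy misbehaves only under 5.8.6, that points at the regex/Unicode path rather than at URI.pm.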

> Any troubleshooting ideas you may send my way would be much
> appreciated!

What's the problem? Just use 5.8.8.
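If the code might still be launched under the old perl by accident, a small guard makes that fail loudly instead of producing junk. The one-line `use 5.008008;` already does this with perl's built-in message. The sketch below is a variant that prints a custom one and assumes nothing beyond the core `$]` variable.

```perl
use strict;
use warnings;

# Guard: refuse to run under perls older than 5.8.8, which lack
# the regex/Unicode fixes made after 5.8.6.
# $] is the numeric perl version, e.g. 5.008008 for 5.8.8.
BEGIN {
    die sprintf("perl 5.8.8 or newer required; this is %s\n", $])
        if $] < 5.008008;
}

print "OK: running under perl $]\n";
```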

Ben