Odd regex behavior

on 01.10.2007 05:37:13 by Mintcake

I would be grateful to anyone who can shed some light on the
unexpected results from the regex in the following program.

#!/usr/local/bin/perl -l

use strict;

my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';

for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
{
    print "1) $_";
    print "2) [$1][$2]";

    my $x = /(\w+)=['"](.*)["']/;
    print "3) [$x] [$1][$2]";

    my $x = /(\w+)=['"](.*)["']/;
    print "4) [$x] [$1][$2]";

    my $x = /(\w+)=['"](.*)["']/;
    print "5) [$x] [$1][$2]";

    print "";
}
__END__

The results I get are as follows

1) href="/foo/bar?d=1&c=2&f=1&cards=1"
2) [ x="123"][123]
3) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
4) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
5) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]

1) /foo/bar?d=1&c=2&f=1&cards=1
2) [href][/foo/bar?d=1&c=2&f=1&cards=1]
3) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
4) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
5) [] [=2&f=][]

1) x="123"
2) [=2&f=][]
3) [1] [x][123]
4) [1] [x][123]
5) [1] [x][123]

1) 123
2) [x][123]
3) [] [x][123]
4) [] [x][123]
5) [] [x][123]

Now I accept that this code is sloppy for several reasons, but in my
defence I have to say that it is not my code.

1. A while loop would probably be better than a foreach loop.

2. The first regex is attempting to break the string into a list of
att="value" type strings but is returning att="value" and "value", so
the .*? should not be parenthesized.

3. No attempt is made to ensure that the same type of quote is used at
the start and end of the value.
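A minimal sketch of how the three points above could be addressed together, assuming the goal is simply a list of name/value pairs. The variable name @attrs and the backreference \2 for the closing quote are illustrative choices, not part of the original program.

```perl
use strict;
use warnings;

my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';

# Scalar-context m//g in a while loop: each pass finds the next attribute.
# (['"]) captures the opening quote and \2 requires the same closing quote.
my @attrs;
while ($y =~ /\s+(\w+)=(['"])(.*?)\2/gs) {
    push @attrs, [$1, $3];    # copy the captures straight away
}

print "$_->[0] => $_->[1]\n" for @attrs;
```

Copying $1 and $3 into @attrs immediately means nothing later in the loop body depends on the state of the capture variables.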

What I cannot explain is the results from the second iteration of the
loop. The same regex is executed three times and each time it fails
(correctly); however, the third time the $1 and $2 values are
overwritten. I have always believed that the $digit variables would be
preserved if the regex failed to match. Reading the Camel indicates
that this should indeed be the case.

No matter how many times the regex is executed within the loop, it is
only on the final one that $1 and $2 are overwritten.
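For reference, a small sketch of the documented behaviour described above: a failed match leaves $1 and $2 as the last successful match set them. The strings and variable names here are made up for illustration.

```perl
use strict;
use warnings;

# A successful match sets $1 and $2.
'key=value' =~ /(\w+)=(\w+)/ or die "unexpected: first match failed";
my @before = ($1, $2);

# This match fails (there is no '=' in the string) ...
my $ok = 'no equals sign here' =~ /(\w+)=(\w+)/;

# ... so $1 and $2 should still hold the values from the earlier match.
my @after = ($1, $2);

print "matched: ", ($ok ? "yes" : "no"), "\n";
print "before: [$before[0]][$before[1]]\n";
print "after:  [$after[0]][$after[1]]\n";
```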

Re: Odd regex behavior

on 01.10.2007 06:08:56 by benkasminbullock

On Sun, 30 Sep 2007 20:37:13 -0700, Mintcake wrote:

> I would be grateful to anyone who can shed some light on the
> unexpected results from the regex in the following program.
>
> #!/usr/local/bin/perl -l
>
> use strict;

Adding the line

use warnings;

to your script gives the answer to your problem.

Re: Odd regex behavior

on 01.10.2007 15:34:42 by Paul Lalli

On Sep 30, 11:37 pm, Mintcake wrote:
> I would be grateful to anyone who can shed some light on the
> unexpected results from the regex in the following program.
>
> #!/usr/local/bin/perl -l
>
> use strict;

Why are you asking people for help before asking Perl for help? Why
haven't you enabled warnings?

>
> my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
>
> for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
> {
> print "1) $_";
> print "2) [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "3) [$x] [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "4) [$x] [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "5) [$x] [$1][$2]";
>
> print "";}
>
> __END__

> Now I accept that this code is sloppy for several reasons but in my
> defence I have to say that it is not my code.
>
> 1. A while loop would probably be better than a foreach loop

No, not probably. Definitely. They do not do the same thing at all
in this case, because m//g has very different meanings when evaluated
in a list vs a scalar context.
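A short sketch of that difference, using a made-up string rather than the one from the original post:

```perl
use strict;
use warnings;

my $s = 'a=1 b=2 c=3';

# List context: a single evaluation returns the captures of every match.
my @all = $s =~ /(\w)=(\d)/g;
print "list context: @all\n";

# Scalar context: each evaluation finds the next match, so a while loop
# sees $1 and $2 set afresh on every pass.
my @seen;
while ($s =~ /(\w)=(\d)/g) {
    push @seen, "$1=$2";
}
print "scalar context: @seen\n";
```

With a foreach loop over the list-context result, the regex has already finished before the loop body runs, so $1 and $2 inside the body reflect only the last successful match.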

> The thing I cannot explain are the results from the second
> iteration of the loop. The same regex is executed three times

No it's not. It's only executed once, because you evaluated it in a
list context and then iterated over the results of that one
evaluation, rather than iterating it repeatedly (and progressively) in
a scalar context.

Paul Lalli

Re: Odd regex behavior

on 01.10.2007 15:57:29 by Paul Lalli

On Oct 1, 9:34 am, Paul Lalli wrote:
> On Sep 30, 11:37 pm, Mintcake wrote:
>
> > I would be grateful to anyone who can shed some light on the
> > unexpected results from the regex in the following program.

> > my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
>
> > for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
> > {
> > print "1) $_";
> > print "2) [$1][$2]";
>
> > my $x = /(\w+)=['"](.*)["']/;
> > print "3) [$x] [$1][$2]";
>
> > my $x = /(\w+)=['"](.*)["']/;
> > print "4) [$x] [$1][$2]";
>
> > my $x = /(\w+)=['"](.*)["']/;
> > print "5) [$x] [$1][$2]";
>
> > print "";}


My profuse apologies. I completely misparsed what your post was
getting at, and came back with a completely wrong answer. Having run
your code, I am also confused as to what's happening. How is $1 being
set to '=2&f=', and how is $2 becoming undefined, especially seeing as,
as you said, the pattern match is failing? I'm going to keep staring
at it, but I look forward to other responses to this thread...

Paul Lalli

Re: Odd regex behavior

on 01.10.2007 16:35:49 by gbacon

Looks like you've found a bug. Please file a report!

Greg
--
When man attempts to rise above Nature, he usually falls below it.
-- Sherlock Holmes

Re: Odd regex behavior

on 01.10.2007 23:19:55 by demerphq

On Oct 1, 6:08 am, Ben Bullock wrote:
> On Sun, 30 Sep 2007 20:37:13 -0700, Mintcake wrote:
> > I would be grateful to anyone who can shed some light on the
> > unexpected results from the regex in the following program.
>
> > #!/usr/local/bin/perl -l
>
> > use strict;
>
> Adding the line
>
> use warnings;
>
> to your script gives the answer to your problem.

No, warnings have nothing to do with this.

Yves

Re: Odd regex behavior

on 01.10.2007 23:28:51 by demerphq

On Oct 1, 5:37 am, Mintcake wrote:
> I would be grateful to anyone who can shed some light on the
> unexpected results from the regex in the following program.
>
> #!/usr/local/bin/perl -l
>
> use strict;
>
> my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
>
> for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
> {
> print "1) $_";
> print "2) [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "3) [$x] [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "4) [$x] [$1][$2]";
>
> my $x = /(\w+)=['"](.*)["']/;
> print "5) [$x] [$1][$2]";
>
> print "";}
>
> __END__
>
> The results I get are as follows
>
> 1) href="/foo/bar?d=1&c=2&f=1&cards=1"
> 2) [ x="123"][123]
> 3) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
> 4) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
> 5) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>
> 1) /foo/bar?d=1&c=2&f=1&cards=1
> 2) [href][/foo/bar?d=1&c=2&f=1&cards=1]
> 3) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
> 4) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
> 5) [] [=2&f=][]

This is a bug for sure. Notice that '=2&f=' is the same length as
'cards'. How it ends up at that offset I'm not sure, and I haven't
debugged it to see what's up.

The good news is that I already fixed this for 5.10, although it's hard
to say which fix was responsible; there were a number of fixes related
to capturing, rollbacks and the like in the 5.9.x line.

The bad news is that the patch is highly unlikely to be backported to
5.8.x :-(

Interesting bug, though. Cheers.

Yves

Re: Odd regex behavior

on 02.10.2007 18:51:17 by nobull67

On Oct 1, 4:37 am, Mintcake wrote:
> I would be grateful to anyone who can shed some light on the
> unexpected results from the regex in the following program.

I suspect that this is pretty much the same issue as was discussed
here recently

http://groups.google.com/group/comp.lang.perl.misc/msg/d128a5c4d28a917b

Here's a much simpler way to reproduce it

use strict;
use warnings;

'From outside loop' =~ /(.*)/;

for my $pass ( 1, 2 ) {
print "$1\n";
'From later inside loop' =~ /(.*)/;
}
__END__

The above could reasonably be expected to print 'From outside loop'
twice but actually prints 'From later inside loop' the second time.

The work-round is simply to double the {}

use strict;
use warnings;

'From outside loop' =~ /(.*)/;

for my $pass ( 1, 2 ) {{
print "$1\n";
'From later inside loop' =~ /(.*)/;
}}
__END__

I am able to reproduce this in 5.9.5.

Re: Odd regex behavior

on 02.10.2007 18:56:50 by nobull67

On Oct 2, 5:51 pm, Brian McCauley wrote:
> On Oct 1, 4:37 am, Mintcake wrote:
>
> > I would be grateful to anyone who can shed some light on the
> > unexpected results from the regex in the following program.
>
> I suspect that this is pretty much the same issue as was discussed
> here recently

Correction - if it wasn't for that issue you probably would not have
been able to observe the bug.

There is, of course, as Yves points out, a much more serious bug here
too.

Re: Odd regex behavior

on 03.10.2007 03:03:34 by sln

On Mon, 01 Oct 2007 21:28:51 -0000, demerphq@gmail.com wrote:

>On Oct 1, 5:37 am, Mintcake wrote:
>> I would be grateful to anyone who can shed some light on the
>> unexpected results from the regex in the following program.
>>
>> #!/usr/local/bin/perl -l
>>
>> use strict;
>>
>> my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
>>
>> for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
>> {
>> print "1) $_";
>> print "2) [$1][$2]";
>>
>> my $x = /(\w+)=['"](.*)["']/;
>> print "3) [$x] [$1][$2]";
>>
>> my $x = /(\w+)=['"](.*)["']/;
>> print "4) [$x] [$1][$2]";
>>
>> my $x = /(\w+)=['"](.*)["']/;
>> print "5) [$x] [$1][$2]";
>>
>> print "";}
>>
>> __END__
>>
>> The results I get are as follows
>>
>> 1) href="/foo/bar?d=1&c=2&f=1&cards=1"
>> 2) [ x="123"][123]
>> 3) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>> 4) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>> 5) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>>
>> 1) /foo/bar?d=1&c=2&f=1&cards=1
>> 2) [href][/foo/bar?d=1&c=2&f=1&cards=1]
>> 3) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>> 4) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
>> 5) [] [=2&f=][]
>
>This is a bug for sure. Notice that '=2&f=' is the same length as
>'cards'. How it ends up at that offset I'm not sure, and I haven't
>debugged it to see what's up.
>
>The good news is that I already fixed this for 5.10, although it's hard
>to say which fix was responsible; there were a number of fixes related
>to capturing, rollbacks and the like in the 5.9.x line.
>
>The bad news is that the patch is highly unlikely to be backported to
>5.8.x :-(
>
>Interesting bug, though. Cheers.
>
>Yves

I browsed this article at work over lunch. It seems interesting to
observe the state of variables between non-matching regex iterations.
I don't recall that state being either reliable or predictable.

I've been on 5.8.x a long time, and have always known about this.
I don't understand the concern. The unmatched state of variables has
always been ignored. I can't think of a logical construct in this case.

If it's said the state will remain the same on fail, what good does
that do? You don't retroactively "consume on fail".
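One way to avoid depending on the capture variables after a failed match, sketched here with a hypothetical helper parse_attr (not from the thread): take the captures from the list-context return value of the match, which is an empty list on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "name => value" for an attribute string,
# or a message when the pattern does not match.
sub parse_attr {
    my ($text) = @_;

    # In list context the match returns its captures on success and an
    # empty list on failure, so $1 and $2 are never read after a failure.
    if (my ($name, $value) = $text =~ /(\w+)=["'](.*)["']/) {
        return "$name => $value";
    }
    return "no attribute in '$text'";
}

print parse_attr('x="123"'), "\n";
print parse_attr('123'), "\n";
```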

Re: Odd regex behavior

on 03.10.2007 03:14:52 by sln

On Tue, 02 Oct 2007 16:51:17 -0000, Brian McCauley wrote:

>On Oct 1, 4:37 am, Mintcake wrote:
>> I would be grateful to anyone who can shed some light on the
>> unexpected results from the regex in the following program.
>
>I suspect that this is pretty much the same issue as was discussed
>here recently
>
>http://groups.google.com/group/comp.lang.perl.misc/msg/d128a5c4d28a917b
>
>Here's a much simpler way to reproduce it
>
>use strict;
>use warnings;
>
>'From outside loop' =~ /(.*)/;
>
>for my $pass ( 1, 2 ) {
> print "$1\n";
> 'From later inside loop' =~ /(.*)/;
>}
>__END__
>
>The above could reasonably be expected to print 'From outside loop'
>twice but actually prints 'From later inside loop' the second time.
>
>The work-round is simply to double the {}
>
>use strict;
>use warnings;
>
>'From outside loop' =~ /(.*)/;
>
>for my $pass ( 1, 2 ) {{
> print "$1\n";
> 'From later inside loop' =~ /(.*)/;
>}}
>__END__
>
>I am able to reproduce this in 5.9.5.

I'm a little unsure of the logic. In your loop, you do a regex behind
the print $1. Wouldn't you expect the result from the last regex?

If regex finally has "scope", you should expect garbage or unreliable results
in the first pass. The for { } is scope, the second pass prints the inside.

Probably, the $_ should clear the $n variables though, can't remember if it
does. I don't think it does in 8.

I didn't try your code. I'm a little skeptical if {{}} would/should do anything though.

Re: Odd regex behavior

on 04.10.2007 14:15:49 by nobull67

On Oct 3, 2:14 am, s...@netherlands.co wrote:
> On Tue, 02 Oct 2007 16:51:17 -0000, Brian McCauley wrote:
> >'From outside loop' =~ /(.*)/;
>
> >for my $pass ( 1, 2 ) {
> > print "$1\n";
> > 'From later inside loop' =~ /(.*)/;
> >}
> >__END__
>
> >The above could reasonably be expected to print 'From outside loop'
> >twice but actually prints 'From later inside loop' the second time.

> I'm a little unsure of the logic. In your loop, you do a regex behind
> the print $1.

Yes, that's the whole point.

> Wouldn't you expect the result from the last regex?

No I'd expect the result from the last regex excluding those from
dynamic scopes that have now ended. On the second iteration of the
loop the dynamic scope from the first iteration has ended so I should
not see the result of the regex.

> If regex finally has "scope", you should expect garbage or unreliable results
> in the first pass.

No, it is defined that if there has been no successful regex match in
the current dynamic scope then the parent dynamic scope is examined.
This is usual for dynamic scopes.
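A minimal sketch of that scoping rule with a bare block instead of a loop; the strings and the @log array are made up for illustration.

```perl
use strict;
use warnings;

my @log;

'outer' =~ /(\w+)/;

{
    # No successful match in this block yet, so $1 comes from the
    # enclosing scope.
    push @log, "before inner match: $1";

    'inner' =~ /(\w+)/;
    push @log, "after inner match: $1";
}

# The block has ended, so the value set outside it is visible again.
push @log, "after the block: $1";

print "$_\n" for @log;
```

The loop in the example above is expected to behave the same way on each iteration; the reported bug is that it does not.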

> The for { } is scope, the second pass prints the inside.

Yes, this is the bug I'm reporting.

> Probably, the $_ should clear the $n variables though, can't remember if it
> does.

$_ is not involved anywhere in my example.

> I didn't try your code.

I did.