interesting case of data corruption, and its cause

interesting case of data corruption, and its cause

am 22.11.2007 12:45:35 von bugbear

I just fell foul of something;

I was using map to extract a load of data from a source array
(array of complex structures) into a new array
(array of complex, but different structures).

I wrote a subroutine, called (e.g.) "extract",
which takes a single src object, and returns a
new object.

I then (obviously...) wrote:

my @extracted = map { extract($_) } @src_data;

and ended up with corrupted @src_data.

Since extract is (in fact) rather complex,
making wide use of both CPan code, and
large swathes of my code, it took
a while for me to track down the problem.

I'll pause here while the gurus and others
consider wether my fault is "obvious".

OK, the difficulty lies in $_. Map
sets $_ not to be a copy of the array item,
but a shared reference to it.

In particular, assigning to $_ alters
the item.

I wasn't (consciously) doing so, but "somewhere"
in the complex workings of extract, there was
an IO loop of the classic "while()"
kind, which does assign to $_, which
is (perhaps counterintuitively) a
good old fashioned global variable.

If this is all obvious, and I've bored you,
I apologise.

But it took me a good while to track down,
and though it might be worthy of a warning
to others as ill-informed as myself.

BugBear

Re: interesting case of data corruption, and its cause

am 22.11.2007 13:33:23 von Peter Makholm

bugbear writes:

> I wrote a subroutine, called (e.g.) "extract",
> which takes a single src object, and returns a
> new object.

And obviously changes the source object in some way. So I guess that you
extract function works on $_[0] and not starting with something like:

sub extract {
my $src = shift;

....

> If this is all obvious, and I've bored you,
> I apologise.

Apology accepted.

> But it took me a good while to track down,
> and though it might be worthy of a warning
> to others as ill-informed as myself.

I don't think it would catch you concrete problem where you extract
function modifies it's arguments, but in some cases perlcritic would
have warned you about modifying $_ in list functions.

//Makholm

Re: interesting case of data corruption, and its cause

am 22.11.2007 13:47:55 von bugbear

Peter Makholm wrote:
> bugbear writes:
>
>> I wrote a subroutine, called (e.g.) "extract",
>> which takes a single src object, and returns a
>> new object.
>
> And obviously changes the source object in some way. So I guess that you
> extract function works on $_[0] and not starting with something like:
>
> sub extract {
> my $src = shift;

actually:

sub extract {
my ($src) = @_;

It's the assignment/use of the (global) $_ many miles away
(in code terms) that's (was) killing me.

BugBear

Re: interesting case of data corruption, and its cause

am 22.11.2007 16:20:06 von smallpond

On Nov 22, 7:47 am, bugbear wrote:
> Peter Makholm wrote:
> > bugbear writes:
>
> >> I wrote a subroutine, called (e.g.) "extract",
> >> which takes a single src object, and returns a
> >> new object.
>
> > And obviously changes the source object in some way. So I guess that you
> > extract function works on $_[0] and not starting with something like:
>
> > sub extract {
> > my $src = shift;
>
> actually:
>
> sub extract {
> my ($src) = @_;
>
> It's the assignment/use of the (global) $_ many miles away
> (in code terms) that's (was) killing me.
>
> BugBear


Assigning to $_ just changes the scalar global variable $_,
not the thing that it refers to. If it is a reference, and you
assign to $$_ then you could have a problem.

perl -e '$v=0; $_=\$v; $_=1; print $v;'
0
perl -e '$v=0; $_=\$v; $$_=1; print $v;'
1

perl never does automatic reference or dereference. It is
always explicit.
--S

Re: interesting case of data corruption, and its cause

am 22.11.2007 17:30:25 von Uri Guttman

>>>>> "b" == bugbear writes:

b> Peter Makholm wrote:
>> bugbear writes:
>>
>>> I wrote a subroutine, called (e.g.) "extract",
>>> which takes a single src object, and returns a
>>> new object.
>> And obviously changes the source object in some way. So I guess that
>> you
>> extract function works on $_[0] and not starting with something like:
>> sub extract {
>> my $src = shift;

b> actually:

b> sub extract {
b> my ($src) = @_;

b> It's the assignment/use of the (global) $_ many miles away
b> (in code terms) that's (was) killing me.

append this to that code to protect $_:

local( $_ ) ;

you now have copied $_ into $src and have a dynamic new $_ for the code
to be called.

this is also another reason not to use $_ unless you have to (map/grep,
etc.). in your i/o loops use lexical vars (never while() as it sets
$_).

while( my $line = ) {

or use file::slurp if the file size isn't humongous (which is < a few meg
these days).

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: interesting case of data corruption, and its cause

am 22.11.2007 20:16:18 von xhoster

Peter Makholm wrote:
> bugbear writes:
>
> > I wrote a subroutine, called (e.g.) "extract",
> > which takes a single src object, and returns a
> > new object.
>
> And obviously changes the source object in some way. So I guess that you
> extract function works on $_[0] and not starting with something like:
>
> sub extract {
> my $src = shift;


That doesn't protect $_, that protects $_[0].

perl -le 'my @x=1..10; map {foo($_)} @x; sub foo {$_="a" if $_[0]==5}; \
print "@x"'
1 2 3 4 a 6 7 8 9 10

No assignment to $_[0], yet still the thing was changed.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

Re: interesting case of data corruption, and its cause

am 22.11.2007 20:24:52 von rvtol+news

smallpond schreef:

> Assigning to $_ just changes the scalar global variable $_,
> not the thing that it refers to.

perl -wle '
@x = 1..3;
$_ = 42 for @x;
print for @x
'
42
42
42

--
Affijn, Ruud

"Gewoon is een tijger."

Re: interesting case of data corruption, and its cause

am 22.11.2007 21:41:08 von rvtol+news

Dr.Ruud schreef:
> smallpond:

>> Assigning to $_ just changes the scalar global variable $_,
>> not the thing that it refers to.
>
> perl -wle '
> @x = 1..3;
> $_ = 42 for @x;
> print for @x
> '
> 42
> 42
> 42

Or maybe clearer:

[root@peat attachments]# perl -wle'
$_ = "foo";
print;

@x = 1..3;
print for @x;

$_ = 42 for @x;
print for @x;

print;
'
foo
1
2
3
42
42
42
foo

--
Affijn, Ruud

"Gewoon is een tijger."

Re: interesting case of data corruption, and its cause

am 23.11.2007 01:04:38 von Michele Dondi

On Thu, 22 Nov 2007 16:30:25 GMT, Uri Guttman
wrote:

>this is also another reason not to use $_ unless you have to (map/grep,
>etc.). in your i/o loops use lexical vars (never while() as it sets
>$_).

5.10 is behind the corner...


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^ ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Re: interesting case of data corruption, and its cause

am 23.11.2007 04:37:53 von Uri Guttman

>>>>> "MD" == Michele Dondi writes:

MD> On Thu, 22 Nov 2007 16:30:25 GMT, Uri Guttman
MD> wrote:

>> this is also another reason not to use $_ unless you have to (map/grep,
>> etc.). in your i/o loops use lexical vars (never while() as it sets
>> $_).

MD> 5.10 is behind the corner...

even so, named variables are better for code maintaining than using $_ a
lot. i am not against using $_ but it needs to be used where it is an
advantage. one liners, map/grep and such. i have plans to add edit_file
and edit_file_lines subs to file::slurp and those will call a code ref
with text in $_ as shorter code there is better. using $_ in a basic
line loop is not a good use IMNSHO.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: interesting case of data corruption, and its cause

am 23.11.2007 11:34:48 von bugbear

Uri Guttman wrote:
>>>>>> "b" == bugbear writes:
>
> b> Peter Makholm wrote:
> >> bugbear writes:
> >>
> >>> I wrote a subroutine, called (e.g.) "extract",
> >>> which takes a single src object, and returns a
> >>> new object.
> >> And obviously changes the source object in some way. So I guess that
> >> you
> >> extract function works on $_[0] and not starting with something like:
> >> sub extract {
> >> my $src = shift;
>
> b> actually:
>
> b> sub extract {
> b> my ($src) = @_;
>
> b> It's the assignment/use of the (global) $_ many miles away
> b> (in code terms) that's (was) killing me.
>
> append this to that code to protect $_:
>
> local( $_ ) ;
>
> you now have copied $_ into $src and have a dynamic new $_ for the code
> to be called.
>
> this is also another reason not to use $_ unless you have to (map/grep,
> etc.). in your i/o loops use lexical vars (never while() as it sets
> $_).
>
> while( my $line = ) {
>
> or use file::slurp if the file size isn't humongous (which is < a few meg
> these days).

Oh, all agreed. I now understand (and have done) the localisation of $_.

My point is that this syndrome is fairly easy to fall into,
and doesn't appear to be widely known or spoken of.

(this is a good write up)
http://perl.plover.com/local.html

search down for "Although you may not think about it much"

BugBear

Re: interesting case of data corruption, and its cause

am 24.11.2007 00:30:31 von smallpond

On Nov 22, 3:41 pm, "Dr.Ruud" wrote:
> Dr.Ruud schreef:
>
> > smallpond:
> >> Assigning to $_ just changes the scalar global variable $_,
> >> not the thing that it refers to.
>
> > perl -wle '
> > @x = 1..3;
> > $_ = 42 for @x;
> > print for @x
> > '
> > 42
> > 42
> > 42
>
> Or maybe clearer:
>
> [root@peat attachments]# perl -wle'
> $_ = "foo";
> print;
>
> @x = 1..3;
> print for @x;
>
> $_ = 42 for @x;
> print for @x;
>
> print;
> '
> foo
> 1
> 2
> 3
> 42
> 42
> 42
> foo
>
> --
> Affijn, Ruud
>
> "Gewoon is een tijger."

You are correct, but nothing special about $_
in that case. Inside a foreach, any assignment
to the loop variable changes the list element.
So instead of:
$_ = 42 for @x;
This would have the same effect:
for $j (@x) { $j = 42; }

Re: interesting case of data corruption, and its cause

am 24.11.2007 01:28:01 von Ben Morrow

Quoth smallpond :
> On Nov 22, 3:41 pm, "Dr.Ruud" wrote:
> > Dr.Ruud schreef:
> >
> > > smallpond:
> > >> Assigning to $_ just changes the scalar global variable $_,
> > >> not the thing that it refers to.
> >
as well>
>
> You are correct, but nothing special about $_
> in that case. Inside a foreach, any assignment
> to the loop variable changes the list element.
> So instead of:
> $_ = 42 for @x;
> This would have the same effect:
> for $j (@x) { $j = 42; }

Yes, of course. The point here is that the op had

map { extract(...) } ...;

and then, somewhere random down the call chain from extract,

while (<$FH>) { ... }

without localising $_. Since $_ is *global*, assigning to it anywhere
without localising it first can have unexpected side-effects elsewhere.

The bug was not in the map, but in the code that assigned to $_ (albeit
implicitly with while (<>)) without localising. As Uri pointed out,
using that construction is almost always a mistake.

Attempting to fix the problem by localising $_ is tricky, as well,
because C doesn't always work due to some very boring bugs in
some versions of perl when $_ is an element of a tied array. The
workaround (local *_) doesn't buy you much, because you lose your sub
arguments. About the only time I use while (<>) is at the very top level
with -n, when it can't do any harm.

This problem only occurs with while loops. for loops implicitly localise
the loop variable if it isn't a fresh lexical, so don't end up stomping
on anything.

Ben

Re: interesting case of data corruption, and its cause

am 27.11.2007 13:37:53 von bugbear

Ben Morrow wrote:
> Attempting to fix the problem by localising $_ is tricky, as well,
> because C doesn't always work due to some very boring bugs in
> some versions of perl when $_ is an element of a tied array. The
> workaround (local *_) doesn't buy you much, because you lose your sub
> arguments. About the only time I use while (<>) is at the very top level
> with -n, when it can't do any harm.

Thanks to all for the discussion.

Do I take it "best practice" is (simply) not to
assign to $_, either explicitly (rare) or
implicitly (while(<>) or while()) ?

BugBear

Re: interesting case of data corruption, and its cause

am 27.11.2007 19:45:14 von Martijn Lievaart

On Tue, 27 Nov 2007 12:37:53 +0000, bugbear wrote:

> Ben Morrow wrote:
>> Attempting to fix the problem by localising $_ is tricky, as well,
>> because C doesn't always work due to some very boring bugs in
>> some versions of perl when $_ is an element of a tied array. The
>> workaround (local *_) doesn't buy you much, because you lose your sub
>> arguments. About the only time I use while (<>) is at the very top
>> level with -n, when it can't do any harm.
>
> Thanks to all for the discussion.
>
> Do I take it "best practice" is (simply) not to assign to $_, either
> explicitly (rare) or implicitly (while(<>) or while()) ?

Well, yes and no. As with all things, it depends. F.i. I find the
following perfectly acceptable:

SWITCH: {
for ($x) {
/something/ and do {
...
last SWITCH;
};
...
};

But YMMV. In general, one should avoid $_, unless there is a clear
advantage.

HTH,
M4

Re: interesting case of data corruption, and its cause

am 27.11.2007 23:25:48 von jl_post

On Nov 23, 3:34 am, bugbear wrote:
>
> Oh, all agreed. I now understand (and have done) the localisation of $_.
>
> My point is that this syndrome is fairly easy to fall into,
> and doesn't appear to be widely known or spoken of.


Heh... I had a similar problem once with a module I wrote. I
created a module that would output an image in .pbm (P4) format by
printing the output with print() statements.

The odd thing was, when I wrote a one-liner Perl script that used
my module (with the -Mmodule switch), sometimes everything worked
well, and other times extra bytes seemed to mysteriously show up in
the output file, corrupting the file.

After experimenting around, I realized that the corruption only
happened when I used the -l switch (which both chomps input and prints
a newline on output). The fact that I wrote a one-liner with the -l
command was making the print() statements in my module print out an
extra byte (a newline character) per print() statement that I hadn't
counted on.

To fix this problem, I had to local()ize the $\ variable inside my
module by adding the following line before my module's print()
statements:

local $\; # don't automatically print newlines
# (even with the '-l' switch)

Like your issue, mine is also one that doesn't appear to be widely
known or spoken of.

Just thought I would share.

-- Jean-Luc

Re: interesting case of data corruption, and its cause

am 28.11.2007 00:55:30 von Ben Morrow

Quoth "jl_post@hotmail.com" :
> On Nov 23, 3:34 am, bugbear wrote:
> >
> > Oh, all agreed. I now understand (and have done) the localisation of $_.
> >
> > My point is that this syndrome is fairly easy to fall into,
> > and doesn't appear to be widely known or spoken of.
>

>
> Like your issue, mine is also one that doesn't appear to be widely
> known or spoken of.

From perldoc perlvar:

You should be very careful when modifying the default values of
most special variables described in this document. In most cases
you want to localize these variables before changing them, since
if you don’t, the change may affect other modules which rely on
the default values of the special variables that you have
changed. This is one of the correct ways to read the whole file
at once:

open my $fh, "foo" or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;

Any time you use or change the value of a global, you need to think if
it needs to be localized.

Ben