Sinister variable caching problem with rand()

on 15.05.2010 22:03:50 by Anthony Esposito

In one of my programs I started to receive database errors for not having a
unique id. I generate unique ids for each of the mysql lines that I add to
the database. I realized that the perl variable $idNum was keeping the same
random string for multiple executions.

I created a test program to demonstrate the principle of what is happening.

I have shown that the $count variable will function properly, I understand
why and expect $idNum to work the same way. However, when $idNum =
rand(99999999999) is used it misbehaves and starts to show the same numbers
after enough refreshes.

The code stores the random numbers in a file so I can detect them over
multiple sessions. Within the same session a random key has never been
generated twice.

Please help, I have been working on this for 2 days and it is killing my
progress at work.

-------------------------------------------------------------------------

use strict;

my $idNum=0;
my $count=0;
for(1..10){
    my %hash;

    # Load every key written so far (by this or any earlier run).
    open(data,"<keys.txt");
    while(<data>){
        chomp;
        $hash{$_}=$_;
    }
    close(data);

    $idNum = rand(9999999999999);
    $count++;

    # Append the new key so later runs can detect a repeat.
    open(data2,">>keys.txt");
    print data2 "$idNum\n";
    close(data2);

    if ($hash{$idNum}){ print "DUPLICATE - $idNum<br>\n";}else{ print "UNIQUE $idNum<br>\n"; }

}

print "$count";


Re: Sinister variable caching problem with rand()

on 15.05.2010 22:13:03 by Perrin Harkins

On Sat, May 15, 2010 at 4:03 PM, Anthony Esposito wrote:
> In one of my programs I started to receive database errors for not having a
> unique id. I generate unique ids for each of the mysql lines that I add to
> the database. I realized that the perl variable $idNum was keeping the same
> random string for multiple executions.

You need to call srand() in a child init handler:
http://marc.info/?l=apache-modperl&m=123904225030744&w=1

However, I have to ask, why are you generating id numbers randomly?
Why not just let mysql do it with auto_increment?

- Perrin

Re: Sinister variable caching problem with rand()

on 15.05.2010 22:29:03 by Adam Prime

Perrin Harkins wrote:
> On Sat, May 15, 2010 at 4:03 PM, Anthony Esposito wrote:
>> In one of my programs I started to receive database errors for not having a
>> unique id. I generate unique ids for each of the mysql lines that I add to
>> the database. I realized that the perl variable $idNum was keeping the same
>> random string for multiple executions.
>
> You need to call srand() in a child init handler:
> http://marc.info/?l=apache-modperl&m=123904225030744&w=1
>
> However, I have to ask, why are you generating id numbers randomly?
> Why not just let mysql do it with auto_increment?
>
> - Perrin

I think, in theory you shouldn't have to explicitly srand yourself in
your startup.pl if you're using mod_perl 1.29 or greater, and perl 5.8.1
or greater.

see:

http://marc.info/?l=apache-modperl-dev&m=106606815110220&w=2

Adam

Re: Sinister variable caching problem with rand()

on 15.05.2010 22:34:18 by Cosimo Streppone

On Sat, 15 May 2010 22:03:50 +0200, Anthony Esposito wrote:

> In one of my programs I started to receive database errors for not
> having a unique id. I generate unique ids for each of the mysql lines
> that I add to the database. I realized that the perl variable $idNum was
> keeping the same random string for multiple executions.

I've been there too.
Same problem, with mod_perl 2.04.

This is what I have in my startup.pl:

# Every mp process needs a different random seed, or they will
# generate the exact same random number sequence every time,
# resulting in tokens collision
my $srv = Apache2::ServerUtil->server;
$srv->push_handlers(PerlChildInitHandler => sub { srand(time ^ $$) } );

--
Cosimo
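
A small standalone sketch of why forked children repeat each other, outside mod_perl. It assumes a Unix-like system with fork, and it assumes that the parent has already used rand() before forking, which is what would leave every child with a copy of the same generator state. It illustrates the mechanism only; it is not a claim about what any particular mod_perl version does at startup.

```perl
use strict;
use warnings;

# Use rand() once in the parent so the generator is already seeded
# before any child is forked (the assumed problem scenario).
my $warm_up = rand();

my @results;
for my $child (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: it inherited the parent's generator state.
        # Calling srand() here, as in the startup.pl fix, would reseed it.
        close $reader;
        print {$writer} int(rand(1_000_000)), "\n";
        close $writer;
        exit 0;
    }

    # Parent: collect the number the child produced.
    close $writer;
    chomp(my $value = <$reader>);
    close $reader;
    waitpid($pid, 0);
    push @results, $value;
}

print "child values: @results\n";
```

Run as written, the three children should report the same number, because none of them reseeds. Adding an srand() call at the top of the child branch should make the values differ, which is the effect the PerlChildInitHandler above is after.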

Re: Sinister variable caching problem with rand()

on 15.05.2010 22:47:14 by aw

Adam Prime wrote:
> Perrin Harkins wrote:
>> On Sat, May 15, 2010 at 4:03 PM, Anthony Esposito wrote:
>>> In one of my programs I started to receive database errors for not having a
>>> unique id. I generate unique ids for each of the mysql lines that I add to
>>> the database. I realized that the perl variable $idNum was keeping the same
>>> random string for multiple executions.
>>
>> You need to call srand() in a child init handler:
>> http://marc.info/?l=apache-modperl&m=123904225030744&w=1
>>
>> However, I have to ask, why are you generating id numbers randomly?
>> Why not just let mysql do it with auto_increment?
>>
>> - Perrin
>
> I think, in theory you shouldn't have to explicitly srand yourself in
> your startup.pl if you're using mod_perl 1.29 or greater, and perl 5.8.1
> or greater.
>
> see:
>
> http://marc.info/?l=apache-modperl-dev&m=106606815110220&w=2
>
Also see : perldoc -f srand
and
http://en.wikipedia.org/wiki/Murphy%27s_law

With the second, I mean that even if the rand() sequence is different in
different Apache children, and in the cgi-bin scripts or modules called by
these children, that still does not mean that the same number cannot be
generated occasionally, especially over a long period of time when the
results are stored in a database.
So I would either use what Perrin mentioned, or combine the result of (a
shorter) rand() call with time(), for example.

A tip : to get a really unique identifier, I often use
"yyyymmddhhmmssrrrrr", where rrrrr is the rand() result, and the prefix
is the date/time. Unless you process more than 99,999 requests per
second, it will always be unique. In addition, such an identifier is at
the same time a time stamp, which is often invaluable in tracking down a
problem.
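
A minimal sketch of the "yyyymmddhhmmssrrrrr" format described above. The helper name timestamp_id is made up for this illustration, and it assumes local time and a five-digit, zero-padded rand() suffix. Two identifiers generated within the same second can still collide if the suffix repeats.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: 14 digits of date/time followed by 5 random digits.
sub timestamp_id {
    my $stamp  = strftime('%Y%m%d%H%M%S', localtime);
    my $suffix = sprintf('%05d', int(rand(100_000)));   # 00000 .. 99999
    return $stamp . $suffix;
}

print timestamp_id(), "\n";
```

Because the prefix sorts chronologically, identifiers built this way also sort by creation time, which is what makes them usable as a rough time stamp.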

Re: Sinister variable caching problem with rand()

on 16.05.2010 00:02:29 by mpeters

On 05/15/2010 04:47 PM, André Warnier wrote:
>
> A tip : to get a really unique identifier

Another tip: to get a really unique identifier use mod_unique_id or
Data::UUID or the UUID() function in mysql.

--
Michael Peters
Plus Three, LP
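
For the Data::UUID option, a short sketch. Data::UUID is a CPAN module rather than part of core Perl, so this assumes it has been installed; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Data::UUID;   # CPAN module, assumed to be installed

my $generator = Data::UUID->new;
my $id        = $generator->create_str;   # UUID in its 36-character string form
print "$id\n";
```

The string form needs a CHAR(36) or similar column; whether that suits a given schema is a separate decision.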

Re: Sinister variable caching problem with rand()

on 16.05.2010 03:48:27 by Anthony Esposito

Thank you everyone for helping me, I am surprised by the support out there
for mod_perl.

Cosimo sent me a quick fix to add to the startup.pl file, I have tested and
verified it. The code below works with no problems after adding:

my $srv = Apache2::ServerUtil->server;
$srv->push_handlers(
PerlChildInitHandler => sub { srand(time ^ $$) } );

This fix not only corrected the rand() example I posted, but fixed my root
issue that rand() represented in my test script. My actual script uses:

my $id = new String::Random;
my $num = $id->randregex('[A-Za-z0-9]{30}');

This code is in a sub in another library. For the sake of keeping it simple
I removed the use of it for my example. Luckily, String::Random must have
suffered from the same rand() problem : )

And I HIGHLY appreciate Michael Peters' tip (the UUID() function in mysql). I
will most likely start using that in place of my own random keys to reduce
my overhead and worries.

The reason I am not using auto_increment is because the databases exist on a
mysql cluster. The auto_increment counts would have to be maintained very
carefully with multiple servers running the same database so I opted to not
worry about it and generate completely unique keys. When a line is added in
one database it is advertised to all others, the linking between lines in
diff tables is easier this way too.

Thanks a ton gentlemen, you have saved me a great deal of time.

Anthony Esposito


Re: Sinister variable caching problem with rand()

on 16.05.2010 04:31:43 by Anthony Esposito

PS.

I had to strengthen the srand() call I added to startup.pl, the one
Cosimo suggested. I used the example from the perldoc and have not had any
duplicate keys so far. With srand(time ^ $$) I was getting upwards of
3 duplicates per 1000 keys.

Fix added to startup.pl file
---------------------------------------

my $srv = Apache2::ServerUtil->server;
$srv->push_handlers(PerlChildInitHandler => sub {
    srand(time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`)
} );

---------------------------------------

On Sat, May 15, 2010 at 3:02 PM, Michael Peters wrote:

> On 05/15/2010 04:47 PM, André Warnier wrote:
>>
>> A tip : to get a really unique identifier
>
> Another tip: to get a really unique identifier use mod_unique_id or
> Data::UUID or the UUID() function in mysql.
>
> --
> Michael Peters
> Plus Three, LP


Re: Sinister variable caching problem with rand()

on 16.05.2010 11:28:46 by Michael Ludwig

André Warnier wrote on 15.05.2010 at 22:47:14 (+0200):

> A tip : to get a really unique identifier, I often use
> "yyyymmddhhmmssrrrrr", where rrrrr is the rand() result,
> and the prefix is the date/time. Unless you process more
> than 99,999 requests per second, it will always be unique.

I'm probably missing something trivial, but how do you
enforce uniqueness over rrrrr for 99,999 consecutive calls
to rand?

The perldoc doesn't promise any uniqueness, only randomness,
which isn't uniqueness.

--
Michael Ludwig

Re: Sinister variable caching problem with rand()

on 16.05.2010 13:40:31 by aw

Michael Ludwig wrote:
> André Warnier wrote on 15.05.2010 at 22:47:14 (+0200):
>
>> A tip : to get a really unique identifier, I often use
>> "yyyymmddhhmmssrrrrr", where rrrrr is the rand() result,
>> and the prefix is the date/time. Unless you process more
>> than 99,999 requests per second, it will always be unique.
>
> I'm probably missing something trivial, but how do you
> enforce uniqueness over rrrrr for 99,999 consecutive calls
> to rand?
>
> The perldoc doesn't promise any uniqueness, only randomness,
> which isn't uniqueness.
>
You are right, I should not have said "always". It is still not
guaranteed, and does not "enforce" uniqueness.
But yyyymmddhhmmssrrrrr means that you would need a duplicate within the
same second, which is quite a bit less likely than over a much longer
period of time.
I have never done the math to calculate exactly how likely it was that I
would get 2 identical identifiers for any given number of
requests/second. Nor what adding one rand() digit would do to this
probability.
Let's say this : I have a dozen websites where the above is being used.
None of them even approaches 99,999 requests/second, and none of them
necessarily generates a unique-id at each request.
But the setup is such that, should a duplicate identifier be generated,
the application stops with an error. I set it up that way because I was
curious, and because my sites are not Google.
Over a period of about 5 years, it has never happened.
Probabilities being what they are, it does not mean that it cannot be
happening at this very second. But then also, the Sun may have turned
into a supernova 6 minutes ago.
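
The math left undone above is the birthday problem. A minimal sketch, assuming the suffix is drawn uniformly from 100,000 values (00000 to 99999) and that rand() behaves like an ideal uniform source; collision_probability is a name made up for this illustration.

```perl
use strict;
use warnings;

# Probability that at least two of $n identifiers generated within the
# same second share a suffix, given $space equally likely suffixes.
sub collision_probability {
    my ($n, $space) = @_;
    my $p_unique = 1;
    # The i-th identifier must avoid the i suffixes already taken.
    $p_unique *= 1 - $_ / $space for 0 .. $n - 1;
    return 1 - $p_unique;
}

for my $n (2, 10, 100, 373) {
    printf "%4d ids in one second: %.6f\n",
        $n, collision_probability($n, 100_000);
}
```

Under those assumptions the chance of a clash within a single second is about 1 in 100,000 for two identifiers and a little under 5% for 100, so a low-traffic site seeing no duplicate in years is consistent with the arithmetic.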

One alternative is to use some strictly incremental counter, shared
between multiple processes running on potentially multiple systems or
CPUs. This requires a common place to store the counter, which survives
a system restart, and it requires some lock-read-increment-unlock
mechanism. I don't know any really fast and efficient way of doing
this. I am interested however if anyone knows one.
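
For processes on a single machine, the lock-read-increment-unlock cycle described above can be sketched with flock on a counter file. This is an illustration only: next_id and the counter file location are hypothetical, it does not cover multiple systems, and it makes no claim about speed.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical helper: increments the counter stored in $path under an
# exclusive lock and returns the new value.
sub next_id {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "cannot open $path: $!";
    flock($fh, LOCK_EX) or die "cannot lock $path: $!";

    my $current = <$fh> // 0;          # empty file means no id issued yet
    chomp $current;
    $current = 0 unless $current =~ /^\d+$/;
    my $next = $current + 1;

    # Rewrite the file with the new value while still holding the lock.
    seek($fh, 0, 0)  or die "cannot seek in $path: $!";
    truncate($fh, 0) or die "cannot truncate $path: $!";
    print {$fh} "$next\n";
    close($fh) or die "cannot close $path: $!";   # closing releases the lock
    return $next;
}

# Usage with a throwaway counter file.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/counter.txt";
print next_id($path), "\n" for 1 .. 3;   # prints 1, 2 and 3 on a fresh file
```

The counter survives a restart because it lives in a file, but every caller serializes on the lock, so this is best read as a demonstration of the mechanism rather than a fast solution.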

Re: Sinister variable caching problem with rand()

on 16.05.2010 15:23:11 by Michael Ludwig

André Warnier wrote on 16.05.2010 at 13:40:31 (+0200):

> One alternative is to use some strictly incremental
> counter, shared between multiple processes running on
> potentially multiple systems or CPUs. This requires a
> common place to store the counter, which survives a system
> restart, and it requires some lock-read-increment-unlock
> mechanism. I don't know any really fast and efficient way
> of doing this. I am interested however if anyone knows
> one.

In an SQL server, you'd use a SEQUENCE:

SELECT NEXT VALUE FOR MY_BLA_SEQ FROM <...>

I read about Redis recently. It is like Memcache, just "semi-persistent".
You can store and increment an integer there, or in any other
key-value store.

I think the locking required for incrementing an integer, if
necessary at all, is negligible.

But I agree that in practice for most scenarios, your solution
will work just fine ;-)
--
Michael Ludwig

Re: Sinister variable caching problem with rand()

on 16.05.2010 16:07:44 by Clinton Gormley

> In an SQL server, you'd use a SEQUENCE:
>
> SELECT NEXT VALUE FOR MY_BLA_SEQ FROM <...>
>

Here's a good read about how Flickr manage their unique IDs using MySQL:

http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/

clint

Re: Sinister variable caching problem with rand()

on 16.05.2010 16:36:38 by Perrin Harkins

On Sat, May 15, 2010 at 9:48 PM, Anthony Esposito wrote:
> The reason I am not using auto_increment is because the databases exist on a
> mysql cluster. The auto_increment counts would have to be maintained very
> carefully with multiple servers running the same database so I opted to not
> worry about it and generate completely unique keys. When a line is added in
> one database it is advertised to all others, the linking between lines in
> diff tables is easier this way too.

If you haven't already, I suggest picking up a copy of "High
Performance MySQL" for suggestions on easier ways to deal with a
database cluster.

- Perrin