uniqid() and repetition of numbers generated

on 12.11.2009 23:22:47 by Angus Mann

Hi all. I'm sure I can't be the first person to ask this question but a search of the net leaves me confused.

I need a unique identifier in an SQL table and for complicated reasons I don't want to use auto-increment.

So I thought I would use a pseudo-random method instead. I am NOT scared of people guessing the unique identifier, it just has to be unique in order for the database to work properly.

So I looked at the uniqid() function and see it is based on the "current time in microseconds" and when I test it out I see that it increments (very quickly) when run repeatedly.

If it is based on JUST the time, then it should repeat every 24 hours, thus making "collisions" possible, which I don't want.

If it is based on the time AND day, then that's fine....I can use it.

So here's the problem....
When I calculate the number of microseconds since 1970 I get a 16 digit number.
But uniqid() only gives a 13 digit number.
Calculating the number of microseconds in a day gives 11 digits.

So it seems to me that the numbering sequence will repeat every 100 days, which risks collisions also.
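
The digit counts in this message can be reproduced with a little arithmetic. The following is a minimal Python sketch (Python is used only for illustration, and all variable names are made up for this example); it takes the time this message was sent as the reference moment and assumes, as the message does, that the 13 characters are decimal digits.

```python
from datetime import datetime, timezone

US_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400

# Moment the message was sent (08:22:47 +1000 is 22:22:47 UTC the day before).
sent = datetime(2009, 11, 12, 22, 22, 47, tzinfo=timezone.utc)
us_since_1970 = int(sent.timestamp()) * US_PER_SECOND
us_per_day = SECONDS_PER_DAY * US_PER_SECOND

digits_since_1970 = len(str(us_since_1970))  # decimal digits needed
digits_per_day = len(str(us_per_day))

# If only 13 decimal digits of a microsecond counter were kept,
# the counter would wrap after 10**13 microseconds.
wrap_days = 10**13 / US_PER_SECOND / SECONDS_PER_DAY

print(digits_since_1970, digits_per_day, round(wrap_days, 1))
```

Under that assumption a 13-digit microsecond counter would wrap after roughly 116 days, which is in line with the estimate of about 100 days.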

Can someone explain how uniqid() is really calculated, so I can make a proper judgement about how to use it?

Please don't suggest using a hash of a number generated by uniqid(). Hashing a small number into a longer one does not add entropy, it just transforms the input number, so it does NOT alter the risk of collisions so there is no net advantage.

I had a thought to just append the current date to the uniqid() result but I'm interested to know if anyone has a more elegant solution.

Thanks in advance.

Angus

Re: uniqid() and repetition of numbers generated

on 12.11.2009 23:31:21 by Ashley Sheridan

On Fri, 2009-11-13 at 08:22 +1000, Angus Mann wrote:

> I need a unique identifier in an SQL table and for complicated reasons I don't want to use auto-increment.
> [...]

Auto increment fields are designed to avoid collisions. I can't think of
any sensible reason for not using them. If you're worried that users of
the system will think a number like '65' is a 'silly' value for an id,
why not pad it up with leading zeros, and maybe add in some text from
their name or something. To me, one unique number is the same as
another, whether it has 11 digits or 2. Also, without having numbers
with many leading zeros in your 11-digit unique number, the value range
will be dramatically reduced, thereby increasing the chance of you
running out of unique values.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: uniqid() and repetition of numbers generated

on 13.11.2009 00:11:52 by Angus Mann

Thanks Ashley. To clarify the reason I don't want to use auto-increment: different users with their own populated databases may wish to merge some or all of their data. The unique identifier needs to be carried along with the rest of the data, so it must be unique not only in the database it currently resides in but also after it gets copied into another person's database, and auto-increment will not meet that requirement. I thought that using microtime (hence uniqid()) would solve the problem, and the only chance of a collision is the unlikely event that records are added to 2 different people's databases at EXACTLY the same time, to within a millionth of a second. Possible, I realize, but very unlikely, given that each user will probably add fewer than 100 entries per day.

On balance I think I will generate an identifier consisting of a few things: uniqid() plus a few letters from the person's name plus a (pseudo)random 3 digit number. Probably there's enough entropy in that for my purpose.
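
A composite identifier of this kind could be put together roughly as follows. This is a sketch in Python rather than PHP: `make_record_id` is a hypothetical helper, the hyphen separators are an arbitrary choice, and the time part is assumed to be the seconds and microseconds written in hexadecimal, which imitates the 13-character output of uniqid() without calling it.

```python
import secrets
import time


def make_record_id(name: str) -> str:
    """Build an ID from a time part, a few letters of a name and 3 random digits."""
    now_us = time.time_ns() // 1_000            # microseconds since 1970
    sec, usec = divmod(now_us, 1_000_000)
    time_part = f"{sec:08x}{usec:05x}"          # 13 hex characters
    letters = "".join(c for c in name.lower() if c.isalpha())[:3]
    suffix = f"{secrets.randbelow(1000):03d}"   # 000..999
    return f"{time_part}-{letters}-{suffix}"


print(make_record_id("Angus Mann"))
```

The random suffix only multiplies the number of possible values per microsecond by 1000, and the name letters help only when two users' names differ in their first letters, so this reduces the collision risk without removing it.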

But the question still remains: what exactly is being returned by uniqid()? It is obviously not random, and not a hash function, because it increments predictably. It's too short to be the number of microseconds since 1970 and too long to be the number of microseconds since midnight. Since it has a fixed length and it increments, it will eventually get to the last possible number. When will that be, and what will happen then: will an extra digit appear, will it go back to zero, or will the generating algorithm crash?

If it's anything similar to the unix timestamp then we're all in trouble on January 19, 2038!

Re: uniqid() and repetition of numbers generated

on 13.11.2009 00:37:44 by Ross McKay

On Fri, 13 Nov 2009 08:22:47 +1000, Angus Mann wrote:

>I need a unique identifier in an SQL table and for complicated reasons
>I don't want to use auto-increment. [...]

So why not use a UUID/GUID as created by the DB? You don't specify which
DB server technology you're using, but:

* Microsoft SQL Server has uniqueidentifier and newid()
* Oracle has sys_guid()
* MySQL has uuid() and uuid_short()
* PostgreSQL has the uuid type and the contrib module uuid-ossp

Or are you using something else which doesn't support UUIDs?
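
If the database in use has no UUID function, the identifier can also be generated in application code. A minimal sketch using the `uuid` module from Python's standard library follows; it is given only as an illustration of the idea, not as what any of the servers listed above do internally, and `record_id` is a made-up name.

```python
import uuid

# A random (version 4) UUID has 122 random bits, so two databases that
# generate IDs independently are very unlikely to produce the same value.
record_id = uuid.uuid4()

print(record_id)       # text form: 32 hex digits plus 4 hyphens
print(record_id.hex)   # 32 hex digits, e.g. for a CHAR(32) column
```

Because nothing in the value depends on the clock or on the host, the IDs stay unique when rows are copied into someone else's database.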

(Ash, maybe he's doing replication and can't rely on auto-increment
integers aligning across peers)
--
Ross McKay, Toronto, NSW Australia
"Let the laddie play wi the knife - he'll learn"
- The Wee Book of Calvin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: uniqid() and repetition of numbers generated

on 13.11.2009 03:30:28 by John Hicks

Angus Mann wrote:
> When I calculate the number of microseconds since 1970 I get a 16 digit number.
> But uniqid() only gives a 13 digit number.
> [...]

Here's part of the confusion:

If you were to express the number of microseconds since 1970 in a
decimal number, it would indeed take 16 digits.

But uniqid() returns a /13 character string/, not a 13 digit number. The
string is actually a hexadecimal number (and thus can express a greater
range of values than a decimal number within those 13 characters).
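
To make the hexadecimal reading concrete, here is a small Python sketch. It assumes the 13 characters are 8 hex digits of seconds since 1970 followed by 5 hex digits of microseconds; that layout is an assumption of this example and is not stated in the thread. `uniqid_like` and `decode` are hypothetical helpers, not PHP functions.

```python
from datetime import datetime, timedelta, timezone


def uniqid_like(sec: int, usec: int) -> str:
    """Format a timestamp as 8 hex digits of seconds plus 5 of microseconds."""
    return f"{sec:08x}{usec:05x}"


def decode(uid: str) -> tuple:
    """Split a 13-character ID back into (seconds, microseconds)."""
    return int(uid[:8], 16), int(uid[8:], 16)


uid = uniqid_like(1258064567, 123456)   # a moment on 12 Nov 2009 (UTC)
sec, usec = decode(uid)
print(uid, sec, usec)

# Eight hex digits can hold seconds up to 0xffffffff.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last = epoch + timedelta(seconds=0xFFFFFFFF)
print(last.year)
```

Under that assumption the value includes the date as well as the time of day, so it does not repeat daily, and the eight-digit seconds part is not exhausted until 2106.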

-John




Re: uniqid() and repetition of numbers generated

on 13.11.2009 05:30:40 by Angus Mann

> Here's part of the confusion:
>
> If you were to express the number of microseconds since 1970 in a
> decimal number, it would indeed take 16 digits.
>
> But uniqid() returns a /13 character string/, not a 13 digit number. The
> string is actually a hexadecimal number (and thus can express a greater
> range of values than a decimal number within those 13 characters).
>
> -John

Ahh! The moment when the penny drops. I was looking at the result as a
number, not a string.
Solution to problem = use uniqid()




Re: uniqid() and repetition of numbers generated

on 13.11.2009 12:10:12 by Ashley Sheridan

On Fri, 2009-11-13 at 14:30 +1000, Angus Mann wrote:

> Ahh! The moment when the penny drops. I was looking at the result as a
> number, not a string.
> Solution to problem = use uniqid()
>
>
>

You say you don't want to use the auto increment field for this 'unique'
value, but then you are considering using something which is likely to
produce the exact same value as someone else may have in their database,
and then you plan on merging the data!

If I were you I'd either:

a) find another way to create a number that is unique to you,
something like a namespace
b) not merge the data directly, but think of a smart way of adding it
that avoids collisions

Personally, I reckon it's best going with the latter. The other database
should have extra fields in: one for their own auto incremental field,
one to store the value of your auto increment (not unique to avoid
problems if they source data from many other exterior databases) and one
field to indicate where the data came from.
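
One way to picture the three fields is the sketch below, which uses Python's built-in `sqlite3` module with an in-memory database. The table name, the column names and the two source labels are invented for this example, and SQLite stands in for whichever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- local key, unique here
        source_id INTEGER NOT NULL,  -- auto-increment value in the origin database
        source    TEXT    NOT NULL,  -- which database the row came from
        payload   TEXT
    )
    """
)

# Two origin databases that both used id 65 for different rows.
incoming = [(65, "db_alice", "first row"), (65, "db_bob", "second row")]
conn.executemany(
    "INSERT INTO records (source_id, source, payload) VALUES (?, ?, ?)",
    incoming,
)

rows = conn.execute(
    "SELECT id, source_id, source FROM records ORDER BY id"
).fetchall()
print(rows)
```

Both imported rows keep their original id of 65 in `source_id`, while the receiving database hands out its own non-clashing `id` values.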

I really do think that a bit of work on the interface between your two
databases would reap dividends, as it could make it a whole lot easier
to determine where a problem came from if one occurs, and to my mind
makes things just a little bit easier in the long run.


Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: uniqid() and repetition of numbers generated

on 13.11.2009 15:46:53 by TedD

At 9:11 AM +1000 11/13/09, Angus Mann wrote:
>Thanks Ashley. To clarify, the reason I don't want to use
>auto-increment : different users with their own populated databases
>may wish to merge some or all of their data. The unique identifier
>needs to be carried along with the rest of the data, hence be unique
>not only on the database it currently resides in ... it still needs
>to be unique if it gets copied into another person's database, and
>auto-increment will not meet that requirement.


What you are running into can be solved with proper application of
namespace and auto-increment. It's a problem that's been solved.

If you want something unique per database, then use the domain name
where the database resides.

That way, when you merge two databases together, each record can keep
its own internal auto-increment number, which is guaranteed to be unique
within its database, plus a field showing the domain name, which is also
unique.

If using a domain name (i.e., a string) doesn't fit with your design,
then assign a number to each domain in the parent database where you
merge the databases together. For example:

example.com = 1
nowhere.com = 2
hereitis.com = 3
....

Then in your merged database you would have the following two fields:

domain number | auto-increment number (specific to that domain)
231           | 459879
356           | 109374556
140847        | 456729

That way you can have a nearly unlimited number of databases each
contributing a nearly unlimited number of records. What more could
you want?
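
The namespace idea amounts to a composite key. A minimal Python sketch follows, with plain dictionaries standing in for database tables; the row contents are made up, and the domain numbers follow the example.com = 1, nowhere.com = 2 assignment.

```python
# Each source database keeps its own auto-increment counter,
# so the local ids 1 and 2 appear in both of them.
example_com = {1: "row A", 2: "row B"}   # domain number 1
nowhere_com = {1: "row C", 2: "row D"}   # domain number 2

merged = {}
for domain_number, table in ((1, example_com), (2, nowhere_com)):
    for local_id, row in table.items():
        key = (domain_number, local_id)   # composite key
        assert key not in merged          # cannot clash across domains
        merged[key] = row

print(sorted(merged))
```

The local ids are never rewritten during the merge; only the pair of domain number and local id has to be unique.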

Remember, IPv4 addresses are 256 x 256 x 256 x 256, which is about 4.3
billion. That's a pretty big number.

If that doesn't solve your problem, then use IPng (IPv6), which is roughly

4 billion x 4 billion x 4 billion x 4 billion.

That's a number 39 digits long.

To give you an idea of how large that number is, imagine the surface
of this planet divided into atoms and each atom having a billion
addresses assigned to it.

So, simplify your problem by using namespace and auto-increment -- it works.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com


Re: uniqid() and repetition of numbers generated

on 15.11.2009 05:07:16 by geek.de

2009/11/14 tedd

> A
>

Interesting thought. My idea on this is to use the approach used when
replicating a DB. It is similar to the namespace idea if not the same:

Say you have 3 databases: you could use the numbers mod 3, with A=0, B=1
and C=2. So on A you would have 0, 3, 6, 9, ...; on B 1, 4, 7, 10, ...;
and on C 2, 5, 8, 11, .... This way you can just use auto-increment,
setting the increment value to 3 and the start values to 0, 1 and 2
respectively. Also, this way you will not run out of numbers until you
run out of integers.
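
The interleaving can be sketched in a few lines of Python. `id_sequence` is a hypothetical helper that imitates an auto-increment counter with a configurable start value and step; how those two settings are actually applied depends on the database server.

```python
from itertools import count, islice


def id_sequence(offset: int, step: int):
    """Return an iterator over offset, offset + step, offset + 2*step, ..."""
    return count(start=offset, step=step)


NUM_DATABASES = 3
a = list(islice(id_sequence(0, NUM_DATABASES), 4))
b = list(islice(id_sequence(1, NUM_DATABASES), 4))
c = list(islice(id_sequence(2, NUM_DATABASES), 4))
print(a, b, c)
```

Every id falls into exactly one residue class mod 3, so the three sequences never overlap. The number of databases has to be fixed in advance, though, because it determines the step.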

++Tim Hinnerk Heuer++

http://www.ihostnz.com
