using UID in DB

using UID in DB

am 30.03.2010 22:36:27 von Tommy Pham

Hi,

I'm just wondering if anyone on this list using some type of
UID/UUID/GUID in any of the DB? If so, what DBMS/RDBMS are you using
and how many rows do you have for the table(s) using it? What data
type are you using for that column?

TIA,
Tommy

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 30.03.2010 23:04:02 von Paul M Foster

On Tue, Mar 30, 2010 at 01:36:27PM -0700, Tommy Pham wrote:

> Hi,
>
> I'm just wondering if anyone on this list using some type of
> UID/UUID/GUID in any of the DB? If so, what DBMS/RDBMS are you using
> and how many rows do you have for the table(s) using it? What data
> type are you using for that column?

If I understand you correctly, I use a single table, "users". Either
MySQL or PostgreSQL (depending on the application). There is one record
per user, and that record contains a serial/sequential ID, set by the
system, a user ID which is varchar(8), email address which is
varchar(255), username which is varchar(50) and a password which is
varchar(32) and stored encrypted.

There are other related tables I use to record which URLs require
security and which users have access to those URLs.

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 30.03.2010 23:18:53 von Tommy Pham

On Tue, Mar 30, 2010 at 2:04 PM, Paul M Foster wr=
ote:
> On Tue, Mar 30, 2010 at 01:36:27PM -0700, Tommy Pham wrote:
>
>> Hi,
>>
>> I'm just wondering if anyone on this list using some type of
>> UID/UUID/GUID in any of the DB?  If so, what DBMS/RDBMS are you usi=
ng
>> and how many rows do you have for the table(s) using it?  What data
>> type are you using for that column?
>
> If I understand you correctly, I use a single table, "users". Either
> MySQL or PostgreSQL (depending on the application). There is one record
> per user, and that record contains a serial/sequential ID, set by the
> system, a user ID which is varchar(8), email address which is
> varchar(255), username which is varchar(50) and a password which is
> varchar(32) and stored encrypted.
>

Hi Paul,

In the case of mysql, it would be UUID and the value would look like this:

22ea1df1-3c40-11df-ab7a-200cd91e08cf

and the case of postgresql,

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11

which is 36 CHAR length (including dashes not braces) and is not quite
like identity insert (autoincrement). You could store it as
binary(16) - in mysql - but you'll need to implement UDFs to convert
between binary & char. Is that what you're using? or Are you using an
INT type?

Regards,
Tommy

> There are other related tables I use to record which URLs require
> security and which users have access to those URLs.
>
> Paul
>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 30.03.2010 23:27:02 von Nathan Rixham

Tommy Pham wrote:
> Hi,
>
> I'm just wondering if anyone on this list using some type of
> UID/UUID/GUID in any of the DB? If so, what DBMS/RDBMS are you using
> and how many rows do you have for the table(s) using it? What data
> type are you using for that column?

nope never been able to find any significant advantage; and thus ended
up using http uri's in my own domain space(s) which are always
guaranteed to be unique as I'm issuing them. bonus is that they can be
dereferenced and server as both a universal (resource) identifier and a
locater.

ps: resource = anything that can be named.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 02:56:52 von Tommy Pham

On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote:

> nope never been able to find any significant advantage; and thus ended
> up using http uri's in my own domain space(s) which are always
> guaranteed to be unique as I'm issuing them. bonus is that they can be
> dereferenced and server as both a universal (resource) identifier and a
> locater.
>
> ps: resource = anything that can be named.
>

Hi Nathan,

I'm interested in hearing your technique of generating your own uuid
if you don't mind sharing :). I'm building a project to test my idea
about search engine and testing of different RDBMSes. So naturally,
it (the app) would crawl the net and I'd have over a 1 billion rows.

Thanks,
Tommy

PS: Here are some info for those who haven't heard of UUID/GUID:

http://affy.blogspot.com/2003/03/why-use-uuid-values.html
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-no t-to-uuid/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 04:37:28 von Paul M Foster

On Tue, Mar 30, 2010 at 02:18:53PM -0700, Tommy Pham wrote:

> On Tue, Mar 30, 2010 at 2:04 PM, Paul M Foster wrote:
> > On Tue, Mar 30, 2010 at 01:36:27PM -0700, Tommy Pham wrote:
> >
> >> Hi,
> >>
> >> I'm just wondering if anyone on this list using some type of
> >> UID/UUID/GUID in any of the DB?  If so, what DBMS/RDBMS are you using
> >> and how many rows do you have for the table(s) using it?  What data
> >> type are you using for that column?
> >
> > If I understand you correctly, I use a single table, "users". Either
> > MySQL or PostgreSQL (depending on the application). There is one record
> > per user, and that record contains a serial/sequential ID, set by the
> > system, a user ID which is varchar(8), email address which is
> > varchar(255), username which is varchar(50) and a password which is
> > varchar(32) and stored encrypted.
> >
>
> Hi Paul,
>
> In the case of mysql, it would be UUID and the value would look like this:
>
> 22ea1df1-3c40-11df-ab7a-200cd91e08cf
>
> and the case of postgresql,
>
> A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
> {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
> a0eebc999c0b4ef8bb6d6bb9bd380a11
>
> which is 36 CHAR length (including dashes not braces) and is not quite
> like identity insert (autoincrement). You could store it as
> binary(16) - in mysql - but you'll need to implement UDFs to convert
> between binary & char. Is that what you're using? or Are you using an
> INT type?

Unless you have some compelling need to store a number like this, I
don't see the need to. What I store is what is called in PostgreSQL a
"serial" value. MySQL calls it "auto_increment". You store all the other
values as a row, and the DBMS adds in the "auto_increment"/"serial"
value for you. It's an integer, *usually* one larger than the last value
entered.

Paul


--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 05:07:01 von Tommy Pham

On Tue, Mar 30, 2010 at 7:37 PM, Paul M Foster wrote:
>
> Unless you have some compelling need to store a number like this, I
> don't see the need to. What I store is what is called in PostgreSQL a
> "serial" value. MySQL calls it "auto_increment". You store all the other
> values as a row, and the DBMS adds in the "auto_increment"/"serial"
> value for you. It's an integer, *usually* one larger than the last value
> entered.
>
> Paul
>

The (personal) project I'm about to start will run in several threads
(thinking how can I make this happen in PHP) where the inserts could
be multiple inserts at once. Identity (type) insert doesn't behave
nicely (locking up the DB) when there are multiple inserts at once.
Using UUID is one of the possible solution to resolving this problem
for me.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 05:38:16 von Paul M Foster

On Tue, Mar 30, 2010 at 08:07:01PM -0700, Tommy Pham wrote:

> On Tue, Mar 30, 2010 at 7:37 PM, Paul M Foster
> wrote:
> >
> > Unless you have some compelling need to store a number like this, I
> > don't see the need to. What I store is what is called in PostgreSQL a
> > "serial" value. MySQL calls it "auto_increment". You store all the other
> > values as a row, and the DBMS adds in the "auto_increment"/"serial"
> > value for you. It's an integer, *usually* one larger than the last value
> > entered.
> >
> > Paul
> >
>
> The (personal) project I'm about to start will run in several threads
> (thinking how can I make this happen in PHP) where the inserts could
> be multiple inserts at once. Identity (type) insert doesn't behave
> nicely (locking up the DB) when there are multiple inserts at once.
> Using UUID is one of the possible solution to resolving this problem
> for me.

I initially misunderstood your request.

But what DBMS are you using that locks up on multiple simultaneous
inserts? Any reasonable DBMS (besides SQLite) ought to queue inserts and
execute them without locking up.

I think I still don't understand what you're trying to do. It appears
you're trying to get around an insert-locking problem by using UUIDs.
But I'm not sure how a UUID will resolve your problem. As far as the
DBMS is concerned, and insert is an insert, regardless of what you're
storing. A UUID would just be another field to store; it could be any
datatype, as far as the DBMS is concerned. Now, if you actually mean
*updates*, that may be a different matter. But again, the DBMS should
queue them.

I will say this-- if you're looking to add a unique identifier to each
record (I presume the reason for the UUID), and you're going to need
more than 2 billion records (32-bit platform), or 9 quadrillion records
(64-bit platform), integers or serials won't do. In PostgreSQL, you can
specify arbitrary precision numbers up to 1000 digits of precision as:

uuid number(128)

which would be what you're looking for. But you'd have to generate that
value yourself, as only a PostgreSQL "serial" type (integer) will
automatically increment.

Hope that helps (but I doubt it).

Paul

--
Paul M. Foster

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 06:16:24 von Tommy Pham

On Tue, Mar 30, 2010 at 8:38 PM, Paul M Foster wr=
ote:
> On Tue, Mar 30, 2010 at 08:07:01PM -0700, Tommy Pham wrote:
>
>> On Tue, Mar 30, 2010 at 7:37 PM, Paul M Foster
>> wrote:
>> >
>> > Unless you have some compelling need to store a number like this, I
>> > don't see the need to. What I store is what is called in PostgreSQL a
>> > "serial" value. MySQL calls it "auto_increment". You store all the oth=
er
>> > values as a row, and the DBMS adds in the "auto_increment"/"serial"
>> > value for you. It's an integer, *usually* one larger than the last val=
ue
>> > entered.
>> >
>> > Paul
>> >
>>
>> The (personal) project I'm about to start will run in several threads
>> (thinking how can I make this happen in PHP) where the inserts could
>> be multiple inserts at once.  Identity (type) insert doesn't behave
>> nicely (locking up the DB) when there are multiple inserts at once.
>> Using UUID is one of the possible solution to resolving this problem
>> for me.
>
> I initially misunderstood your request.
>
> But what DBMS are you using that locks up on multiple simultaneous
> inserts? Any reasonable DBMS (besides SQLite) ought to queue inserts and
> execute them without locking up.
>

In the past, I've not implement anything close to what I'm about to
do. Initial test run of the app will be about 10 threads. Live run
expected to be excess of 100 threads (and DB connections). Thus, I've
yet to encounter the locking problems, but I've read many DB articles,
in the past few years, from different sources. Some of those state
that lockup will happen when multiple simultaneous inserts occur.

> I think I still don't understand what you're trying to do. It appears
> you're trying to get around an insert-locking problem by using UUIDs.
> But I'm not sure how a UUID will resolve your problem. As far as the
> DBMS is concerned, and insert is an insert, regardless of what you're
> storing. A UUID would just be another field to store; it could be any
> datatype, as far as the DBMS is concerned. Now, if you actually mean
> *updates*, that may be a different matter. But again, the DBMS should
> queue them.
>
> I will say this-- if you're looking to add a unique identifier to each
> record (I presume the reason for the UUID), and you're going to need
> more than 2 billion records (32-bit platform), or 9 quadrillion records
> (64-bit platform), integers or serials won't do. In PostgreSQL, you can
> specify arbitrary precision numbers up to 1000 digits of precision as:
>
> uuid number(128)
>
> which would be what you're looking for. But you'd have to generate that
> value yourself, as only a PostgreSQL "serial" type (integer) will
> automatically increment.
>
> Hope that helps (but I doubt it).
>
> Paul
>
PostgreSQL have several functions to generate it:
http://www.postgresql.org/docs/8.3/static/uuid-ossp.html
MySQL has UUID().
MSSQL has NEWID().
Don't remember top of my head what Oracle has.
Don't know about Firebird as I have zero experience with it.

The adequate # of rows (of URL) I need is at least 1 billion. Given
that if I fetch 1 URL / sec, it would take me at least 31 years :)).
If I can get it to run smoothly with over 100 threads, I'd have that
data in less 1/3 year :D

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 09:37:09 von Lester Caine

Tommy Pham wrote:
> Hi,
>
> I'm just wondering if anyone on this list using some type of
> UID/UUID/GUID in any of the DB? If so, what DBMS/RDBMS are you using
> and how many rows do you have for the table(s) using it? What data
> type are you using for that column?

Firebird is currently improving support for universal identifier, but it has
been a 'feature' for a long time. It becomes essential when one is running
multiple databases across multiple machines that need to be combined later, so
the 'id' is unique to the machine it was generated on.

Storage wise it is just a 128 bit number, so twice as big as a simple 64 bit
'generator' that would normally be used in a single database system. The
improvement is to use a raw number rather than a 16 character string.

Generating it can follow one of the standards ( I forget the number ;) )

Displaying the number in one of the 'preferred formats' is just a matter for the
client app, but one would probably never need to display it normally.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 11:08:13 von Tommy Pham

On Wed, Mar 31, 2010 at 12:37 AM, Lester Caine wrote:
> Tommy Pham wrote:
>>
>> Hi,
>>
>> I'm just wondering if anyone on this list using some type of
>> UID/UUID/GUID in any of the DB?  If so, what DBMS/RDBMS are you usi=
ng
>> and how many rows do you have for the table(s) using it?  What data
>> type are you using for that column?
>
> Firebird is currently improving support for universal identifier, but it =
has

Thanks for the info. I'll look into it.

> been a 'feature' for a long time. It becomes essential when one is runnin=
g
> multiple databases across multiple machines that need to be combined late=
r,
> so the 'id' is unique to the machine it was generated on.
>
> Storage wise it is just a 128 bit number, so twice as big as a simple 64 =
bit
> 'generator' that would normally be used in a single database system. The
> improvement is to use a raw number rather than a 16 character string.

Don't you mean raw number as 16 byte, which is what I intend to use
binary(16) on MySQL if I'm going to use MySQL as DB, rather than 32/36
character string?

>
> Generating it can follow one of the standards ( I forget the number ;) )
>
> Displaying the number in one of the 'preferred formats' is just a matter =
for
> the client app, but one would probably never need to display it normally.
>

Agreed. Displaying that character string is ugly :)

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 11:29:51 von Lester Caine

Tommy Pham wrote:
>> Storage wise it is just a 128 bit number, so twice as big as a simple 64 bit
>> 'generator' that would normally be used in a single database system. The
>> improvement is to use a raw number rather than a 16 character string.

> Don't you mean raw number as 16 byte, which is what I intend to use
> binary(16) on MySQL if I'm going to use MySQL as DB, rather than 32/36
> character string?

Character is a byte ;)
Firebird we can use character string as 16 bytes. The 'improvement' is to
provide a proper data type for it, and then return a 'UUID' rather than '16
character string'.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: using UID in DB

am 31.03.2010 11:43:30 von Tommy Pham

On Wed, Mar 31, 2010 at 2:29 AM, Lester Caine wrote:
> Tommy Pham wrote:
>>>
>>> Storage wise it is just a 128 bit number, so twice as big as a simple 64
>>> bit
>>> 'generator' that would normally be used in a single database system. The
>>> improvement is to use a raw number rather than a 16 character string.
>
>> Don't you mean raw number as 16 byte, which is what I intend to use
>> binary(16) on MySQL if I'm going to use MySQL as DB, rather than 32/36
>> character string?
>
> Character is a byte ;)
> Firebird we can use character string as 16 bytes. The 'improvement' is to
> provide a proper data type for it, and then return a 'UUID' rather than '16
> character string'.
>

You mean something akin to MSSQL's uniqueidentifier data type - store
as 16 byte but will return the hex string on SELECT? AWESOME!! You
just gave me another reason not to use MySQL :D

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 11:57:11 von Nathan Rixham

Tommy Pham wrote:
> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote:
>
>> nope never been able to find any significant advantage; and thus ended
>> up using http uri's in my own domain space(s) which are always
>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>> dereferenced and server as both a universal (resource) identifier and a
>> locater.
>>
>> ps: resource = anything that can be named.
>>
>
> Hi Nathan,
>
> I'm interested in hearing your technique of generating your own uuid
> if you don't mind sharing :). I'm building a project to test my idea
> about search engine and testing of different RDBMSes. So naturally,
> it (the app) would crawl the net and I'd have over a 1 billion rows.
>
> Thanks,
> Tommy

Hi Tommy,

Always good to see somebody experimenting and questioning things :)

With regards generating UUID's which are http schema uri's; this is
something I got hooked up about early on, but then with practise
realised much of the worlds data already has globally known and used
http scheme identifiers; for instance if I'm talking about a web page
then it's the URL for it; a user may as well be http://twitter.com/webr3
a country could be http://dbpedia.org/resource/United_Kingdom in the
rare occurrence where i actually need to create an identifier then
anything from a freebase style GUID (http://example.org/GUID-HERE)
through to a generated meaningful URI http://mydomain.com/user/username
/project/project-name or even just strapped to a class + microtime:
$uri = 'http://mydomain.com/__CLASS__/' . microtime(true);

There are milions of approaches; but it's worth noting that with each
you can have extra functionality due to the identifier and locator
duality of http scheme uri's (thanks to the domain name system).

With regards what you are doing, if I may suggest a few things that you
could try:

You can create Identifiers that are spatial POINT()s and store them in
mysql/postgres using either the MySQL spatial extension or PostGIS
respectively. You can create identifiers using something like POINT(
timestamp, float-id ) which again serves a duality of timestamping each
record and identifying it. Moreover you'll be shocked at the speed gains
from spatial indexing (seriously amazing), and further it allows you to
do some pretty cool functionality with amazing speed.

The spatial indexing lets you leverage your information in some pretty
cool ways, at phenomenal speed. Because your data is essentially now
points in a virtual world where X is time and Y is identity, you can
pull information out by drawing MBRs around the data and thus selecting
say all records between timestampA and timestampB with identities in the
range 0-1832.1234 (we use floats rather than ints, far more scope and
lends to great spatial optimisation / boxing). Further you aren't
limited to basic geometries; you can create chains of data using
linestring, test intersections on time, disjunctions and much more;
again all with shocking speed over even the biggest of data sets (many
billions in under 0.001s).

You may also want to test out some non relational databases; as
typically with large datasets you have to remove all the relational
parts of the database (foreign keys etc) just to be able to run the
thing efficiently. There are many kv db's; nosql solutions and my
personal favourites which are quad/triple stores for EAV modeled data.

Taking an datachanging approach and working with + storing all data as
EAV triples is by far the fastest and most efficient way to make both
small and large sites; everything is stored in a single flat "table" and
you can query across all your data with great speed and chain queries
together linking up id's to access your data in ways you can't even
imagine ;) personally I'm running triple stores with 3-4 billion rows on
many machines, even on my desktop!

I'll leave it there, but something to get you started..

Regards!

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 13:01:25 von Tommy Pham

On Wed, Mar 31, 2010 at 2:57 AM, Nathan Rixham wrote:
> Tommy Pham wrote:
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote=
:
>>
>>> nope never been able to find any significant advantage; and thus ended
>>> up using http uri's in my own domain space(s) which are always
>>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>>> dereferenced and server as both a universal (resource) identifier and a
>>> locater.
>>>
>>> ps: resource =3D anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique of generating your own uuid
>> if you don't mind sharing :).  I'm building a project to test my id=
ea
>> about search engine and testing of different RDBMSes.  So naturally=
,
>> it (the app) would crawl the net and I'd have over a 1 billion rows.
>>
>> Thanks,
>> Tommy
>
> Hi Tommy,
>
> Always good to see somebody experimenting and questioning things :)
>
> With regards generating UUID's which are http schema uri's; this is
> something I got hooked up about early on, but then with practise
> realised much of the worlds data already has globally known and used
> http scheme identifiers; for instance if I'm talking about a web page
> then it's the URL for it; a user may as well be http://twitter.com/webr3
> a country could be http://dbpedia.org/resource/United_Kingdom in the
> rare occurrence where i actually need to create an identifier then
> anything from a freebase style GUID (http://example.org/GUID-HERE)
> through to a generated meaningful URI http://mydomain.com/user/username
> /project/project-name or even just strapped to a class + microtime:
> $uri =3D 'http://mydomain.com/__CLASS__/' . microtime(true);
>
> There are milions of approaches; but it's worth noting that with each
> you can have extra functionality due to the identifier and locator
> duality of http scheme uri's (thanks to the domain name system).
>

Very interesting approach. I'll have to think and research more into it.

> With regards what you are doing, if I may suggest a few things that you
> could try:
>
> You can create Identifiers that are spatial POINT()s and store them in
> mysql/postgres using either the MySQL spatial extension or PostGIS
> respectively. You can create identifiers using something like POINT(
> timestamp, float-id ) which again serves a duality of timestamping each
> record and identifying it. Moreover you'll be shocked at the speed gains
> from spatial indexing (seriously amazing), and further it allows you to
> do some pretty cool functionality with amazing speed.
>
> The spatial indexing lets you leverage your information in some pretty
> cool ways, at phenomenal speed. Because your data is essentially now
> points in a virtual world where X is time and Y is identity, you can
> pull information out by drawing MBRs around the data and thus selecting
> say all records between timestampA and timestampB with identities in the
> range 0-1832.1234 (we use floats rather than ints, far more scope and
> lends to great spatial optimisation / boxing). Further you aren't
> limited to basic geometries; you can create chains of data using
> linestring, test intersections on time, disjunctions and much more;
> again all with shocking speed over even the biggest of data sets (many
> billions in under 0.001s).
>
> You may also want to test out some non relational databases; as
> typically with large datasets you have to remove all the relational
> parts of the database (foreign keys etc) just to be able to run the
> thing efficiently. There are many kv db's; nosql solutions and my
> personal favourites which are quad/triple stores for EAV modeled data.
>
> Taking an datachanging approach and working with + storing all data as
> EAV triples is by far the fastest and most efficient way to make both
> small and large sites; everything is stored in a single flat "table" and
> you can query across all your data with great speed and chain queries
> together linking up id's to access your data in ways you can't even
> imagine ;) personally I'm running triple stores with 3-4 billion rows on
> many machines, even on my desktop!
>
> I'll leave it there, but something to get you started..
>
> Regards!
>

As for spatial data types, I've never find much use for non scientific
related. (example) If using point as a PK, if MySQL stores it the
same way as PostgreSQL which is 16 bytes, how is that any different -
performance wise - than using UUID and storing it as binary(16) for
MySQL or uniqueidentifier (16 bytes) for PostgreSQL?

Thanks,
Tommy

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 13:06:52 von Nathan Rixham

Tommy Pham wrote:
>
> As for spatial data types, I've never find much use for non scientific
> related. (example) If using point as a PK, if MySQL stores it the
> same way as PostgreSQL which is 16 bytes, how is that any different -
> performance wise - than using UUID and storing it as binary(16) for
> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>

it's all about the indexing (R-Tree)

http://en.wikipedia.org/wiki/R-tree




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 19:23:26 von TedD

At 5:56 PM -0700 3/30/10, Tommy Pham wrote:
>On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote:
>
>> nope never been able to find any significant advantage; and thus ended
>> up using http uri's in my own domain space(s) which are always
>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>> dereferenced and server as both a universal (resource) identifier and a
>> locater.
>>
>> ps: resource = anything that can be named.
>>
>
>Hi Nathan,
>
>I'm interested in hearing your technique of generating your own uuid
>if you don't mind sharing :). I'm building a project to test my idea
>about search engine and testing of different RDBMSes. So naturally,
>it (the app) would crawl the net and I'd have over a 1 billion rows.
>
>Thanks,
>Tommy
>
>PS: Here are some info for those who haven't heard of UUID/GUID:
>
>http://affy.blogspot.com/2003/03/why-use-uuid-values.html
>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-n ot-to-uuid/
>

I've read your links and see the problem presented.

The solution is to create an absolutely unique user ID and therein
lies the problem. How do you create something that is absolutely
unique?

Clearly, if you have one database creating records, an auto_increment
will work for creating an unique number for that's what increment
does. However, if you have more than one database creating records
at the same time, then conflicts would occur.

I had a similar problem recently where I used two fields to identify
a record as being "unique". One field used the exact time the record
was added to the database and the other field was a random number.
Combining the two numbers I believed I had a unique number for the
taks I was doing. Of course, if activity was spread across hundreds
of servers with millions of entries per second, then my method would
not be a solution. As such, the solution should be tailored to the
problem.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 19:33:28 von Tommy Pham

On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham wrote:
> Tommy Pham wrote:
>>
>> As for spatial data types, I've never find much use for non scientific
>> related.  (example) If using point as a PK, if MySQL stores it the
>> same way as PostgreSQL which is 16 bytes, how is that any different -
>> performance wise - than using UUID and storing it as binary(16) for
>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>
>
> it's all about the indexing (R-Tree)
>
> http://en.wikipedia.org/wiki/R-tree
>

I can see where the performance would be between B-Tree vs R-Tree for
the same field size but I've yet to see real life application of it.
Case in point, if using point for GPS data coordinates, then wouldn't
it still be a lot better to use Lon (float), Lat (float) which is 2
fields of 4 bytes each vs 1 field of 16 bytes? The index would still
be faster (8 bytes less) in B-Tree right? Not to mention smaller row
& DB size. As for calculating the distance between 2 'points', it
would be simpler for querying purposes to use 'point'. ATM, I don't
have need to do any such calculations. If I do use 'point' for PK,
I'd run into problems with scaling and migration later.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:01:56 von Nathan Rixham

Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham wrote:
>> Tommy Pham wrote:
>>> As for spatial data types, I've never find much use for non scientific
>>> related. (example) If using point as a PK, if MySQL stores it the
>>> same way as PostgreSQL which is 16 bytes, how is that any different -
>>> performance wise - than using UUID and storing it as binary(16) for
>>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>>
>> it's all about the indexing (R-Tree)
>>
>> http://en.wikipedia.org/wiki/R-tree
>>
>
> I can see where the performance would be between B-Tree vs R-Tree for
> the same field size but I've yet to see real life application of it.
> Case in point, if using point for GPS data coordinates, then wouldn't
> it still be a lot better to use Lon (float), Lat (float) which is 2
> fields of 4 bytes each vs 1 field of 16 bytes? The index would still
> be faster (8 bytes less) in B-Tree right? Not to mention smaller row
> & DB size.

wish I still had an application to point you at; and I can't assert in
any other way just how much faster it is; especially over large datasets
- regardless of the amount of rows the style of index doesn't slow
(spatial) queries down.

if index byte size is of importance then negate spatial indexes (which
are rather a lot bigger).

regardless though, it was just an idea I was throwing at you, not
suggesting it's the best approach, but worth consideration depending on
your priorities.

As for calculating the distance between 2 'points', it
> would be simpler for querying purposes to use 'point'. ATM, I don't
> have need to do any such calculations. If I do use 'point' for PK,
> I'd run into problems with scaling and migration later.

the points don't matter tbh; or should i say specifying an x and a y
doesn't matter, because you can simply get by a single id, or get a
range of ids very quickly; the other value comes in to play when you
want temporal queries.

again though, i reiterate only useful if you want speed and don't care
about index bytesize - and certainly not forcing it down your neck.

re scaling.. unsure theoretically you have 15 digit precision which is a
mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
incomprehensible number of rows are possible before a collision.

all in all: there are many good reasons why I've only used this on
occassion (specifically when dealing with 1 million+ rows in an RDBMS
table); and many more good reasons why i stick to non-rdbms databases
and use http uri's + eav model data only nowadays.

The latter i would advocate, the former not so unless you are stuck w/
mysql postgres - in which case you can't get faster. [1]

[1] get some numbers to prove when i have time.

Nathan

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:08:37 von Tommy Pham

On Wed, Mar 31, 2010 at 10:23 AM, tedd wrote:
> At 5:56 PM -0700 3/30/10, Tommy Pham wrote:
>>
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote=
:
>>
>>>  nope never been able to find any significant advantage; and thus =
ended
>>>  up using http uri's in my own domain space(s) which are always
>>>  guaranteed to be unique as I'm issuing them. bonus is that they c=
an be
>>>  dereferenced and server as both a universal (resource) identifier=
and a
>>>  locater.
>>>
>>>  ps: resource =3D anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique of generating your own uuid
>> if you don't mind sharing :).  I'm building a project to test my id=
ea
>> about search engine and testing of different RDBMSes.  So naturally=
,
>> it (the app) would crawl the net and I'd have over a 1 billion rows.
>>
>> Thanks,
>> Tommy
>>
>> PS: Here are some info for those who haven't heard of UUID/GUID:
>>
>> http://affy.blogspot.com/2003/03/why-use-uuid-values.html
>> http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-no t-to-uuid/
>>
>
> I've read your links and see the problem presented.
>
> The solution is to create an absolutely unique user ID and therein lies t=
he
> problem. How do you create something that is absolutely unique?
>
> Clearly, if you have one database creating records, an auto_increment wil=
l
> work for creating an unique number for that's what increment does.  =
However,
> if you have more than one database creating records at the same time, the=
n
> conflicts would occur.
>
> I had a similar problem recently where I used two fields to identify a
> record as being "unique". One field used the exact time the record was ad=
ded
> to the database and the other field was a random number. Combining the tw=
o
> numbers I believed I had a unique number for the taks I was doing. Of
> course, if activity was spread across hundreds of servers with millions o=
f
> entries per second, then my method would not be a solution. As such, the
> solution should be tailored to the problem.
>
> Cheers,
>
> tedd
>

The original implementation of UUID/GUID includes MAC of the NIC which
will guarantee that uniqueness. But because of certain privacy issues
brought up, they have to come up with other methods, though slightly
less unique. I've not heard of someone running into collision using
UUID/GUID.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:16:02 von TedD

At 11:08 AM -0700 3/31/10, Tommy Pham wrote:
>On Wed, Mar 31, 2010 at 10:23 AM, tedd wrote:
>> At 5:56 PM -0700 3/30/10, Tommy Pham wrote:
>>>
>>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham wrote:
>>>
>>>> nope never been able to find any significant advantage; and thus ended
>>>> up using http uri's in my own domain space(s) which are always
>>>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>>>> dereferenced and server as both a universal (resource) identifier and a
>>>> locater.
>>>>
>>>> ps: resource = anything that can be named.
>>>>
>>>
>>> Hi Nathan,
>>>
>>> I'm interested in hearing your technique of generating your own uuid
>>> if you don't mind sharing :). I'm building a project to test my idea
>>> about search engine and testing of different RDBMSes. So naturally,
>>> it (the app) would crawl the net and I'd have over a 1 billion rows.
>>>
>>> Thanks,
>>> Tommy
>>>
>>> PS: Here are some info for those who haven't heard of UUID/GUID:
>>>
>>> http://affy.blogspot.com/2003/03/why-use-uuid-values.html
>>> http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-no t-to-uuid/
>>>
>>
>> I've read your links and see the problem presented.
>>
>> The solution is to create an absolutely unique user ID and therein lies the
>> problem. How do you create something that is absolutely unique?
>>
>> Clearly, if you have one database creating records, an auto_increment will
>> work for creating an unique number for that's what increment does. However,
>> if you have more than one database creating records at the same time, then
>> conflicts would occur.
>>
>> I had a similar problem recently where I used two fields to identify a
>> record as being "unique". One field used the exact time the record was added
>> to the database and the other field was a random number. Combining the two
>> numbers I believed I had a unique number for the taks I was doing. Of
>> course, if activity was spread across hundreds of servers with millions of
>> entries per second, then my method would not be a solution. As such, the
>> solution should be tailored to the problem.
>>
>> Cheers,
>>
>> tedd
>>
>
>The original implementation of UUID/GUID includes MAC of the NIC which
>will guarantee that uniqueness. But because of certain privacy issues
>brought up, they have to come up with other methods, though slightly
>less unique. I've not heard of someone running into collision using
>UUID/GUID.

I didn't say otherwise.

At some point, you asked how others solve the unique problem and I
provided my solution.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:24:46 von Tommy Pham

On Wed, Mar 31, 2010 at 11:01 AM, Nathan Rixham wrote:
> Tommy Pham wrote:
>> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham wrote=
:
>>> Tommy Pham wrote:
>>>> As for spatial data types, I've never find much use for non scientific
>>>> related.  (example) If using point as a PK, if MySQL stores it th=
e
>>>> same way as PostgreSQL which is 16 bytes, how is that any different -
>>>> performance wise - than using UUID and storing it as binary(16) for
>>>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>>>
>>> it's all about the indexing (R-Tree)
>>>
>>> http://en.wikipedia.org/wiki/R-tree
>>>
>>
>> I can see where the performance would be between B-Tree vs R-Tree for
>> the same field size but I've yet to see real life application of it.
>> Case in point, if using point for GPS data coordinates, then wouldn't
>> it still be a lot better to use Lon (float), Lat (float) which is 2
>> fields of 4 bytes each vs 1 field of 16 bytes?  The index would sti=
ll
>> be faster (8 bytes less) in B-Tree right?  Not to mention smaller r=
ow
>> & DB size.
>
> wish I still had an application to point you at; and I can't assert in
> any other way just how much faster it is; especially over large datasets
> - regardless of the amount of rows the style of index doesn't slow
> (spatial) queries down.

If I see the PoC w/o having to see the actual setup, then I would
probably understand your point better.

>
> if index byte size is of importance then negate spatial indexes (which
> are rather a lot bigger).
>

Since the spatial indexes are bigger, wouldn't that work against it
due disk access having to read more?

> regardless though, it was just an idea I was throwing at you, not
> suggesting it's the best approach, but worth consideration depending on
> your priorities.
>

Thanks for your idea. I'd appreciate it very much. I'm told that I
over analyze things at times ... lol ... so please don't take it the
wrong way :)

>  As for calculating the distance between 2 'points', it
>> would be simpler for querying purposes to use 'point'.  ATM, I don'=
t
>> have need to do any such calculations.  If I do use 'point' for PK,
>> I'd run into problems with scaling and migration later.
>
> the points don't matter tbh; or should i say specifying an x and a y
> doesn't matter, because you can simply get by a single id, or get a
> range of ids very quickly; the other value comes in to play when you
> want temporal queries.
>
> again though, i reiterate only useful if you want speed and don't care
> about index bytesize - and certainly not forcing it down your neck.
>
> re scaling.. unsure theoretically you have 15 digit precision which is a
> mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
> incomprehensible number of rows are possible before a collision.
>
> all in all: there are many good reasons why I've only used this on
> occassion (specifically when dealing with 1 million+ rows in an RDBMS
> table); and many more good reasons why i stick to non-rdbms databases
> and use http uri's + eav model data only nowadays.
>
> The latter i would advocate, the former not so unless you are stuck w/
> mysql postgres - in which case you can't get faster. [1]
>
> [1] get some numbers to prove when i have time.
>
> Nathan
>

Using HTTP URI's + EAV model is something I'll definitely look into.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:26:17 von Tommy Pham

On Wed, Mar 31, 2010 at 11:16 AM, tedd wrote:
>
> I didn't say otherwise.
>
> At some point, you asked how others solve the unique problem and I provided
> my solution.
>
> Cheers,
>
> tedd
>

Sorry, I misread that.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Re: using UID in DB

am 31.03.2010 20:41:04 von Nathan Rixham

Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 11:01 AM, Nathan Rixham wrote:
>> Tommy Pham wrote:
>>> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham wrote:
>>>> Tommy Pham wrote:
>>>>> As for spatial data types, I've never find much use for non scientific
>>>>> related. (example) If using point as a PK, if MySQL stores it the
>>>>> same way as PostgreSQL which is 16 bytes, how is that any different -
>>>>> performance wise - than using UUID and storing it as binary(16) for
>>>>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>>>>
>>>> it's all about the indexing (R-Tree)
>>>>
>>>> http://en.wikipedia.org/wiki/R-tree
>>>>
>>> I can see where the performance would be between B-Tree vs R-Tree for
>>> the same field size but I've yet to see real life application of it.
>>> Case in point, if using point for GPS data coordinates, then wouldn't
>>> it still be a lot better to use Lon (float), Lat (float) which is 2
>>> fields of 4 bytes each vs 1 field of 16 bytes? The index would still
>>> be faster (8 bytes less) in B-Tree right? Not to mention smaller row
>>> & DB size.
>> wish I still had an application to point you at; and I can't assert in
>> any other way just how much faster it is; especially over large datasets
>> - regardless of the amount of rows the style of index doesn't slow
>> (spatial) queries down.
>
> If I see the PoC w/o having to see the actual setup, then I would
> probably understand your point better.
>
>> if index byte size is of importance then negate spatial indexes (which
>> are rather a lot bigger).
>>
>
> Since the spatial indexes are bigger, wouldn't that work against it
> due disk access having to read more?

depends on how seriously you take a solution tbh; if you just wanna run
a huge dataset on a sata drive w/ 2gb ram then your screwed whatever
approach (unless you turn the dataset in to many fine grained resources,
heavily cached and only accessed on a need to have basis) but even then
there are many limitations.

one approach is to throw more ram at a situation; where you are
definitely better loading a huge dataset in to one box with massive
amounts of ram than splitting it over multiple boxes. [1]

http://rickonrails.wordpress.com/2009/03/30/big-ole-mysql-sp atial-table-optimization-tricks/

if you're really serious then you want to be on a box with a tonne of
ram and pci express solid state storage devices which give you
throughput of 700MB/s and higher; prices aren't that bad comparatively
and one machine w/ a $800 card added works much better than 2-3 servers
(although you still need to cover replication and high availability).

http://www.fusionio.com/ioxtreme/

wandering off course a bit maybe; but these are situations we need to
consider.

I don't know how big you think big is; but realistically aws ec2
instances coupled with various "big data" / nosql - non rdbms approaches
are the way to go (see many many many recent discussions on the subject
around the net)

>> regardless though, it was just an idea I was throwing at you, not
>> suggesting it's the best approach, but worth consideration depending on
>> your priorities.
>>
>
> Thanks for your idea. I'd appreciate it very much. I'm told that I
> over analyze things at times ... lol ... so please don't take it the
> wrong way :)

ty for clarifying, and deal - I'm the same :)

>> As for calculating the distance between 2 'points', it
>>> would be simpler for querying purposes to use 'point'. ATM, I don't
>>> have need to do any such calculations. If I do use 'point' for PK,
>>> I'd run into problems with scaling and migration later.
>> the points don't matter tbh; or should i say specifying an x and a y
>> doesn't matter, because you can simply get by a single id, or get a
>> range of ids very quickly; the other value comes in to play when you
>> want temporal queries.
>>
>> again though, i reiterate only useful if you want speed and don't care
>> about index bytesize - and certainly not forcing it down your neck.
>>
>> re scaling.. unsure theoretically you have 15 digit precision which is a
>> mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
>> incomprehensible number of rows are possible before a collision.
>>
>> all in all: there are many good reasons why I've only used this on
>> occassion (specifically when dealing with 1 million+ rows in an RDBMS
>> table); and many more good reasons why i stick to non-rdbms databases
>> and use http uri's + eav model data only nowadays.
>>
>> The latter i would advocate, the former not so unless you are stuck w/
>> mysql postgres - in which case you can't get faster. [1]
>>
>> [1] get some numbers to prove when i have time.
>>
>> Nathan
>>
>
> Using HTTP URI's + EAV model is something I'll definitely look into.
>

good good; i can assure you it's the future; all the major corps and
governments are moving to this (age-old) model under the banner of
linked data, odata, and partially gdata - and meanwhile the development
community is slowly getting there with focus moving to kv databases
which are just one step away from the full on eav / triple approach.

Many regards,

Nathan

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php