I have a database that has types in them with unique names. These types
are referenced from other tables through a surrogate integer key. I'm
now wondering if I should eliminate that surrogate key and just use the
name as the primary key. Afaiu, surrogate keys are primarily there to
make joining tables or otherwise searching for a record faster, because
it's faster to compare two integers than it is to compare two strings.
Now when I want to search for a type in types or another table that
references types(type_id), under what circumstances is it advisable to
have a surrogate integer key and not use the unique type name? Is
searching for an integer as fast as is searching for a string when both
have an index? How many records in the type table do I need to make a
surrogate key a not unsignificantly faster way to retrieve a row? What
about joins? Are these the right questions?
Thanks.
--
Markus Bertheau
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 21.07.2004 03:31:03 von Iain
Hi,
for my 2c worth, performance is the least important of the things you need
to consider regarding use of surrogate keys.
I use surrogate keys for all situations except the simplest code/description
tables, and this is only when the code has no meaning to the application. If
there is any possibility that you will want to update or re-use codes
(attaching a different meaning to them) then surrogate keys are the way to
go..
Thus I see it more as an issue of business logic than performance. There are
of course many other considerations with relational theory and stuff like
that which you could debate endlessly. I expect that googling on "surrogate
keys" would yeild interesting results.
Regards
Iain
----- Original Message -----
From: "Markus Bertheau"
To:
Sent: Tuesday, July 20, 2004 9:16 PM
Subject: [SQL] surrogate key or not?
> Hi,
>
> I have a database that has types in them with unique names. These types
> are referenced from other tables through a surrogate integer key. I'm
> now wondering if I should eliminate that surrogate key and just use the
> name as the primary key. Afaiu, surrogate keys are primarily there to
> make joining tables or otherwise searching for a record faster, because
> it's faster to compare two integers than it is to compare two strings.
>
> Now when I want to search for a type in types or another table that
> references types(type_id), under what circumstances is it advisable to
> have a surrogate integer key and not use the unique type name? Is
> searching for an integer as fast as is searching for a string when both
> have an index? How many records in the type table do I need to make a
> surrogate key a not unsignificantly faster way to retrieve a row? What
> about joins? Are these the right questions?
>
> Thanks.
>
> --
> Markus Bertheau
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 21.07.2004 09:00:59 von Josh
Markus, Iain,
> Thus I see it more as an issue of business logic than performance. There are
> of course many other considerations with relational theory and stuff like
> that which you could debate endlessly. I expect that googling on "surrogate
> keys" would yeild interesting results.
Frankly, I couldn't disagree more. This is why it was so problematic for the
SQL committee to enshrine "primary keys" and sequences in the standard; it
mis-educates database designers into believing that surrogate keys are
somehow part of the data model. They are most decidely NOT.
Given: Surrogate keys, by definition, represent no real data;
Given: Only items which represent real data have any place in
a data model
Conclusion: Surrogate keys have no place in the data model
There are, in fact, three very good reasons to use surrogate keys, all of
which are strictly due to limitations of technology; that is, implementation
and performance issues, NOT business logic. They are:
1) Convenience: It's very annoying to have to refer to a 4-column foriegn key
whenever you do a join in queries or want to delete a record, as well as
tracking a 4-element composite in your client software.
2) Performance: INT and BIGINT data types are among the most compact and
efficient stored in most RDBMSs. So using anything else as a key would
likely result in a loss of performance on large-table joins.
3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
updates. Some RDBMSs do not support CASCADE, forcing the client software to
fix all the dependant rows. This means that DBAs are very reluctant to use
columns which change frequently as join keys.
All three of these implementation issues are, at least in theory,
surmountable. For example, Sybase overcame problems (1) and (3) by creating
an automated, system-controlled hash key based on the table's real key. This
was a solution endorsed by E.F. Codd in the mid-90's when he came to regret
his promotion of the "Primary Key" idea in the SQL standard.
Now, you're probably wondering "why does this guy regard surrogate keys as a
problem?" I'll tell you: I absolutely cannot count the number of "bad
databases" I've encountered which contained tables with a surrogate key, and
NO REAL KEY of any kind. This makes data normalization impossible, and
cleanup of the database becomes a labor-intensive process requiring
hand-examination of each row.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: surrogate key or not?
am 21.07.2004 09:30:17 von Josh
Markus,
Oh, so you want USEFUL answers. OK.
> Now when I want to search for a type in types or another table that
> references types(type_id), under what circumstances is it advisable to
> have a surrogate integer key and not use the unique type name?
When using the actual name will be a performance problem.
> Is
> searching for an integer as fast as is searching for a string when both
> have an index?
Not usually, no. The index on the text values will simply be larger than the
one on 4-byte INTs, which means it's "slower", assuming you run out of memory
some of the time. If your whole DB fits in RAM, it's not worth worrying
about.
> How many records in the type table do I need to make a
> surrogate key a not unsignificantly faster way to retrieve a row?
It needs to be large enougth that the difference in data types makes a
difference in whether or not it will fit into sort_mem, and how likely it is
to be already cached in memory.
> What
> about joins?
Double jeopardy; you're using the column twice so double the storage
difference. Otherwise, it's just the same issue; does it still fit in RAM
or not?
> Are these the right questions?
Also you'll want to consider the speed of CASCADE operations whenever a
type_name changes. If these changes occur extremely infrequently, then you
can ignore this as well.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 21.07.2004 09:44:00 von Sad
Thnx, Josh
you are very helpful.
> There are, in fact, three very good reasons to use surrogate keys, all of
> which are strictly due to limitations of technology; that is,
> implementation and performance issues, NOT business logic. They are:
>
> 1) Convenience: It's very annoying to have to refer to a 4-column foriegn
> key whenever you do a join in queries or want to delete a record, as well
> as tracking a 4-element composite in your client software.
>
> 2) Performance: INT and BIGINT data types are among the most compact and
> efficient stored in most RDBMSs. So using anything else as a key would
> likely result in a loss of performance on large-table joins.
>
> 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
> updates. Some RDBMSs do not support CASCADE, forcing the client software
> to fix all the dependant rows. This means that DBAs are very reluctant to
> use columns which change frequently as join keys.
> Now, you're probably wondering "why does this guy regard surrogate keys as
> a problem?" I'll tell you: I absolutely cannot count the number of "bad
> databases" I've encountered which contained tables with a surrogate key,
> and NO REAL KEY of any kind. This makes data normalization impossible,
> and cleanup of the database becomes a labor-intensive process requiring
> hand-examination of each row.
The surrogate keys is a real big problem.
But i want to add another two reasons to have a surrogate keys
4) Replication: to identify an object of ANY type (record of any table
regardless to datamodel), to store lists of deleted or modified objects
(regardless to datamodel)
5) Making a primary key: if there is no real key at all.
the sentence (5) is debatable. in theory every relation has a real key, but in
the practice we have historical datamodels without PK !!! it is impossible
but it exists.
For example:
Here in Russia we have a lot of different but identically named streets within
one city. They has absoluetly identical attributes. Historically only human
not machines work on that datamodel and they all used to call such streets by
name adding some spechial non-formal explainations, for example:
"deliver this message please to the house 35 on the Green street, that is to
the west of the center of the city."
"deliver this message please to the house 12 on the Green street, that is
shortest of the all Green streets in the town."
Another exaple is a table of user messages:
user has a number of messages they have no mean attribute and also could be
identical, so you force this user to numbering his messages, or number them
yourself.
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Re: surrogate key or not?
am 21.07.2004 12:30:16 von Sad
On Wednesday 21 July 2004 14:58, Markus Bertheau wrote:
> > 4) Replication: to identify an object of ANY type (record of any table
> > regardless to datamodel), to store lists of deleted or modified objects
> > (regardless to datamodel)
>
> That sounds like a datamodel on data that belongs to another data model,
that's right, these surr. keys actually belong to a metadata model,
but it is also useful to references of a data model. and they are used this=
=20
way.
> > 5) Making a primary key: if there is no real key at all.
>
> When there's no key at all, there can't be a surrogate key, as I
> understand it. In such cases a generated unique number comes in handy,
> and it's a real primary key and no surrogate key.
....right. in case (5) we construct another datamodel indeed. with an=20
artifichial real key :-)
P.S. r u from Russia ?
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 21.07.2004 12:58:35 von twanger
РСÑд, 21.07.2004, в 09:44, sad пиÑеÑ:
> 4) Replication: to identify an object of ANY type (record of any table
> regardless to datamodel), to store lists of deleted or modified objects
> (regardless to datamodel)
That sounds like a datamodel on data that belongs to another data model,
and 5) applies.
> 5) Making a primary key: if there is no real key at all.
When there's no key at all, there can't be a surrogate key, as I
understand it. In such cases a generated unique number comes in handy,
and it's a real primary key and no surrogate key.
СпаÑибо
--
Markus Bertheau
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 21.07.2004 17:10:51 von pg
> All three of these implementation issues are, at least in theory,
> surmountable. For example, Sybase overcame problems (1) and (3) by creating
> an automated, system-controlled hash key based on the table's real key. This
> was a solution endorsed by E.F. Codd in the mid-90's when he came to regret
> his promotion of the "Primary Key" idea in the SQL standard.
I've often wondered if the OID was intended to be something similar.
abc(oid oid, key varchar(120) PRIMARY KEY);
bcd(key oid REFERENCES abc(oid));
Then muck around with the queries to resolve the actual varchar() name
when required.
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: surrogate key or not?
am 21.07.2004 19:25:33 von Josh
Sad,
First of all, please excuse me if I've misunderstood you below because of
translation issues. You'll find I'm rather strident, but it's because the
reasons you're presenting, or seem to be, are excuses for bad database design
I hear every day on the job, and end up having to fix when they go wrong.
Now, to reply to your comments:
> 4) Replication: to identify an object of ANY type (record of any table
> regardless to datamodel), to store lists of deleted or modified objects
> (regardless to datamodel)
Only if the Replication product requires it. Note that our new replication
engine, Slony-I, does *not* require surrogate keys; it can handle
multi-column primary keys. I'm not sure about the other replication
products, they may be able to as well.
I think what you're talking about is a GUID (Global Unique Identifier). Once
again, while most GUIDs are based on random number theory, the ideal GUID
would be a unique combination of the Row Key, the version, and the server
identification -- making it "real data" instead of a random number that tells
you nothing about the row. For performance, this combination might be
combined in a hash.
In other words, I think your 4th point is actually part of Point (1),
convenience.
> 5) Making a primary key: if there is no real key at all.
No, no, no, a thousand times NO.
Data without a real key is NOT DATA. It is just garbage.
I manage or consult on more than twenty production databases for my clients.
Every single one of those databases has a real unique key on every single
table. The ONLY exception is "holding tables" for incoming bulk data, some
of which will be retained as data and some of which will discarded as garbage
-- and key uniqueness is the primary test of what is and isn't data.
> the sentence (5) is debatable. in theory every relation has a real key, but
> in the practice we have historical datamodels without PK !!! it is
> impossible but it exists.
Only because you have bad historical databases. This is not an excuse, it is
a reason to fix them.
> For example:
> Here in Russia we have a lot of different but identically named streets
> within one city. They has absoluetly identical attributes. Historically
> only human not machines work on that datamodel and they all used to call
> such streets by name adding some spechial non-formal explainations, for
> example:
> "deliver this message please to the house 35 on the Green street, that is
> to the west of the center of the city."
> "deliver this message please to the house 12 on the Green street, that is
> shortest of the all Green streets in the town."
This is a PERFECT example of the evil of surrogate keys. You had a data
problem, and instead of solving it, you chose to lean on surrogate keys like
a crutch. Here's what you have in your database now:
Streets
ID Street Name
345 Green Street
2019 Green Street
5781 Green Street
Key: ID???
How, exactly, do you expect to distinguish the above 3 "green streets" from
each other? How do you expect to your users to know that 345 is West Green
Street and 5781 is the shortest in town? How are they supposed to choose
between the three? How do you know there are actually 3 and maybe not just
two or even one? The ID 5781 isn't exactly informative; in fact, it's
meaningless. It's GARBAGE.
Look, real data, on a row-by-row basis, is a POSTULATE. It is should
describe adequately the portion of the world you are trying to represent.
Automated surrogate keys are NOT part of these postulates since they convey
no meaningful information. This is what you currently have:
There is a street named GREEN STREET.
There is a street named GREEN STREET.
There is a street named GREEN STREET.
As you can see by rendering it in English, those three statements are damned
confusing. Are there three streets named Green Street, or is there only
one? Either could be true. We don't know, the data doesn't tell us.
What you need is:
Streets
ID Street Name Location
345 Green Street West Side of City
2019 Green Street In Front of Consulate
5781 Green Street Shortest in Town
Key: Street Name, Location
This gives you much more meaningful data:
There is a street named GREEN STREET which is on the WEST SIDE OF THE CITY.
There is a street named GREEN STREET which is IN FRONT OF THE CONSULATE.
There is a street named GREEN STREET which is the SHORTEST IN TOWN.
This tells the user which Green Street he wants to select. It also gives us
a fair indication that there are 3 Green Streets, and not one or two. And if
there is duplication -- for example, if the street in front of the consulate
is also the shortest in town -- then we can determine this and correct it in
a minute or less by looking at a map.
I'm not just picking on you. I'm saying this because it's happened to me and
was a costly error for both me and the client.
Several years ago, we were doing a scheduling database, and the client messed
around with the data model a lot. In the process, we mistakenly eliminated
the Events table's (the largest and most important table in the database)
real unique key. But because we were "leaning" on a surrogate key, we
didn't notice until the database was in production.
Then, after 6 months of scheduling, the client began to suspect that duplicate
Events were creeping into the database. But without a real, unique key, we
discovered that we could not figure out whether or not two events with
similar information were duplicates or not! The Event_ID told us nothing.
This was a real, critical problem because each Event required the attendance
of an attorney and there were never enough attorneys to go around.
--
Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Re: surrogate key or not?
am 22.07.2004 04:42:49 von Iain
Hi Josh,
Considering that I generally agree with your comments (in this and your
later posts)
I'd say I didn't make myself clear in my brief comment.
Of the considerations: performance, convenience and business logic, I
personally rate performance as the lowest priority. The convenience part has
been well covered in other posts and is second most in my opinion..
By business logic I was thinking of a situation we had on an old db where
employees were idenitfied by codes. These codes were transmitted all over
the database (as you can imagine) which meant that the codes couldn't be
recycled as employees came and left. The answer was to introduce an integer
based auto-generated key. That way the data associated with an ex-employee
is associated with that employee, not with the code. It may be that this
really a convenience issue as opposed to business logic, but in my thinking
the code was data, not a
primary key - it is not even an alternate key (if ex-employees have their
code
nulled it can be made "unique" though). Perhaps the term surrogate key
doesn't apply in this case, I have to admit that I'm not 100% on terminalogy
here.
In the end, accurately representing the business logic is most importent,
and use of a contrived sequential key shouldnt preclude the use of unique
constraints where needed so the problem you described of duplicate events
should never have happened anyway.
This is a great topic though, I find it really interesting.
Regards
Iain
----- Original Message -----
From: "Josh Berkus"
To: "Iain" ; "Markus Bertheau" ;
Sent: Wednesday, July 21, 2004 4:00 PM
Subject: Re: [SQL] surrogate key or not?
> Markus, Iain,
>
> > Thus I see it more as an issue of business logic than performance. There
are
> > of course many other considerations with relational theory and stuff
like
> > that which you could debate endlessly. I expect that googling on
"surrogate
> > keys" would yeild interesting results.
>
> Frankly, I couldn't disagree more. This is why it was so problematic for
the
> SQL committee to enshrine "primary keys" and sequences in the standard; it
> mis-educates database designers into believing that surrogate keys are
> somehow part of the data model. They are most decidely NOT.
>
> Given: Surrogate keys, by definition, represent no real data;
> Given: Only items which represent real data have any place in
> a data model
> Conclusion: Surrogate keys have no place in the data model
>
> There are, in fact, three very good reasons to use surrogate keys, all of
> which are strictly due to limitations of technology; that is,
implementation
> and performance issues, NOT business logic. They are:
>
> 1) Convenience: It's very annoying to have to refer to a 4-column foriegn
key
> whenever you do a join in queries or want to delete a record, as well as
> tracking a 4-element composite in your client software.
>
> 2) Performance: INT and BIGINT data types are among the most compact and
> efficient stored in most RDBMSs. So using anything else as a key would
> likely result in a loss of performance on large-table joins.
>
> 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
> updates. Some RDBMSs do not support CASCADE, forcing the client software
to
> fix all the dependant rows. This means that DBAs are very reluctant to
use
> columns which change frequently as join keys.
>
> All three of these implementation issues are, at least in theory,
> surmountable. For example, Sybase overcame problems (1) and (3) by
creating
> an automated, system-controlled hash key based on the table's real key.
This
> was a solution endorsed by E.F. Codd in the mid-90's when he came to
regret
> his promotion of the "Primary Key" idea in the SQL standard.
>
> Now, you're probably wondering "why does this guy regard surrogate keys as
a
> problem?" I'll tell you: I absolutely cannot count the number of "bad
> databases" I've encountered which contained tables with a surrogate key,
and
> NO REAL KEY of any kind. This makes data normalization impossible, and
> cleanup of the database becomes a labor-intensive process requiring
> hand-examination of each row.
>
> --
> -Josh Berkus
> Aglio Database Solutions
> San Francisco
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 22.07.2004 14:20:40 von Karsten.Hilbert
Josh,
I reckon you are the one in the know so I'll take advantage of
that and ascertain myself of your advice.
I am the primary designer for the database schema of GnuMed
(www.gnumed.org) - a practice management application intended
to store medical data. Obviously we wouldn't want ambigous
data.
I have until now used surrogate primary keys on all table like
so:
create table diagnosis (
pk serial primary key,
fk_patient integer
not null
references patient(pk)
on update cascade
on delete cascade,
narrative text
not null,
unique(fk_patient, narrative)
);
Note that fk_patient would not do for a primary key since you
can have several diagnoses for a patient. However, the
combination of fk_patient and narrative would, as is implied
by the unique() constraint. For fear of having the real
primary key change due to business logic changes I have
resorted to the surrogate key.
Short question: Is this OK re your concerns for using
surrogates, eg. using a surrogate but making sure that at any
one time there *would* be a real primary key candidate ?
This would amount to:
> Streets
> ID Street Name Location
> 345 Green Street West Side of City
> 2019 Green Street In Front of Consulate
> 5781 Green Street Shortest in Town
Key: ID
UNIQUE: Key, Location
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Re: surrogate key or not?
am 22.07.2004 19:05:59 von Josh
Kasten,
> I have until now used surrogate primary keys on all table like
> so:
> Short question: Is this OK re your concerns for using
> surrogates, eg. using a surrogate but making sure that at any
> one time there *would* be a real primary key candidate ?
Yes, this is an example of that. You have a long text field as part of the
key, and that would kill you performance-wise if diagnosis was referred to in
other tables and joined in views.
Keep in mind, though, that if a table is an "end node"; that is, if its PK is
not used as an FK by any other table, then worries about the performance of
keys and size of indexes are unfounded. In fact, for such tables, the
surrogate key is a performance drag; it adds a column and an index which are
not needed.
Now, addressing your table, I would have concerns other than the use of
primary keys. I suggest humbly that your data model/business logic may
need some development:
create table diagnosis (
pk serial primary key,
fk_patient integer
not null
references patient(pk)
on update cascade
on delete cascade,
narrative text
not null,
unique(fk_patient, narrative)
);
This was obviously created so that a patient could have multiple diagnoses.
However, there is no information in the table to indicate *why* there are
multiple diagnoses. And you are using a real key based on a long text
field; always hazardous, as there are many ways to phrase the same
information and duplication is likely. To do it in english, your postulates
look like:
PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.
But this is a bit sketchy. Who made these diagnoses? When did they make
them? Why? This table could carry a *lot* more information, and should (sql
is shorthand)
Then your postulates become *much* more informative:
PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3
in response to NOT BEING ABLE TO WALK of WATER ON THE KNEE
PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192
in response to THE CREEPY-CRAWLIES of ACUTE HYPOCHONDRIA
It also allows you to establish a much more useful key; it's reasonable to
expect that a single staff member on one visit in response to one complaint
would only give one diagnosis. Otherwise, you have more than database
problems. And it prevents you from having to rely on a flaky long text key.
--
Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: surrogate key or not?
am 23.07.2004 07:16:50 von Sad
I want to add some notes
> create table diagnosis (
> pk serial primary key,
> fk_patient integer
> not null
> references patient(pk)
> on update cascade
> on delete cascade,
> narrative text
> not null,
> unique(fk_patient, narrative)
> );
1) a sequence generates INT8 values (in general) and you have INT4 field to=
=20
refer to a serial field.
2) narrative TEXT is very bad to check uniquness becauce it is non-formal=
=20
human-generated native-language text. so it contains missprints, it may be=
=20
rephrased many ways with the same meaning.
3) afaik a diagnosis doesn't belong to a patient,
it belongs to a History, and History is marked with a date and status and=
=20
belongs to a patient.
do not treat my words as The Truth.=20
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 23.07.2004 08:57:21 von Karsten.Hilbert
Josh, sad,
> create table diagnosis (
> pk serial primary key,
> fk_patient integer
> not null
> references patient(pk)
> on update cascade
> on delete cascade,
> narrative text
> not null,
> unique(fk_patient, narrative)
> );
>
> This was obviously created so that a patient could have multiple diagnoses.
> However, there is no information in the table to indicate *why* there are
> multiple diagnoses.
Because there is no information to be had on this fact. The
patient IS afflicted by such or she is not. There is no why.
> And you are using a real key based on a long text
> field;
Yes, but for various concerns I am not using it as the primary
key, just making sure it is unique. I was just trying to
ascertain myself that this is OK to do from a database insider
point of view.
> always hazardous, as there are many ways to phrase the same
> information and duplication is likely.
But that is at the discreetion of the user/doctor and nothing
that can be enforced at the DB level (no, don't start thinking
about coding systems/classifications).
> To do it in english, your postulates look like:
>
> PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
> PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.
Hm, I don't see anything wrong with that (I'm a doctor). The
plain information that Patient 456 is known to have suffered
bouts of ACUTE HYPOCHONDRIA is invaluable when dealing with an
agitated, psychically decompensated, hyperventilating patient
456.
> But this is a bit sketchy. Who made these diagnoses?
I may or may not care. Our actual schema does, of course,
carry that information.
> When did they make them?
We'd be happy if we always knew.
>Why?
That's of marginal concern, actually, and the answer just
flows from the narrative of the medical record. But even if
there's no narrative there the "fact" alone helps.
> create table diagnosis (
> pk serial primary key,
> fk_patient integer references patient(pk),
> fk_visit integer references visits(pk),
> fk_complaint integer references complaints(pk)
Nope, this doesn't belong here AT ALL from a medical point of
view. Diagnoses and complaints don't have any rational
relationship. This is life.
> fk_staff integer references medical_staff(pk)
> narrative text,
> unique(fk_patient, fk_visit, fk_complaint, fk_staff)
> );
And in fact our real tables ARE pretty much like that :-)
> PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3
> in response to NOT BEING ABLE TO WALK of WATER ON THE KNEE
> PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192
> in response to THE CREEPY-CRAWLIES of ACUTE HYPOCHONDRIA
That'd by a psychosis ;-)
> It also allows you to establish a much more useful key; it's reasonable to
> expect that a single staff member on one visit in response to one complaint
> would only give one diagnosis.
Entirely false and a possible sign of inappropriate care.
> Otherwise, you have more than database
> problems. And it prevents you from having to rely on a flaky long text key.
Flaky long text is what kept people reasonably well in health
for the last, what, five thousand years ? I rely on it
countless times every single day.
BTW, our full schema is here:
http://www.hherb.com/gnumed/schema/
Lot's of it isn't in the state yet where we want it but we are
getting there - or so I think.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Re: surrogate key or not?
am 23.07.2004 09:57:46 von lawgon
On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:
> BTW, our full schema is here:
>
> http://www.hherb.com/gnumed/schema/
i was looking at your schema. i'm not a database pro, but in master tables i
see you have made the serial id as the primary key. i do it this way:
id serial unique
name varchar(25) not null
primary key is name - after all, you are going to search this on name arent
you? or is there some advantage in doing it your way?
also, how did you get that neatly formatted output of the schema?
--
regards
kg
http://www.onlineindianhotels.net - hotel bookings reservations in over 4600
hotels in India
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 23.07.2004 11:00:10 von lawgon
On Friday 23 July 2004 03:29 pm, Michael Glaesemann wrote:
>
> You appear to be misunderstanding the purpose of a primary key. A
> primary key is used to ensure there is a way to identify each row
> uniquely. It is quite independent of which columns you may or may not
> want to search on. If name is not going to be necessarily unique in the
> table, it isn't a primary key.
ive not misunderstood anything. this is one of the tables in question:
address_type
id serial PRIMARY KEY
name text UNIQUE NOT NULL
i think it is self explanatory
--
regards
kg
http://www.onlineindianhotels.net - hotel bookings reservations in over 4600
hotels in India
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 23.07.2004 11:59:33 von grzm
On Jul 23, 2004, at 4:57 PM, Kenneth Gonsalves wrote:
> On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:
>
>> BTW, our full schema is here:
>>
>> http://www.hherb.com/gnumed/schema/
>
> i was looking at your schema. i'm not a database pro, but in master
> tables i
> see you have made the serial id as the primary key. i do it this way:
>
> id serial unique
> name varchar(25) not null
> primary key is name - after all, you are going to search this on name
> arent
> you? or is there some advantage in doing it your way?
Kenneth,
You appear to be misunderstanding the purpose of a primary key. A
primary key is used to ensure there is a way to identify each row
uniquely. It is quite independent of which columns you may or may not
want to search on. If name is not going to be necessarily unique in the
table, it isn't a primary key.
Michael Glaesemann
grzm myrealbox com
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Re: surrogate key or not?
am 23.07.2004 12:02:53 von lawgon
On Friday 23 July 2004 04:47 pm, Michael Glaesemann wrote:
> > id serial unique
> > name varchar(25) not null
> > primary key is name - after all, you are going to search this on name
> > arent
> > you? or is there some advantage in doing it your way?
>
> Also, your explanation "after all, you are going to search..." did not
> mention row uniqueness at all. Sorry if this is not what you meant, but
> I can only go by what you've written.
ok, i'll rephrase the whole thing:
i have a master table with two fields:
id serial unique
name varchar not null (and will be unique)
i always make 'name' the primary key, and since it is the primary key, i dont
explicitly specify it as unique, and after postgres 7.3 came out, i have
added the 'unique' constraint to the 'id'
on looking at the gnumed schema, i saw that although 'name' was unique, the
serial key, 'id' was made the primary key. So i wondered why and whether
there were advantages in doing it this way.
--
regards
kg
http://www.onlineindianhotels.net - hotel bookings reservations in over 4600
hotels in India
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: surrogate key or not?
am 23.07.2004 13:17:37 von grzm
On Jul 23, 2004, at 6:00 PM, Kenneth Gonsalves wrote:
> On Friday 23 July 2004 03:29 pm, Michael Glaesemann wrote:
>
>>
>> You appear to be misunderstanding the purpose of a primary key. A
>> primary key is used to ensure there is a way to identify each row
>> uniquely. It is quite independent of which columns you may or may not
>> want to search on. If name is not going to be necessarily unique in
>> the
>> table, it isn't a primary key.
>
> ive not misunderstood anything. this is one of the tables in question:
>
> address_type
> id serial PRIMARY KEY
> name text UNIQUE NOT NULL
>
> i think it is self explanatory
In the example you originally gave, there is no indication of name
being a primary key:
On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:
> id serial unique
> name varchar(25) not null
> primary key is name - after all, you are going to search this on name
> arent
> you? or is there some advantage in doing it your way?
Also, your explanation "after all, you are going to search..." did not
mention row uniqueness at all. Sorry if this is not what you meant, but
I can only go by what you've written.
Michael Glaesemann
grzm myrealbox com
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Re: surrogate key or not?
am 23.07.2004 14:25:05 von achill
Regarding natural or surrogate keys...
It is often possible that a table definition does not depict
reality, meaning that the specification given at table design
phase was wrong, (or was later proved wrong).
I had a table "parts" like
\d parts
Table "public.parts"
Column | Type | Modifiers
-----------------+-----------------------+------------------ -------------------------------------
partno | character varying(20) | not null
machtypeclsid | integer | not null
partclsid | integer | not null
picture | bytea |
instructions | bytea |
..............
Indexes:
"parts_ukey" primary key, btree (partno, machtypeclsid)
Foreign-key constraints:
"$2" FOREIGN KEY (partclsid) REFERENCES partclasses(id)
"$1" FOREIGN KEY (machtypeclsid) REFERENCES machtypesclasses(id)
Initially i was told, and found pretty natural, that a machine type
and a part number would fully identify a part.
Only to find out later, after a bunch of apps was written, that some
makers, described, in the same machine type,
different parts (on differnt drawing numbers i.e. pages in maker's manual)
with the same part numbers.
The irony here, is that this was the only instance of natural primary keys
in my whole schema.
I had then to write a script to convert the table itself, as well as
children tables to it, to the new schema.
Now the table looks like:
\d parts
Table "public.parts"
Column | Type | Modifiers
-----------------+-----------------------+------------------ -------------------------------------
partno | character varying(20) | not null
machtypeclsid | integer | not null
partclsid | integer | not null
picture | bytea |
instructions | bytea |
...........
id | integer | not null default
nextval('public.parts_id_seq'::text)
drgno | character varying(20) |
Indexes:
"parts_pkey" primary key, btree (id)
"parts_ukey" unique, btree (drgno, partno, machtypeclsid)
Foreign-key constraints:
"$2" FOREIGN KEY (partclsid) REFERENCES partclasses(id)
"$1" FOREIGN KEY (machtypeclsid) REFERENCES machtypesclasses(id)
So, now if they decide, that apart from drgno (drawing number),
a new field "revision" must be added, that will be needed to identify
"uniquely" the part, along with drgno,partno,machtypeclsid,
it wouldn't mean a lot of changes.
In other words, with surrogate keys, you eliminate the chance
that your original design was flawed due to lack of important
initial knowledge.
A designer in the majority of cases, gets his input from people
of the subject being modeled.
Often these people fail to have the whole image described
when giving specs, hence all the trouble.
--
-Achilleus
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 23.07.2004 15:59:38 von vygen
Am Freitag, 23. Juli 2004 12:02 schrieb Kenneth Gonsalves:
> ok, i'll rephrase the whole thing:
>
> i have a master table with two fields:
> id serial unique
> name varchar not null (and will be unique)
>
> i always make 'name' the primary key, and since it is the primary key, i
> dont explicitly specify it as unique, and after postgres 7.3 came out, i
> have added the 'unique' constraint to the 'id'
>
> on looking at the gnumed schema, i saw that although 'name' was unique, the
> serial key, 'id' was made the primary key. So i wondered why and whether
> there were advantages in doing it this way.
Does your question relates to surrogate vs natural keys discussion?
I made some researches a few months ago and read a lot including:
http://www.intelligententerprise.com/print_article_flat.jhtm l?article=/030320/605celko1_1.jhtml
http://www.winnetmag.com/Articles/Print.cfm?ArticleID=5113
http://www.dbpd.com/vault/9805xtra.htm
http://www.sapior.com/Resources/Surrogate_Keys/surrogate_key s.html
http://www.bits-pilani.ac.in/dlp-home/online/studymat/sszg51 5/lecture3.htm
http://www.bcarter.com/intsurr1.htm
i decided to use natural keys wherever possible and i have many primary keys
spanning up to 4 attributes. And it works really fine. Performance is great,
the schema is easy to use and i am so glad to use the natural key approach.
Writing SQL queries and php code is much easier!
By now i try to avoid surrogate keys (like with SERIALs datatype) wherever
possible. Most articles advocate surrogate keys and at first it looks like an
advantage in a web environment because selecting and transmitting a
multi-column primary key in a form field ist very difficult.
Imagine a
Re: surrogate key or not?
am 23.07.2004 16:07:48 von tgl
Michael Glaesemann writes:
> You appear to be misunderstanding the purpose of a primary key. A
> primary key is used to ensure there is a way to identify each row
> uniquely. It is quite independent of which columns you may or may not
> want to search on. If name is not going to be necessarily unique in the
> table, it isn't a primary key.
The other standard reason for using a made-up value as primary key is
that it's under your control and you can guarantee it isn't going to
change: one record will have the same primary key for its entire life,
which vastly simplifies handling of foreign keys and so forth.
If you use any real-world datum as primary key you necessarily buy into
tracking changes in that real-world value. And handling duplicates.
Names are certainly not unique, and they do change (marriage, etc) and
people do miskey them and then expect to be able to fix the error later.
As Achilleus' nearby story shows, you can have these problems (certainly
the misentry part) even with imported data that is allegedly someone
else's primary key; part numbers, USA social-security numbers, etc.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 23.07.2004 19:17:11 von Josh
Achilleus,
> In other words, with surrogate keys, you eliminate the chance
> that your original design was flawed due to lack of important
> initial knowledge.
Well, you don't *eliminate* it, but you do decrease it.
I'd say, yes, this is an important 4th reason:
4) Your spec may be incorrect and surrogate keys make it easier to make design
changes in production.
Once again, though, this is an *implementation* issue and not a *logic* issue,
as I asserted ...
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 23.07.2004 19:20:59 von Josh
Karsten,
> Because there is no information to be had on this fact. The
> patient IS afflicted by such or she is not. There is no why.
I begin to see why I spent $2000 this spring to have a doctor tell me what I
*didn't* have ...
> http://www.hherb.com/gnumed/schema/
>
> Lot's of it isn't in the state yet where we want it but we are
> getting there - or so I think.
When I have time, sure! But, this afternoon I am off to OSCON so I won't
have a chance for 2 weeks at least. Drop me a personal e-mail in August so I
don't forget.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Re: surrogate key or not?
am 23.07.2004 22:36:23 von Karsten.Hilbert
Josh,
> > In other words, with surrogate keys, you eliminate the chance
> > that your original design was flawed due to lack of important
> > initial knowledge.
>
> Well, you don't *eliminate* it, but you do decrease it.
>
> I'd say, yes, this is an important 4th reason:
>
> 4) Your spec may be incorrect and surrogate keys make it easier to make design
> changes in production.
thanks so much - this is exactly the kind of assurance I was
looking for.
BTW, I will come back to your generous offer in August.
Thanks again.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Re: surrogate key or not?
am 24.07.2004 11:06:35 von Karsten.Hilbert
On Fri, Jul 23, 2004 at 10:07:48AM -0400, Tom Lane wrote:
> The other standard reason for using a made-up value as primary key is
> that it's under your control and you can guarantee it isn't going to
> change: one record will have the same primary key for its entire life,
> which vastly simplifies handling of foreign keys and so forth.
That is pretty much the main reason why our schema does so.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 24.07.2004 16:08:21 von lists
Josh Berkus wrote:
> Given: Surrogate keys, by definition, represent no real data;
> Given: Only items which represent real data have any place in
> a data model
> Conclusion: Surrogate keys have no place in the data model
But, once a surrogate key is assigned to a row, doesn't it become a
"real" data? For example, I have a bunch of invoices/receipts and I
write down a unique number on each of them. Doesn't the unique number
become part of the information contained by the invoice/receipt itself
(at least as long as I'm concerned)? Another example is when Adam
(+Eve?) named each animal in the Genesis. At that time the name he gave
for each animal was arbitrary ("surrogate"), but once assigned to each
species, it becomes part of the real data.
> 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
Change management IMO is perhaps the main reason of surrogate/artificial
key. We often need a PK that _never_ needs to change (because it can be
a royal PITA or downright impossibility to make this change; the PK
might already be printed on a form/card/document, recorded on some
permanent database, tattoed/embedded in someone's forehead, etc).
Meanwhile, every other aspect of the data can change (e.g. a person can
change his name, sex, age, email, address, even date & place of birth).
Not to mention data entry mistakes. So it's impossible to use any
"real"/natural key in this case.
> Now, you're probably wondering "why does this guy regard surrogate keys as a
> problem?" I'll tell you: I absolutely cannot count the number of "bad
> databases" I've encountered which contained tables with a surrogate key, and
> NO REAL KEY of any kind. This makes data normalization impossible, and
> cleanup of the database becomes a labor-intensive process requiring
> hand-examination of each row.
Okay, so surrogate key makes it easy for stupid people to design a
database that is prone to data duplication (because he doesn't install
enough unique constraints to prevent this). But I don't see how a
relation with a surrogate key is harder to "normalize" (that is, for the
duplicates to be removed) than a relation with no key at all. Compare:
street
------
Green Street
Green Street
Green Street
versus:
id street
-- ------
2934 Green Street
4555 Green Street
5708 Green Street
They both contain duplicates and/or ambiguous data. They're both equally
hard to normalize/clean.
--
dave
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Re: surrogate key or not?
am 26.07.2004 12:22:40 von Iain
Hi,
> But, once a surrogate key is assigned to a row, doesn't it become a
> "real" data? For example, I have a bunch of invoices/receipts and I
> write down a unique number on each of them. Doesn't the unique number
> become part of the information contained by the invoice/receipt itself
> (at least as long as I'm concerned)?
I don't think that I'd call an invoice number a surrogate key in the first
place. Invoice numbers and their like come from business requirements, they
just happen to be highly suitable as PKs so they could be considered a
natural key.
> > 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes
and
>
> Change management IMO is perhaps the main reason of surrogate/artificial
> key. We often need a PK that _never_ needs to change (because it can be
> a royal PITA or downright impossibility to make this change; the PK
> might already be printed on a form/card/document, recorded on some
> permanent database, tattoed/embedded in someone's forehead, etc).
> Meanwhile, every other aspect of the data can change (e.g. a person can
> change his name, sex, age, email, address, even date & place of birth).
> Not to mention data entry mistakes. So it's impossible to use any
> "real"/natural key in this case.
I've never felt that it's a good idea to be dependent on the DBMS providing
cascade functionality - particularly cascading updates to PKs. I don't think
I've ever worked on a DB that used such constraints.
> Okay, so surrogate key makes it easy for stupid people to design a
> database that is prone to data duplication (because he doesn't install
> enough unique constraints to prevent this). But I don't see how a
> relation with a surrogate key is harder to "normalize" (that is, for the
> duplicates to be removed) than a relation with no key at all. Compare:
I think that most of us would agree that whatever they are, surrogate keys
aren't a substitute for clean data (or quality data).
On the whole, I think that there are more important indicators of quality
(or lack of it) in your database design than the prevalence (or lack) of
numeric ID style keys. Personally, I've grown to appreciate the id approach
over the years, but my mind is always open to other ideas.
regards
iain
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 26.07.2004 12:38:51 von Iain
> > Because there is no information to be had on this fact. The
> > patient IS afflicted by such or she is not. There is no why.
>
> I begin to see why I spent $2000 this spring to have a doctor tell me what
I
> *didn't* have ...
This reminds me of a project I worked on many years ago, I was pretty much
fresh out of university writing a system for a large medical practice -
itwas principally for accounting purposes. I made lots of suggestions like
Josh's, only to get replies like Karsten's. I the progammer wanted to codify
everything so as to enable data analysis (ie linking complaints and
diagnosis, etc) but the doctors wern't interested. They just wanted to write
free text comments. And the reason for it (as far as I can tell) is the
distinction between general practice and reseach (such as epidemiology). So
(GPs) are not so much searching for new knowlege in their patients records,
as applying the knowlege gained from research (done by researchers) to treat
individual patients.
Karsten might beg to differ, I don't know, but the "long flaky text" comment
triggered some old (and fond) memories ;-)
Cheers
iain
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 26.07.2004 13:13:37 von Sad
hello
> This reminds me of a project I worked on many years ago, I was pretty much
> fresh out of university writing a system for a large medical practice -
> itwas principally for accounting purposes. I made lots of suggestions like
> Josh's, only to get replies like Karsten's. I the progammer wanted to
> codify everything so as to enable data analysis (ie linking complaints and
> diagnosis, etc) but the doctors wern't interested. They just wanted to
> write free text comments. And the reason for it (as far as I can tell) is
> the distinction between general practice and reseach (such as
> epidemiology). So (GPs) are not so much searching for new knowlege in the=
ir
> patients records, as applying the knowlege gained from research (done by
> researchers) to treat individual patients.
Here the situation quite similar, a customer dictate drives the practice fa=
r=20
far from logic.
EVERY database i had desined in few years have been turned into a container=
of=20
textual MEMOs completely unstructured. (because of patches and makeups)
USER DO NOT WANT TO TAKE CARE OF THEIR INPUT.
i do not know how are you all programming, really hope you are in defferent=
=20
circumstances. Here a customer itself is a user and itself is a [man who=20
formulate a problem] then every stupid idea to simplify input immediately=
=20
turns to an urgent official order to do.
Sorry. That is my work.
Thank you all again for very usefull discussion on Surrogate Keys.
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: surrogate key or not?
am 26.07.2004 22:34:41 von Karsten.Hilbert
> This reminds me of a project I worked on many years ago, I was pretty much
> fresh out of university writing a system for a large medical practice -
> itwas principally for accounting purposes. I made lots of suggestions like
> Josh's, only to get replies like Karsten's. I the progammer wanted to codify
> everything so as to enable data analysis (ie linking complaints and
> diagnosis, etc) but the doctors wern't interested.
Likely it wasn't their intent with the system. That doesn't
mean it can't be done ... (not that I think that with todays
tools it sufficiently can but ... :-> ICPC probably comes closest
to that). Anyway, I am highly interested in increasing the data
quality in my records. However, to be able to care for a
patient *when needed* I won't allow software I write to
*force* coding upon me. Trust me I have and am still
researching coding, classifying, structuring of medical data
at an ongoing basis which funnily constantly improves my daily
abilities as a doctor. Now, back to GnuMed, we *do* allow to
code arbitrary pieces of narrative with arbitrarily many codes
from arbitrarily many coding systems. Same for classifying
(rather typing) data.
> They just wanted to write free text comments. And the reason for
> it (as far as I can tell) is the distinction between
> general practice and reseach (such as epidemiology).
Rather it is the difference between reality and theory. In
reality you are dealing with tens of patients with 1-5
problems almost neither of which are "quite right" if you go
by the textbooks. Nevertheless everyone expects you'll never
forget/do wrong a thing.
> So (GPs) are not so much searching for new knowlege in their patients records,
> as applying the knowlege gained from research (done by researchers) to treat
> individual patients.
One thing we ARE looking for in our records is the ability to
find groups of patients by arbitrary criteria since one day
I'll have to find all my patients whose father took a statine,
whose second-born child suffered a bout of neutropenia 2 weeks
after birth and who started being on the pill at age 14.
Because they'll have a 3fold increased risk of lung embolus.
Unless monitored for clotting factors every 6 months. Which I
will have to do from now on. Get my point ? :-)
> Karsten might beg to differ, I don't know, but the "long flaky text" comment
> triggered some old (and fond) memories ;-)
I totally understand what you are saying... We do cling to old
trusted things. Also, I used a bit of hyperbole to get my
point across.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Re: surrogate key or not?
am 27.07.2004 08:03:50 von lawgon
On Tuesday 27 July 2004 02:04 am, Karsten Hilbert wrote:
> One thing we ARE looking for in our records is the ability to
> find groups of patients by arbitrary criteria since one day
> I'll have to find all my patients whose father took a statine,
> whose second-born child suffered a bout of neutropenia 2 weeks
> after birth and who started being on the pill at age 14.
> Because they'll have a 3fold increased risk of lung embolus.
> Unless monitored for clotting factors every 6 months. Which I
> will have to do from now on. Get my point ? :-)
couldnt you do something like, let them write the 'long flaky text', and at
the same time mark a certain number of key words or key phrases which could
be stored and retrieved?
--
regards
kg
http://www.onlineindianhotels.net - hotel bookings reservations in over 4600
hotels in India
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
> also, how did you get that neatly formatted output of the schema?
This is postgresql_autodoc: http://www.rbt.ca/autodoc/
--
Markus Bertheau
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 28.07.2004 03:29:19 von Iain
> > One thing we ARE looking for in our records is the ability to
> > find groups of patients by arbitrary criteria since one day
> > I'll have to find all my patients whose father took a statine,
> > whose second-born child suffered a bout of neutropenia 2 weeks
> > after birth and who started being on the pill at age 14.
> > Because they'll have a 3fold increased risk of lung embolus.
> > Unless monitored for clotting factors every 6 months. Which I
> > will have to do from now on. Get my point ? :-)
>
> couldnt you do something like, let them write the 'long flaky text', and
at
> the same time mark a certain number of key words or key phrases which
could
> be stored and retrieved?
I was thinking along similar lines. On one hand, you need the "long flaky
text" (love that expression), on the other, you want to ensure that you can
locate appropriate data, and that the required details are available. By
available, I mean that it was entered in the first place, and that it is
retrievable. I imagine a system whereby you define keywords and attributes
for them (attributes would be an episode date, or dosage, etc). The memo, is
checked for keywords and the doctor prompted to supply the attributes for
them. If your parsing was smart, and the memo formated a little, you could
conceivably pull a lot of this out of the memo as defaults. The processing
could also be done retrospectively by an intern or researcher, but I imagine
it would be best to have the doctor do it at the time.
Just some vague ideas anyway. This may of course be much more work than
anyone wants to get into... I don't have much experience with text searching
systems, but something reasonably sophisticated would probably get you
there.
Regards
Iain
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: surrogate key or not?
am 28.07.2004 04:14:50 von lawgon
On Wednesday 28 July 2004 06:59 am, Iain wrote:
> > couldnt you do something like, let them write the 'long flaky text', and
>
> at
>
> > the same time mark a certain number of key words or key phrases which
>
> could
>
> > be stored and retrieved?
>
> I was thinking along similar lines. On one hand, you need the "long flaky
> text" (love that expression), on the other, you want to ensure that you can
> locate appropriate data, and that the required details are available. By
> available, I mean that it was entered in the first place, and that it is
> retrievable. I imagine a system whereby you define keywords and attributes
> for them (attributes would be an episode date, or dosage, etc). The memo,
> is checked for keywords and the doctor prompted to supply the attributes
> for them. If your parsing was smart, and the memo formated a little, you
> could conceivably pull a lot of this out of the memo as defaults. The
> processing could also be done retrospectively by an intern or researcher,
> but I imagine it would be best to have the doctor do it at the time.
simpler, as a first stage and easily implemented, give him some way he can
tag words and phrases he feels important. save these in a table along with a
foreign key identifying the source. as a second stage keep analysing the
words and phrases chosen and empirically build up a database of significant
words and phrases relevant to that specific installation (or doctor), and as
a third stage, highlight these as he types in the data
--
regards
kg
http://www.onlineindianhotels.net - hotel bookings reservations in over 4600
hotels in India
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: surrogate key or not?
am 28.07.2004 14:04:02 von Karsten.Hilbert
> retrievable. I imagine a system whereby you define keywords and attributes
> for them (attributes would be an episode date, or dosage, etc).
This is pretty much exactly what we are working on. We are
factoring out data into dedicated tables where that is
possible due to the structured nature of the data and store
the remaining narrative as TEXT with associated attributes of
all kinds.
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Re: surrogate key or not?
am 28.07.2004 14:08:11 von Karsten.Hilbert
> simpler, as a first stage and easily implemented, give him some way he can
> tag words and phrases he feels important. save these in a table along with a
> foreign key identifying the source. as a second stage keep analysing the
> words and phrases chosen and empirically build up a database of significant
> words and phrases relevant to that specific installation (or doctor), and as
> a third stage, highlight these as he types in the data
Even simpler. Most notes only ever contain those words that are
relevant :-) So what we are doing is matching different parts
of the progress note, eg. when the doctor types "cough,
congested nose, sore throat" as a complaint we pull in the
drugs he ususally prescribes for that condition and the dosages
she ususally uses ...
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Re: surrogate key or not?
am 07.08.2004 00:42:04 von Josh
David,
> But, once a surrogate key is assigned to a row, doesn't it become a
> "real" data? For example, I have a bunch of invoices/receipts and I
> write down a unique number on each of them. Doesn't the unique number
> become part of the information contained by the invoice/receipt itself
> (at least as long as I'm concerned)?
Sure ... *if* it's being used that way. If, however, your table has that
Invoice # *and* a seperate surrogate key that's redundant and can cause
problems.
> Change management IMO is perhaps the main reason of surrogate/artificial
> key. We often need a PK that _never_ needs to change (because it can be
> a royal PITA or downright impossibility to make this change; the PK
> might already be printed on a form/card/document, recorded on some
> permanent database, tattoed/embedded in someone's forehead, etc).
Sure. But surrogate keys don't fix this problem; only good change management
does. This is precisely why I say "use with caution"; all too often project
leaders regard surrogate keys as a substitute for good change management and
don't do any further work.
> Meanwhile, every other aspect of the data can change (e.g. a person can
> change his name, sex, age, email, address, even date & place of birth).
> Not to mention data entry mistakes. So it's impossible to use any
> "real"/natural key in this case.
Absolutely false. It's quite possible, it's just a performance/schema/data
management issue. This also applies to my comment above.
> Okay, so surrogate key makes it easy for stupid people to design a
> database that is prone to data duplication (because he doesn't install
> enough unique constraints to prevent this). But I don't see how a
> relation with a surrogate key is harder to "normalize" (that is, for the
> duplicates to be removed) than a relation with no key at all. Compare:
You're right here, both are equally hard to normalize. What I'm criticizing
is the tendency of a lot of beginning DBAs -- and even some books on database
design -- to say: "If you've created an integer key, you're done."
Had I my way, I would automatically issue a WARNING on any time you create a
table in PG without a key.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 07.08.2004 06:00:30 von lawgon
On Saturday 07 August 2004 04:12 am, Josh Berkus wrote:
>
> > Meanwhile, every other aspect of the data can change (e.g. a person can
> > change his name, sex, age, email, address, even date & place of birth).
> > Not to mention data entry mistakes. So it's impossible to use any
> > "real"/natural key in this case.
>
> Absolutely false. It's quite possible, it's just a
> performance/schema/data management issue. This also applies to my comment
> above.
why shouldnt the primary key change? the only key that should never change is
a key that is used as a foreign key in another table. In a table like this:
id serial unique
name varchar primary key
name may change - id will never change. id is used as the foreign key
--
regards
kg
http://www.onlineindianhotels.net - fastest hotel search website in the world
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Re: surrogate key or not?
am 08.08.2004 02:29:37 von lawgon
On Sunday 08 August 2004 06:36 am, Oliver Elphick wrote:
> On Sat, 2004-08-07 at 05:00, Kenneth Gonsalves wrote:
> > why shouldnt the primary key change? the only key that should never
> > change is a key that is used as a foreign key in another table. In a
> > table like this:
> >
> > id serial unique
> > name varchar primary key
> >
> > name may change - id will never change. id is used as the foreign key
>
> It can change if you use ON UPDATE CASCADE in the foreign key
> definitions.
but why would anyone want to change the value of an autogenerated serial row?
--
regards
kg
http://www.onlineindianhotels.net - fastest hotel search website in the world
http://www.ootygolfclub.org
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: surrogate key or not?
am 08.08.2004 03:06:36 von Olly
On Sat, 2004-08-07 at 05:00, Kenneth Gonsalves wrote:
> why shouldnt the primary key change? the only key that should never change is
> a key that is used as a foreign key in another table. In a table like this:
>
> id serial unique
> name varchar primary key
>
> name may change - id will never change. id is used as the foreign key
It can change if you use ON UPDATE CASCADE in the foreign key
definitions.
--
Oliver Elphick olly@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"Be still before the LORD and wait patiently for him;
do not fret when men succeed in their ways, when they
carry out their wicked schemes."
Psalms 37:7
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Re: surrogate key or not?
am 09.08.2004 00:19:16 von Josh
Kenneth,
> but why would anyone want to change the value of an autogenerated serial
row?
But if you're using a real key, it may need to change. The only reason *not*
do do it that way is performance issues with CASCADE.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: surrogate key or not?
am 24.08.2004 06:31:57 von Sad
On Sunday 08 August 2004 04:29, Kenneth Gonsalves wrote:
> but why would anyone want to change the value of an autogenerated serial
> row?
because of catenating data from two or more databases.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend