CHECKSUM to determine record changes

CHECKSUM to determine record changes

am 22.04.2008 00:22:36 von rcamarda

I've searched the forum for uses of CHECKSUM and havent found a
satisfactory answer.
I need to know when a row changes and I dont care what it was or is, I
just need to know it changed.
Detail:
Hourly I select rows where the record as a create date or change date
of the last 3 days. I am only interested when the address or name
changes, not other columns.
Currently, I might select 30,000 rows that are new or changed, but
turns out that only 100 have address changes and 500 are new. I want
to process the 600, not the 30,000.
My main concern is if some columns change, but the resulting checksum
doesnt. Then I would have missed processing that record.
The column types that I will be tracking are:
"forenames" VARCHAR(50) NULL,
"surname" VARCHAR(51) NULL,
"ADDRESS1" VARCHAR(60) NULL,
"ADDRESS2" VARCHAR(60) NULL,
"ADDRESS4" VARCHAR(50) NULL,
"STATE" VARCHAR(10) NULL,
"ZIP" VARCHAR(20) NULL,
"email" VARCHAR(50) NULL,
"telephone" VARCHAR(10) NULL
(I left off the keys, these are the only fields that I will be using
for the checksum()).
Am I safe? I read about 32 bit CRC and some data changes would go
unnoticed, but not sure if this layout would qualifiy.

TIA
Rob

Re: CHECKSUM to determine record changes

am 22.04.2008 00:33:34 von rcamarda

Im using SQL Server 2005 currently patched.

Re: CHECKSUM to determine record changes

am 22.04.2008 08:25:56 von David Portas

"rcamarda" wrote in message
news:2d88cd48-2853-4f01-b797-c5c166bd167d@k13g2000hse.google groups.com...
> Im using SQL Server 2005 currently patched.

CHECKSUM isn't a reliable way to detect change because it's quite common to
find different rows with the same CHECKSUM value. You could use a ROWVERSION
column instead. ROWVERSION is guaranteed to increment when the row data is
updated.

Another alternative is to use a hash. The HashBytes function will return a
secure hash of a binary value with a very high probability of uniqueness.
Duplicate hashes are theoretically possible but are incredibly unlikely to
occur unintentionally. If you are extremely paranoid then you can use two
different hashes.

--
David Portas

Re: CHECKSUM to determine record changes

am 23.04.2008 18:42:38 von Gert-Jan Strik

David Portas wrote:
>
> "rcamarda" wrote in message
> news:2d88cd48-2853-4f01-b797-c5c166bd167d@k13g2000hse.google groups.com...
> > Im using SQL Server 2005 currently patched.
>
> CHECKSUM isn't a reliable way to detect change because it's quite common to
> find different rows with the same CHECKSUM value. You could use a ROWVERSION
> column instead. ROWVERSION is guaranteed to increment when the row data is
> updated.
>
> Another alternative is to use a hash. The HashBytes function will return a
> secure hash of a binary value with a very high probability of uniqueness.
> Duplicate hashes are theoretically possible but are incredibly unlikely to
> occur unintentionally. If you are extremely paranoid then you can use two
> different hashes.
>
> --
> David Portas

I agree about the advice for use ROWVERSION.

However, CHECKSUM is also just a hash. Whether the chance of an
unnotices change is lower if you use HashBytes function instead of
CHECKSUM depends on your data.

Although adding a second hash value will lower the chance of a missed
change, it would be silly to do so. In the end you would need as many
hash bytes as there are bytes in the data.

--
Gert-Jan