What do you think about the checksum function ?

What do you think about the checksum function ?

am 29.05.2007 14:39:17 von Kurt

Hi,

I'd like advices about an idea I add to resolve a problem. thanks to
you in advance for yours answers.
I have a database with tables that I load with flat file. The size of
each table is 600 Mb. The flat file are the image of an application
and there is no updated date or created date on any table. So my
tables are just a copy of the data from the flat file.

Now I'd like to create an History Table. So I have to determine which
lines changed and which one did'nt.
As I don't have any date on my row the only answer I had unil know was
to check each column on each row to see if any data changed. If the
data changed I add a new line in my history date.

My idea is to add a checksum column in both table on all columns. To
know if any data change I just have to check my PK + my checksum
column.
Do you think that is a good idea ? Is checksum a quick function or
not ?.

Thanks.

--
K

Re: What do you think about the checksum function ?

am 29.05.2007 23:31:54 von Erland Sommarskog

Kurt (nicolas.agrapart@gmail.com) writes:
> I'd like advices about an idea I add to resolve a problem. thanks to
> you in advance for yours answers.
> I have a database with tables that I load with flat file. The size of
> each table is 600 Mb. The flat file are the image of an application
> and there is no updated date or created date on any table. So my
> tables are just a copy of the data from the flat file.
>
> Now I'd like to create an History Table. So I have to determine which
> lines changed and which one did'nt.
> As I don't have any date on my row the only answer I had unil know was
> to check each column on each row to see if any data changed. If the
> data changed I add a new line in my history date.
>
> My idea is to add a checksum column in both table on all columns. To
> know if any data change I just have to check my PK + my checksum
> column.
> Do you think that is a good idea ? Is checksum a quick function or
> not ?.

Neither checksum() nor binary_checksum() are very useful. I think they
based on XOR, and they would too often say a row is unchanged when it
has not. It would be a lot safer to compare all columns?

Exactly how do update your tables? To blow all existing data away and
reload, or do you INSERT new, update existing ones etc? In such case a
timestamp column could work for you. (Timestamp here has nothing to
do with date and time.)


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downlo ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books .mspx

Re: What do you think about the checksum function ?

am 30.05.2007 08:45:46 von Kurt

On 29 mai, 23:31, Erland Sommarskog wrote:
> Kurt (nicolas.agrap...@gmail.com) writes:
> > I'd like advices about an idea I add to resolve a problem. thanks to
> > you in advance for yours answers.
> > I have a database with tables that I load with flat file. The size of
> > each table is 600 Mb. The flat file are the image of an application
> > and there is no updated date or created date on any table. So my
> > tables are just a copy of the data from the flat file.
>
> > Now I'd like to create an History Table. So I have to determine which
> > lines changed and which one did'nt.
> > As I don't have any date on my row the only answer I had unil know was
> > to check each column on each row to see if any data changed. If the
> > data changed I add a new line in my history date.
>
> > My idea is to add a checksum column in both table on all columns. To
> > know if any data change I just have to check my PK + my checksum
> > column.
> > Do you think that is a good idea ? Is checksum a quick function or
> > not ?.
>
> Neither checksum() nor binary_checksum() are very useful. I think they
> based on XOR, and they would too often say a row is unchanged when it
> has not. It would be a lot safer to compare all columns?


>
> Exactly how do update your tables? To blow all existing data away and
> reload, or do you INSERT new, update existing ones etc? In such case a
> timestamp column could work for you. (Timestamp here has nothing to
> do with date and time.)
>
> --
> Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
>
> Books Online for SQL Server 2005 athttp://www.microsoft.com/technet/prodt=
echnol/sql/2005/downloads/books...
> Books Online for SQL Server 2000 athttp://www.microsoft.com/sql/prodinfo/=
previousversions/books.mspx- Masquer le texte des messages pr=E9c=E9dents -
>
> - Afficher le texte des messages pr=E9c=E9dents -

Hello,

I read on the msdn website that BINARY_CHECKSUM can be used to detect
changes to a row of a table.
To update my current tables, I truncate and after I insert the new
datas.
What do you call a timestamp ? How timestamp could work for me ?

Thanks,

--
K

Re: What do you think about the checksum function ?

am 30.05.2007 23:57:45 von Erland Sommarskog

Kurt (nicolas.agrapart@gmail.com) writes:
> I read on the msdn website that BINARY_CHECKSUM can be used to detect
> changes to a row of a table.
> To update my current tables, I truncate and after I insert the new
> datas.
> What do you call a timestamp ? How timestamp could work for me ?

If you truncate and insert, timestamp is not going help you. A timestamp
column is automatically updated when a column is updated with a database-
unique monotonically increasing value. So if you had updated only rows that
had changed, and inserted new ones, you could have saved the current
timestamp value when you started, and then all rows with a higher timestamp
value would have been new.

But since you truncate and re-insert, all timestamp values will be higher
when you started.

As I recall binary_checksum() returns a 32-bit integer. If your tables
are wide, this means that collisions will not at all be unlikely. Note
also that binary_checksum ignores some data types entirely.

While more boring to code, I would recommend that you make a comparison
column by column.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downlo ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books .mspx