Speeding up multiple updates

on 10.10.2006 16:15:59 by brian.lukoff

I have a table with three columns: ID, Field, and Data. There are
about 1000 rows for a given ID (and for a given ID, each row has a
unique Field value). At some point in a script, I need to update the
Data column of a subset of these rows (somewhere between 1 and maybe
30 of them), and each row's Data column needs to be set to a
different value. Right now I'm using a sequence of UPDATE statements,
one for each row I need to update (e.g., UPDATE [...] SET [Data] =
... WHERE [Field] = ... AND [ID] = ...), but this is very slow and is
causing users to receive timeouts and SQL deadlock errors. Is there a
better way to do this?

Brian
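
For concreteness, the per-row approach described above amounts to
something like this (a sketch; the Responses table name and the
literal values are made up for illustration):

    UPDATE [Responses] SET [Data] = 'first value'  WHERE [Field] = 3 AND [ID] = 42
    UPDATE [Responses] SET [Data] = 'second value' WHERE [Field] = 7 AND [ID] = 42
    -- ... one statement, and one round-trip, per row, up to 30 times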

Re: Speeding up multiple updates

on 10.10.2006 16:39:46 by Anthony Jones

 wrote in message
news:1160489759.072453.227640@c28g2000cwb.googlegroups.com...
> I have a table with three columns: ID, Field, and Data. There are
> about 1000 rows for a given ID [snip]

You didn't specify which DB.
How are you executing these updates? Are they in a transaction?

I can't see updating 30 records in a 1000-record table taking any
measurable time at all (unless your Data field happens to be a large
binary type in which you are storing a lot of data).

Do you have triggers on the table that might be taking a long time?
Is the table actively being read as part of other transaction-protected
operations?
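
One thing worth trying on the transaction front: send all the UPDATEs
to the server as a single batch inside one explicit transaction,
rather than one round-trip per statement (a sketch, using the same
hypothetical table):

    BEGIN TRANSACTION

    UPDATE [Responses] SET [Data] = 'first value'  WHERE [ID] = 42 AND [Field] = 3
    UPDATE [Responses] SET [Data] = 'second value' WHERE [ID] = 42 AND [Field] = 7
    -- ... up to 30 of these ...

    -- One commit flushes the log once, instead of once per
    -- autocommitted statement.
    COMMIT TRANSACTION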

Re: Speeding up multiple updates

on 10.10.2006 19:04:56 by brian.lukoff

Hi Anthony,

The table has many records, but for any particular ID there are about
1000 records. There are probably about 300 IDs, so about 300,000
records in the table.

The UPDATEs are being sent individually through the Execute method of
a Connection object. There aren't any triggers. The Data field is a
large varchar column (size 5000, I think).

Brian

Anthony Jones wrote:
> You didn't specify which DB.
> How are you executing these updates? Are they in a transaction?
> [snip]

Re: Speeding up multiple updates

on 10.10.2006 19:45:08 by reb01501

brian.lukoff@gmail.com wrote:
> I have a table with three columns: ID, Field, and Data. There are
> about 1000 rows for a given ID [snip]
>
By "SQL", I'm assuming you mean SQL Server ...
Check out Erland's article about handling arrays in T-SQL*:
http://www.sommarskog.se/arrays-in-sql.html

You may be able to improve performance by passing all the updates as a
delimited string to a stored procedure, which parses the string into a
temp table and then uses a join to perform the actual updates.
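
A minimal sketch of that idea, assuming SQL Server (the procedure
name, the Responses table, the int Field column, and the 'Field=Data'
pairs delimited by '|' are all illustrative assumptions; also note
that with a varchar(5000) Data column, 30 values can overflow a
varchar(8000) parameter, in which case you'd need the text-parameter
techniques from Erland's article):

    CREATE PROCEDURE dbo.UpdateUserData
        @ID    int,
        @Pairs varchar(8000)   -- e.g. '3=first answer|7=second answer'
    AS
    BEGIN
        SET NOCOUNT ON

        CREATE TABLE #Updates (Field int, Data varchar(5000))

        DECLARE @pair varchar(8000), @pos int

        -- Peel off one 'Field=Data' pair per iteration.
        WHILE LEN(@Pairs) > 0
        BEGIN
            SET @pos = CHARINDEX('|', @Pairs)
            IF @pos = 0 SET @pos = LEN(@Pairs) + 1

            SET @pair  = LEFT(@Pairs, @pos - 1)
            SET @Pairs = SUBSTRING(@Pairs, @pos + 1, LEN(@Pairs))

            -- Assumes Data values never contain '|' or '='; escape
            -- them or choose other delimiters if they can.
            IF CHARINDEX('=', @pair) > 0
                INSERT INTO #Updates (Field, Data)
                SELECT CAST(LEFT(@pair, CHARINDEX('=', @pair) - 1) AS int),
                       SUBSTRING(@pair, CHARINDEX('=', @pair) + 1, LEN(@pair))
        END

        -- One set-based UPDATE instead of one statement per row.
        UPDATE r
        SET    r.[Data] = u.Data
        FROM   dbo.Responses AS r
        JOIN   #Updates AS u ON u.Field = r.[Field]
        WHERE  r.[ID] = @ID
    END

The ASP side then makes a single call, e.g. EXEC dbo.UpdateUserData
@ID = 42, @Pairs = '3=first answer|7=second answer', instead of 30
separate Execute calls.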

Bob Barrows
* make sure you check out the other articles on his site. I have learned
from all of them.
--
Microsoft MVP -- ASP/ASP.NET
Please reply to the newsgroup. The email account listed in my From
header is my spam trap, so I don't check it very often. You will get a
quicker response by posting to the newsgroup.

Re: Speeding up multiple updates

on 10.10.2006 19:56:11 by brian.lukoff

Would it be better to redesign the database table so that each user
has only one row, with the Data column holding a comma-delimited
version of *all* of their data, e.g.,

4,5,2,a text response,another text response,2,3

and then just parse into and out of this format in code? The Data
column would have to be a text column then, because it would end up
being very long. But it would mean that there would be only one row
per user (and one UPDATE statement) instead of many.

Brian

Bob Barrows [MVP] wrote:
> By "SQL", I'm assuming you mean SQL Server ...
> Check out Erland's article about handling arrays in T-SQL*:
> http://www.sommarskog.se/arrays-in-sql.html
> [snip]

Re: Speeding up multiple updates

on 10.10.2006 20:05:27 by reb01501

brian wrote:
> Would it be better to redesign the database table so that each user
> has only one row and the Data column is just a comma-delimited
> version of *all* of their data: e.g.,

Absolutely not. It's not the worst thing you could do, but it's
close - almost worse than creating multiple columns to hold the
individual data values.
>
> 4,5,2,a text response,another text response,2,3
>
> and then just parse into and out of this format in code? The Data
> column would have to be a text column then, because it would end up
> being very long. But it would mean that there would only be one row
> per user (and one UPDATE statement) instead of many.
>
My solution involves a single UPDATE statement as well. And data
integrity is not compromised.

--
Microsoft MVP -- ASP/ASP.NET
Please reply to the newsgroup. The email account listed in my From
header is my spam trap, so I don't check it very often. You will get a
quicker response by posting to the newsgroup.

Re: Speeding up multiple updates

on 10.10.2006 21:24:43 by brian.lukoff

Bob Barrows [MVP] wrote:
> Absolutely not. It's not the worst thing you could do, but it's close.
> Almost worse than creating multiple columns to hold the individual data
> values.

> My solution involves a single UPDATE statement as well. And data
> integrity is not compromised.

Why is this so bad? There is no validation done on the values (the
user enters the data), so I don't think data integrity would be a
problem (unless I'm not understanding what you mean). Also, having
one column with delimited values seems simpler and should take less
time on the database end, since the ASP code would put together (and
take apart) the delimited list of values. It's not an elegant
solution, but it seems like it should work and alleviate the problems
I have been experiencing with timeouts, locking, and cascading
blocking.

Brian

Re: Speeding up multiple updates

on 10.10.2006 21:35:03 by reb01501

brian wrote:
> Bob Barrows [MVP] wrote:
>> Absolutely not. It's not the worst thing you could do, but it's
>> close. Almost worse than creating multiple columns to hold the
>> individual data values.
>
>> My solution involves a single UPDATE statement as well. And data
>> integrity is not compromised.
>
> Why is this so bad?

I'm not going to attempt to answer this here because:
1. A newsgroup is not the proper forum to teach relational database
design.
2. I am not an expert - I just know enough to realize what a bad idea it
is to put delimited data values in a database table.

I suggest you get a book or do a Google search.

It may seem like a good solution for your immediate problem, but I can
almost guarantee it will lead to problems down the road.

Again, my proposed solution involved a single update to your table and
avoided the problems inherent in storing delimited data.

--
Microsoft MVP -- ASP/ASP.NET
Please reply to the newsgroup. The email account listed in my From
header is my spam trap, so I don't check it very often. You will get a
quicker response by posting to the newsgroup.

Re: Speeding up multiple updates

on 10.10.2006 21:55:45 by Mike Brind

"brian" wrote in message
news:1160508283.639404.244130@i42g2000cwa.googlegroups.com.. .
> Bob Barrows [MVP] wrote:
>> Absolutely not. It's not the worst thing you could do, but it's close.
>> Almost worse than creating multiple columns to hold the individual data
>> values.
>
>> My solution involves a single UPDATE statement as well. And data
>> integrity is not compromised.
>
> Why is this so bad? There is no validation done on the values (the
> user enters the data), so I don't think data integrity would be a
> problem. (Unless I'm not understanding what you mean.) Also, having
> one column with delimited values seems simpler and would seem to take
> less time on the database end since the ASP code would put together
> (and take apart) the delimited list of values. It's not an elegant
> solution, but it seems like it should work and alleviate the problems I
> have been experiencing with timeouts, locking, and cascading blocking.
>
> Brian
>

Here's one reason: if at some stage in the future you need to do
something with these values, and the number of values changes over
time (more questions get added), you will spend huge amounts of time
writing server-side code to iterate through variable-length delimited
lists to get to the value you want - if it's there.

Here's another: if the requirement changes such that a stakeholder in
the project wants to know how many people answered "2" to question 3,
how will you get a count easily?
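
With one row per (ID, Field), that count is a single query
(hypothetical table name again):

    SELECT COUNT(*)
    FROM   [Responses]
    WHERE  [Field] = 3     -- question 3
      AND  [Data]  = '2'   -- answered "2"

With the delimited column, you would have to fetch every user's string
and parse it in code just to answer that.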


--
Mike Brind

Re: Speeding up multiple updates

on 10.10.2006 21:56:00 by brian.lukoff

OK, I agree that what I'm proposing is a kludgy solution.

I recently found this article (suggesting using a CASE statement to
build a single UPDATE)...it sounds like it might help speed up the
UPDATEs while maintaining the proper database structure:
http://www.sql-server-performance.com/nb_case_statement.asp

Do you think this will improve the performance of the script to the
point of being usable (and remove the problems with timeouts, locking,
and cascading blocking)? My understanding is that 30 UPDATE statements
require 30 passes through the table (which takes forever with hundreds
of thousands or millions of records), while a single UPDATE with CASE
requires only one pass.

Brian
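
Applied here, the CASE technique from that article would look
something like this (a sketch; the Field numbers and values are made
up):

    UPDATE [Responses]
    SET    [Data] = CASE [Field]
                      WHEN 3  THEN 'first new value'
                      WHEN 7  THEN 'second new value'
                      WHEN 12 THEN 'third new value'
                    END
    WHERE  [ID] = 42
      AND  [Field] IN (3, 7, 12)
    -- The IN list must match the WHEN branches exactly, or any
    -- unmatched row would have its Data set to NULL.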


Bob Barrows [MVP] wrote:
> I'm not going to attempt to answer this here because:
> [snip]

Re: Speeding up multiple updates

on 11.10.2006 00:25:38 by reb01501

brian wrote:
> OK, I agree that what I'm proposing is a kludgy solution.
> [snip]

That might help ... and it might not. There are too many factors
involved about which we know nothing: indexes, extent of
fragmentation, etc. The only way to answer the question is to try
both solutions and see which one performs better.

My issue with this solution is that it cannot be done without dynamic
SQL. I would personally prefer a solution that does not involve
concatenating a potentially large SQL statement, especially since that
leaves you vulnerable to SQL injection, no matter how stringently you
validate the user inputs.
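
For what it's worth, if you do build the CASE dynamically, you can at
least keep the user-supplied values out of the concatenated text by
binding them with sp_executesql, so only the integer Field numbers
(which you can validate) are inlined (a sketch):

    DECLARE @sql nvarchar(4000)
    SET @sql = N'UPDATE [Responses]
                 SET [Data] = CASE [Field]
                                WHEN 3 THEN @v1
                                WHEN 7 THEN @v2
                              END
                 WHERE [ID] = @ID AND [Field] IN (3, 7)'

    EXEC sp_executesql @sql,
         N'@ID int, @v1 varchar(5000), @v2 varchar(5000)',
         @ID = 42, @v1 = 'first new value', @v2 = 'second new value'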
--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"