I am experiencing performance issues with locks/deadlocks in a very simple statement that is executed hundreds of times per second in parallel...
Basically we have two tables: counter( id, category_id, name, date, counter ) and category( category_id, name )
Each click on our web site (using ASP, ASP.NET and IIS) must be logged to the DB. Some home pages log several counters at once, usually in different categories. Counters can be daily, weekly, monthly or whatever.
We have one database server (64-bit, 4 quad-core processors, 64 GB RAM, running SQL Server 2005 on Windows 2003) and 16 virtual IIS 6.0 servers with 4 GB each.
Except for this bottleneck, the system runs pretty well, at under 75% of hardware capacity (this includes disk, network, CPU and memory).
Today's implementation can be simplified to something like:
create table category
(
    category_id int not null primary key identity(1,1),
    name nvarchar(255) not null
)
go

create table [counter]
(
    id int not null primary key identity(1,1),
    category_id int not null references category(category_id),
    name nvarchar(255) not null,
    [date] date null,
    [counter] bigint not null default 1
)
go

CREATE PROCEDURE pr_update_counter
(
    @category_name nvarchar(255),
    @counter_name nvarchar(255),
    @date date = null,
    @inc int
)
AS
BEGIN
    DECLARE @category_id int

    SET XACT_ABORT ON
    SET NOCOUNT ON
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

    BEGIN TRAN

    SELECT @category_id = category_id
    FROM category
    WHERE name = @category_name

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO category(name) VALUES ( @category_name )
        SELECT @category_id = scope_identity()
    END

    UPDATE [counter]
    SET counter = counter + @inc
    WHERE category_id = @category_id
      AND name = @counter_name
      AND (@date IS NULL OR [date] = @date)

    IF @@ROWCOUNT = 0
        INSERT counter(category_id, name, [date]) VALUES ( @category_id, @counter_name, @date )

    COMMIT

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
END
GO
exec pr_update_counter @category_name = 'main', @counter_name = 'home', @date = null, @inc = 1
Some extra information:
High execution concurrency: 500+ executions/second ( 5K exec/sec peak )
Under lower isolation levels, we missed some "counts", meaning we had "ghost reads" and probably "ghost updates".
Under REPEATABLE READ or SERIALIZABLE isolation levels, some resources lock and sometimes deadlock themselves.
Locks, deadlocks and timeouts make the web app slower than it should be, and we also miss some "counts" (those that timed out or were deadlocked).
How to reproduce the error:
Execute the procedure hundreds or thousands of times, preferably at exactly the same time, and check whether N executions increased the counter by exactly N.
Ex: Create a RUN.BAT file with the following code (change the execution time to the near future to avoid locking connections on the server for too long):
FOR /L %%i in (1,1,500) do start /b osql -S -U -P -q"waitfor time 'hh:mm:ss'; exec pr_update_counter @category_name = 'main', @counter_name = 'home', @date = null, @inc = 1" < EXIT.TXT
EXIT
Then create an EXIT.TXT file containing the single line "EXIT" (this was necessary to close each of the 'cmd' windows started individually).
How can we avoid deadlocks and "ghost reads" without slowing down the application or losing "counts"?
Is there a way to enhance the code, making it faster and more reliable ?
Could Snapshot Isolation, table splitting/partitioning or Service Broker help solve this issue?
Can IIS or an ASP/ASP.NET DLL perform this counting better than SQL Server?
Can TRY/CATCH capture and "correct" deadlocks, or can using TRY/CATCH help me in any way?
The system has indexes optimized for this query, but what would be the ideal index in this scenario?
We can't use Google Analytics because some clients' firewalls could be blocking it.
Please comment on these changes to the code. I prioritized the UPDATE, since it happens a lot more often than the inserts, and included TRY/CATCH to avoid losing "counts". But I still have the problem that the application must (sometimes) wait too long for this execution to finish.
CREATE PROCEDURE [dbo].pr_update_counter2
(
    @category_name nvarchar(255),
    @counter_name nvarchar(255),
    @date date = null,
    @inc int
)
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;

WHILE ( @@TRANCOUNT > 0 )
BEGIN
    WAITFOR DELAY '00:00:01'
    COMMIT
END

SET DEADLOCK_PRIORITY LOW -- <-- not so sure about this

DECLARE @counter_id INT, @category_id INT

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRY
    BEGIN TRAN T1
    UPDATE [counter]
    SET [counter] = [counter] + @inc,
        @counter_id = [counter].id,
        @category_id = category.category_id
    FROM [counter]
    LEFT JOIN category
        ON [counter].category_id = category.category_id
       AND category.name = @category_name
    WHERE isnull([counter].[date], '19000101') = isnull(@date, '19000101')
      AND [counter].name = @counter_name
    COMMIT TRAN T1
END TRY
BEGIN CATCH
    ROLLBACK TRAN T1
    EXEC dbo.pr_update_counter2 @category_name, @counter_name, @date, @inc
    RETURN
END CATCH

IF ( @category_id IS NOT NULL AND @counter_id IS NOT NULL )
    RETURN

IF ( @category_id IS NULL AND @counter_id IS NULL )
BEGIN
    BEGIN TRY
        BEGIN TRAN T2
        INSERT INTO [category](name) VALUES (@category_name);
        SET @category_id = SCOPE_IDENTITY();
        INSERT INTO [counter] (category_id, name, [date], [counter])
        VALUES (@category_id, @counter_name, @date, @inc);
        COMMIT TRAN T2
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN T2
        EXEC pr_update_counter2 @category_name, @counter_name, @date, @inc
    END CATCH
    RETURN
END

IF ( @counter_id IS NULL AND @category_id IS NOT NULL )
BEGIN
    BEGIN TRY
        BEGIN TRAN T3
        INSERT INTO [counter] (category_id, name, [date], [counter])
        VALUES (@category_id, @counter_name, @date, @inc);
        COMMIT TRAN T3
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN T3
        EXEC pr_update_counter2 @category_name, @counter_name, @date, @inc
    END CATCH
    RETURN
END

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
RETURN;
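For comparison, a common alternative to running the whole procedure under a session-level SERIALIZABLE is an upsert that takes UPDLOCK/HOLDLOCK key-range locks only on the rows it touches, so concurrent calls for different counters don't serialize against each other. This is only a sketch against the tables above; the procedure name and the use of '19000101' as a NULL-date sentinel in the predicate are my assumptions, not part of the original design:

```sql
-- Sketch: hinted upsert instead of a SERIALIZABLE session.
-- Assumes the category/[counter] tables defined above; pr_update_counter3
-- and the '19000101' sentinel comparison are illustrative assumptions.
CREATE PROCEDURE pr_update_counter3
(
    @category_name nvarchar(255),
    @counter_name  nvarchar(255),
    @date          date = null,
    @inc           int
)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    DECLARE @category_id int;

    BEGIN TRAN;

    -- Key-range lock only the category row (or gap) we need.
    SELECT @category_id = category_id
    FROM category WITH (UPDLOCK, HOLDLOCK)
    WHERE name = @category_name;

    IF @category_id IS NULL
    BEGIN
        INSERT INTO category(name) VALUES (@category_name);
        SET @category_id = SCOPE_IDENTITY();
    END

    UPDATE [counter] WITH (UPDLOCK, HOLDLOCK)
    SET [counter] = [counter] + @inc
    WHERE category_id = @category_id
      AND name = @counter_name
      AND ISNULL([date], '19000101') = ISNULL(@date, '19000101');

    IF @@ROWCOUNT = 0
        INSERT INTO [counter](category_id, name, [date], [counter])
        VALUES (@category_id, @counter_name, @date, @inc);

    COMMIT TRAN;
END
GO
```

Because the UPDLOCK is taken on the first read, two concurrent calls for the same key block briefly instead of deadlocking; whether that holds in practice depends on the indexes actually present.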
Answer by Grant Fritchey
If anything, snapshot isolation might make this have more problems, not fewer, although you're unlikely to hit as many deadlocks.
Absolutely you should implement TRY/CATCH for the deadlocks. You can resubmit the transaction once and then raise an error if it fails the second time.
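That resubmit-once-then-raise advice could be sketched as a wrapper around the existing procedure; the wrapper name is illustrative, and only the deadlock-victim error (1205) is retried:

```sql
-- Sketch: resubmit once on deadlock (error 1205), then raise.
-- Wraps the existing pr_update_counter; pr_update_counter_retry is
-- an illustrative name.
CREATE PROCEDURE pr_update_counter_retry
(
    @category_name nvarchar(255),
    @counter_name  nvarchar(255),
    @date          date = null,
    @inc           int
)
AS
BEGIN
    DECLARE @attempt int;
    SET @attempt = 1;

    WHILE @attempt <= 2
    BEGIN
        BEGIN TRY
            EXEC pr_update_counter @category_name, @counter_name, @date, @inc;
            RETURN;  -- success, no retry needed
        END TRY
        BEGIN CATCH
            -- 1205 = chosen as deadlock victim; anything else, or a
            -- second failure, is surfaced to the caller.
            IF ERROR_NUMBER() <> 1205 OR @attempt = 2
            BEGIN
                DECLARE @msg nvarchar(2048);
                SET @msg = ERROR_MESSAGE();
                RAISERROR(@msg, 16, 1);
                RETURN;
            END
            SET @attempt = @attempt + 1;
        END CATCH
    END
END
GO
```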
The first query is likely to hammer you, but I'd say you should have an index on the name column of the first table. The second table should have a covering index on category_id, name and date.
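Those suggestions might look like the DDL below; the index names are mine, and making the category index UNIQUE matches how the procedure treats category names (it assumes one row per name), which you'd want to confirm:

```sql
-- Sketch of the suggested indexes; names and the UNIQUE choice are
-- illustrative and should be verified against the actual query plans.
CREATE UNIQUE NONCLUSTERED INDEX IX_category_name
    ON category (name);

CREATE NONCLUSTERED INDEX IX_counter_category_name_date
    ON [counter] (category_id, name, [date]);
```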
The only bit of functionality in the code that I question is the OR clause to eliminate the NULL values for @date. Maybe supplying a dummy value when @date is null would work better here.
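The dummy-value idea could be as simple as storing a sentinel date instead of NULL (the sentinel '19000101' here is an assumption), which turns the OR into a single sargable equality that an index on ([counter].category_id, name, [date]) can seek on:

```sql
-- Sketch: sentinel date instead of NULL, so the predicate stays sargable.
-- Assumes rows are stored with '19000101' (an illustrative sentinel)
-- rather than a NULL [date].
UPDATE [counter]
SET [counter] = [counter] + @inc
WHERE category_id = @category_id
  AND name = @counter_name
  AND [date] = ISNULL(@date, '19000101');
```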
Answer by Matt Whitfield
+1 for the question and +1 for grant - but have you considered using a slightly different method for logging? It's pretty unlikely that you are going to be logging many separate category and counter names per second. So maybe you've got 50 counters, but 500 updates per second. You could quite easily create an application layer object into which count updates were queued, and this queue could then be compacted when you want to write to the database.
So you could have a singleton class in your web-site code with a synchronised queue: when you write to the log, you call into the singleton, and it adds the request to the queue. The queue could easily be modelled as a dictionary, such that if a counter/category/date combination already exists in the queue, you increment the count for that item rather than creating another item.
You would then have a worker thread which read items from the queue, and wrote them to the database, incrementing the count by the relevant number in the database.
This is effectively an implementation of the ProducerConsumer pattern, with a bit of an extension around the way that items are produced (i.e. they can modify existing items).
This way, you would reduce the workload on the database by somewhere in the region of 90%, and the increase in workload on the web server would be negligible...
Edit -> Just noticed you have 16 IIS servers - so you would get less of a reduction in executions - but you could consider throttling the background thread to compensate (for example, log items once every 5 seconds)...
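With that compaction in place, the worker thread's periodic flush reduces to one procedure call per distinct counter/category/date key, passing the count aggregated since the last flush as @inc. The counts below are made-up illustrations:

```sql
-- Sketch: flushing a compacted queue; one call per distinct key,
-- with @inc carrying the aggregated count (values are illustrative).
exec pr_update_counter @category_name = 'main', @counter_name = 'home',    @date = null, @inc = 37;
exec pr_update_counter @category_name = 'main', @counter_name = 'contact', @date = null, @inc = 4;
```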
Answer by Kev Riley
Definitely agree with Grant's comments.
I had a similar situation once with high concurrency on a session data table, and after much trial and error with isolation levels, TRY/CATCH blocks, etc., the deadlocks were alleviated by a better indexing strategy, i.e. a covering index.
Answer by Jeff Moden
I had a similar situation with such a "sequence" table at my old job. The "GetNextID" proc, similar to both of yours, was causing an average of 640 deadlocks per day, with daily spikes to more than 4,000. The day I deployed my change to the code, the deadlocks plummeted to nearly zero, and despite the horrible code in the rest of the system, a couple of additional tweaks to the code that used it let us go several days before another deadlock appeared.
The question is, what are your sprocs supposed to return if the increment is > 1? These types of procs normally return the count stored in the table as the NextID... just because you have an increment shouldn't change that fact.
I realize this is an old post but if you're still having problems, I know how to fix both versions you currently have. Post back and let me know. It may take me a couple days to respond but I will respond... with code.