Gustavo asked

How to avoid deadlocks and "ghost reads" in a highly concurrent process?

I am experiencing performance issues with locks/deadlocks in a very simple statement that is executed hundreds of times in parallel each second...

Basically we have two tables: counter( id, category_id, name, date, counter ) and category( category_id, name )

Each click on our web site (ASP, ASP.NET and IIS) must be logged in the DB. Some home pages log several counters at once, usually in different categories. Counters can be daily, weekly, monthly or whatever.

We have one database server (64-bit, 4 quad-core processors, 64 GB RAM, running SQL Server 2005 on Windows 2003) and 16 virtual IIS 6.0 servers with 4 GB each.

Except for this bottleneck, the system runs pretty well, at under 75% of hardware capacity (this includes disk, network, CPU and memory).

Today's implementation can be simplified to something like:

create table category(
    category_id int not null primary key identity(1,1)
    , name nvarchar(255) not null
)
go

create table [counter] ( 
    id int not null primary key identity(1,1)
    , category_id int not null references category(category_id) 
    , name nvarchar(255) not null
    , [date] date null
    , [counter] bigint not null default 1
)   
go

CREATE PROCEDURE pr_update_counter ( @category_name nvarchar(255), @counter_name nvarchar(255), @date date = null, @inc int)
    AS BEGIN
      DECLARE @category_id int
      SET XACT_ABORT ON
      SET NOCOUNT ON
      SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

      BEGIN TRAN
          SELECT @category_id = category_id FROM category WHERE name = @category_name
          IF @@ROWCOUNT = 0
          BEGIN
             INSERT INTO category(name) values ( @category_name )
             SELECT @category_id = scope_Identity()
          END

          UPDATE [counter] SET counter = counter + @inc 
          WHERE category_id = @category_id AND name = @counter_name AND (@date is null or [date] = @date)

          IF @@ROWCOUNT = 0
             INSERT counter(category_id, name, [date], [counter]) values ( @category_id, @counter_name, @date, @inc ) -- insert @inc rather than the column default of 1
      COMMIT

      SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    END 
GO

Execution examples:

exec pr_update_counter @category_name = 'main', @counter_name = 'home', @date = null, @inc = 1

Some extra information:

  • High execution concurrency: 500+ executions/second (5K exec/sec at peak)

  • Under lower isolation levels we missed some "counts", meaning we had "ghost reads" and probably "ghost updates".

  • Under repeatable read or serializable isolation levels some resources block and sometimes deadlock.

  • Locks/deadlocks and timeouts make the web app slower than it should be, and we also lose some "counts" (those that timed out or were chosen as deadlock victims).

How to reproduce the error:

  • Execute the procedure hundreds or thousands of times, preferably at exactly the same time, and check whether "N" executions increased the counter by exactly "N" (a verification query is sketched after this list).

  • Ex: create a RUN.BAT file with the following code (change the execution time to the near future so connections are not held open on the server for too long):

    FOR /L %%i in (1,1,500) do start /b osql -S -U -P -q"waitfor time 'hh:mm:ss'; exec pr_update_counter @category_name = 'main' , @counter_name = 'home' , @date = null , @inc = 1 " < EXIT.TXT EXIT

  • Also create an EXIT.TXT file containing the single line "EXIT" (this was necessary to close each of the individually started 'cmd' windows).
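
A quick way to verify the result (a minimal sketch, using the example category and counter names from above):

-- With 500 runs of the loop and @inc = 1, [counter] should be exactly 500 higher afterwards
SELECT cat.name AS category, c.name AS counter, c.[counter]
FROM [counter] c
JOIN category cat ON cat.category_id = c.category_id
WHERE cat.name = 'main'
  AND c.name = 'home'
  AND c.[date] IS NULL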

Questions:

  • How can we avoid deadlocks and "ghost reads" without slowing down the application or losing "counts"?

  • Is there a way to enhance the code, making it faster and more reliable?

  • Could snapshot isolation, table splitting/partitioning or Service Broker help solve this issue?

  • Could IIS or an ASP/ASP.NET component perform this counting better than SQL Server?

  • Can TRY/CATCH capture and "correct" deadlocks, or help in any other way?

  • The system has indexes optimized for this query, but what would the ideal indexes be for this scenario?

  • We can't use Google Analytics because some clients' firewalls could be blocking it.

Thanks.


Edit:

Please comment on these changes to the code. I prioritized the update, since it happens a lot more often than inserts, and included TRY/CATCH to avoid losing "counts". But I still have the problem that the application sometimes has to wait too long for this execution to finish.

CREATE Procedure [dbo].pr_update_counter2  
(@category_name nvarchar(255), @counter_name nvarchar(255), @date date = null, @inc int)  
As  
SET NOCOUNT ON; SET XACT_ABORT ON;  

WHILE ( @@TRANCOUNT > 0 ) BEGIN
    WAITFOR DELAY '00:00:01' COMMIT
END

SET DEADLOCK_PRIORITY LOW -- <-- not so sure about this
DECLARE @counter_id INT, @category_id INT

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRY
    BEGIN TRAN T1
        UPDATE  [counter] set [counter] = [counter] + @Inc
                            , @counter_id = [counter].id
                            , @category_id = [category].category_id
        FROM    [counter]
                LEFT JOIN category
                on  [counter].category_id = category.category_id
                    AND category.name = @category_name 
        WHERE   isnull([counter].date,'19000101') = isnull(@date,'19000101')
                AND [counter].name = @counter_name 

    COMMIT TRAN T1
END TRY
BEGIN CATCH 
    ROLLBACK TRAN T1
    EXEC dbo.pr_update_counter2 @category_name, @counter_name, @date, @inc
    RETURN
END CATCH

IF ( @category_id IS NOT NULL AND @counter_id IS NOT NULL)
    RETURN

IF ( @category_id IS NULL AND @counter_id IS NULL)
BEGIN
    BEGIN TRY 
        BEGIN TRAN T2
            INSERT INTO [category](name)   
            VALUES (@category_name);

            SET @category_id = SCOPE_IDENTITY();  

            INSERT INTO [counter] (category_id, name, [date], [counter])   
            VALUES (@category_id, @counter_name, @date, @inc);
        COMMIT TRAN T2
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN T2 
        EXEC pr_update_counter2 @category_name, @counter_name, @date, @inc
    END CATCH

    RETURN  
END

IF ( @counter_id IS NULL AND @category_id IS NOT NULL )
BEGIN 
    BEGIN TRY 
        BEGIN TRAN T3
            INSERT INTO [counter] (category_id, name, [date], [counter])   
            VALUES (@category_id, @counter_name, @date, @inc);
        COMMIT TRAN T3
    END TRY 
    BEGIN CATCH
        ROLLBACK TRAN T3 
        EXEC pr_update_counter2 @category_name, @counter_name, @date, @inc
    END CATCH
    RETURN
END

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

RETURN;

Tags: t-sql, performance, performance-tuning, deadlock, parallelism
3 comments

Grant Fritchey commented:
Where exactly within the code are you hitting deadlocks? Which statements are causing the deadlock?
Gustavo commented:
It seems the deadlocks happen on the first statement after the serializable BEGIN TRANSACTION. Process 1 eventually locks both resources; as soon as processes 2..500 arrive and the resource is freed, it seems that more than one of the remaining 499 processes touches the resources, even in a serializable environment.
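
This is the classic upsert pattern under SERIALIZABLE: every session first takes shared key-range locks for the SELECT and then has to convert them to exclusive locks for the INSERT/UPDATE, so two sessions holding the same shared range can deadlock each other. One common mitigation, sketched here against the first procedure (not something proposed in this thread), is to request the stronger lock on the initial read:

-- sketch only: take an update lock (and hold it) on the initial lookup, so only one
-- session at a time can proceed through the read-then-write path for a given key
SELECT @category_id = category_id
FROM category WITH (UPDLOCK, HOLDLOCK)
WHERE name = @category_name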
Gustavo commented:
So far I have 5 approaches to solving this issue: 1) try to use Google Analytics (with a proxy inside our server to bypass the firewall issues), 2) break the tables into N physical tables, or N partitions, with a view counting across all of them, 3) change the application to cache/group the requests and reduce traffic, 4) make the requests asynchronous and let something queue them and process them at the proper time, 5) fine-tune the SQL code.
Grant Fritchey answered

If anything, snapshot isolation might make this have more problems, not fewer, although you're unlikely to hit as many deadlocks.

Absolutely you should implement TRY/CATCH for the deadlocks. You can resubmit the transaction once and then raise an error if it fails the second time.
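
A minimal sketch of that retry shape (1205 is the deadlock-victim error number; the transaction body stands in for the procedure's UPDATE/INSERT logic):

DECLARE @retry int
SET @retry = 1                              -- resubmit once, then give up

WHILE @retry >= 0
BEGIN
    BEGIN TRY
        BEGIN TRAN
            -- counter UPDATE / INSERT goes here
        COMMIT
        BREAK                               -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK
        IF ERROR_NUMBER() = 1205 AND @retry > 0
            SET @retry = @retry - 1         -- chosen as deadlock victim: try again
        ELSE
        BEGIN
            RAISERROR('pr_update_counter failed after retry', 16, 1)
            RETURN
        END
    END CATCH
END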

The first query is likely to hammer you, but I'd say you should have an index on the name column of the category table. The counter table should have a covering index on category_id, name and date.
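
In DDL terms that might look something like this (the index names are made up):

-- supports the category lookup by name
CREATE NONCLUSTERED INDEX IX_category_name ON category (name)
GO
-- supports the counter UPDATE's search predicate
CREATE NONCLUSTERED INDEX IX_counter_category_name_date ON [counter] (category_id, name, [date])
GO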

The only bit of functionality in the code that I question is the OR clause to eliminate the NULL values for @Date. Maybe supplying a dummy value when @Date is null would work better here.
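
A sketch of that idea, assuming the sentinel '19000101' (the value already used in the edited procedure above) can never be a real counter date:

-- never store NULL in [date]; with a sentinel the predicate becomes a plain equality
UPDATE [counter] SET [date] = '19000101' WHERE [date] IS NULL
ALTER TABLE [counter] ADD CONSTRAINT DF_counter_date DEFAULT ('19000101') FOR [date]

-- ...and inside the procedure:
SET @date = ISNULL(@date, '19000101')
UPDATE [counter] SET [counter] = [counter] + @inc
WHERE category_id = @category_id AND name = @counter_name AND [date] = @date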


Matt Whitfield answered

+1 for the question and +1 for Grant - but have you considered using a slightly different method for logging? It's pretty unlikely that you are going to be logging many separate category and counter names per second. So maybe you've got 50 counters, but 500 updates per second. You could quite easily create an application-layer object into which count updates are queued, and this queue could then be compacted when you want to write to the database.

So you could have a singleton class in your website code, which had a synchronised queue. I.e. when you write to the log, you call into the singleton, and it adds the request to the queue. The queue could easily be modelled as a dictionary object such that if a counter / category / date combination already exists in the queue, then you increment the count for that item rather than creating another item.

You would then have a worker thread which read items from the queue, and wrote them to the database, incrementing the count by the relevant number in the database.

This is effectively an implementation of the ProducerConsumer pattern, with a bit of an extension around the way that items are produced (i.e. they can modify existing items).

This way, you would reduce the workload on the database by somewhere in the region of 90%, and the increase in workload on the web server would be negligible...

Edit -> Just noticed you have 16 IIS servers - so you would get less of a reduction in executions - but you could consider throttling the background thread to compensate (for example, log items once every 5 seconds)...
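
On the database side nothing new would be needed for this: the worker thread just calls the existing procedure once per distinct (category, counter, date) key with @inc set to the accumulated total. The names and totals below are made up for illustration:

-- one flush of the in-memory queue: 500 clicks collapsed into a handful of calls
exec pr_update_counter @category_name = 'main',  @counter_name = 'home',   @date = null, @inc = 312
exec pr_update_counter @category_name = 'main',  @counter_name = 'news',   @date = null, @inc = 97
exec pr_update_counter @category_name = 'promo', @counter_name = 'banner', @date = null, @inc = 91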


Kev Riley answered

Definitely agree with Grant's comments.

I had a similar situation once with high concurrency on a session-data table, and after much trial and error with isolation levels, try/catch blocks, etc., the deadlocks were alleviated by a better indexing strategy, i.e. a covering index.


Jeff Moden answered

I had a similar situation with such a "sequence" table at my old job. The "GetNextID" proc, similar to both of yours, was causing an average of 640 deadlocks per day, with daily spikes to more than 4,000. The day I made the change to the code, the deadlocks plummeted to nearly zero and, despite the horrible code in the rest of the system, a couple of additional tweaks to the code that used it allowed us to go for several days before another deadlock appeared.

The question is, what are your sprocs supposed to return if the increment is > 1? These types of procs normally return the count stored in the table as the NextID... just because you have an increment shouldn't change that fact.

I realize this is an old post but if you're still having problems, I know how to fix both versions you currently have. Post back and let me know. It may take me a couple days to respond but I will respond... with code.
