question

Joe_Hell avatar image
Joe_Hell asked

New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 60 seconds

SQL Server 2008 R2 EE sp1 in a cluster - 64 logical cpus (4 sockets with 8 cores hyper threaded) - 512 GB of RAM - per Task manager - 250 Free memory - 13888 available memory The error in the subject line occurred once. Followed by ~50 occurrences of the client was unable to reuse session failure 29 over 7 seconds Full message New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 7%. System Idle: 92%. - no memory dmp. - all ready aware of [ http://support.microsoft.com/kb/2543687][1] to address forced connection closures I am concerned with possible scheduler issues? Any pertinent information appreciated. [1]: http://support.microsoft.com/kb/2543687
sql-server-2008-r2error-message
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Maybe I should say unresponsive schedulers vs deadlocked schedulers since there was no memory dump and no error message saying deadlock schedulers. Would unresponsive scheduler be a correct interpertation of "New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 60 seconds" ?
0 Likes 0 ·

1 Answer

·
SirSQL avatar image
SirSQL answered
What's your current setting for max worker threads? By default I would expect it to be zero which would leave it set dynamically (and given the information you've provided it should give you 1472 threads). Possibly this is insufficient for the number of worker threads that you need. IT might be worth checking on sys.dm_os_Waiting_tasks to see what's holding up existing threads from being released. It could be related to generalized slowness due to parallelism overload. Also, check your general wait stats and server configuration (pay particular attention to your maxdop value)
8 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I'd ignore the FT_IFTSHC_MUTEX waits, those are background and related to the full-text service, I don't believe those are an issue. Your LCK_M_IX (intent exclusive lock wait) time seems quite high. Are you index optimized and seeing blocking anywhere? Or running potentially large update statements?
4 Likes 4 ·
Have you checked the hardware? It may be a NUMA issue. Are the CXPACKET waits top by quantity or duration? Hyperthreading may be overkill on this system but that would take deeper inspection.
4 Likes 4 ·
I agree with SirSQL and Blackhawk-17. This could be the case of some run-away queries (heavy, resource-intensive, many threads ) creating a race condition, and the queue that got created due to that keeps on growing. So one should investigate to find the slow/long running, resource intensive queries. Blackhawk-17 have a very valid point about the quantity and duration for CXPACKET. The top four waits may change accordingly? Also a drastic difference among the waits types numbers could exclude a wait type instantly. So please post the figures for wait type as well. Carrying on with Blackhawk-17 's valuable suggestion, with MAXDOP 8 setting, a complex query with let say 8 multiple parallelism operators would equate to 8*2(HT)*8 = 128 threads. If the same query runs concurrently by 10 users, then 1280 threads would be eaten up by one query. This is kind of an extreme case, but this is where Hyper threading may be becoming the cause. I would tend to try after either disabling the Hyper-threading OR with a smaller MAXDOP setting e.g. 4 (preferably on a test bud with same workload). But again, the root cause would be bad/resource intensive queries. Another missing information is the background? Is it a new server? Have you just upgraded the hardware OR SQL Server? The problem creeped up suddenly OR it happened with time? Are there any recent code changes in the database OR application? What is the version of your OS? In case of a new server, have you checked with the vendor that is supports SQL Server with hyper-threading efficiently?
4 Likes 4 ·
@Joe_Hell I missed the point that you can also try lowering the maxdop for the new code only. This way you do not have to change server/instance wide settings.
3 Likes 3 ·
@Joe_Hell These are general guidelines and one should test different MAXDOP settings (I always prefer a smaller number) according to the workload and environment. Since it happened after pushing new code into production, It should make your life easy to identify the culprit. Wish you good luck for that ;)
2 Likes 2 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.