question

Joe_Hell asked

maxdop, hyperthreading, and Sandy Bridge

Hello all,

SQL Server 2008 R2 EE with SP1 running on Windows Server 2008 R2 Enterprise, 128 GB of RAM. Processor: Xeon(R) CPU E5-2640, which is the Sandy Bridge microarchitecture. 2 procs with 6 cores each, hyperthreaded, presenting 1 NUMA node.

Looking at historical waits (using Glenn Berry's query, thank you):

    wait_type    wait_time_s    pct      running_pct
    CXPACKET     6457806.58     45.19    45.19
    OLEDB        3669894.55     25.68    70.88
    LATCH_EX     2950025.39     20.65    91.52

Indexes are rebuilt or reorganized weekly based on fragmentation. Missing indexes are currently being reviewed and tested.

Here is the question. Using this KB article as a guideline, http://support.microsoft.com/kb/2023536, item d would suggest setting MAXDOP to 2. Does that sound reasonable for a 75% OLTP / 25% reporting production environment?

Thanks in advance.
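For reference, a minimal sketch of the kind of historical-waits query used here, against sys.dm_os_wait_stats. Glenn Berry's actual diagnostic script filters out many benign wait types and computes a running percentage; this simplified version omits both:

    -- Simplified historical-waits query (not Glenn Berry's exact script).
    -- Shows each wait type's total seconds and its share of all wait time.
    SELECT wait_type,
           wait_time_ms / 1000.0 AS wait_time_s,
           CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(5,2)) AS pct
    FROM sys.dm_os_wait_stats
    WHERE wait_time_ms > 0
    ORDER BY wait_time_ms DESC;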
maxdop
2 comments

Robert L Davis commented:
The KB article recommends setting max DOP to 8. The recommendation is per logical CPU, not per physical CPU. That would be your starting point; tune from there as needed.
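A minimal sketch of applying that server-wide starting point with sp_configure (the value 8 follows Robert's reading of the KB guidance; adjust after testing your workload):

    -- Set the server-wide max degree of parallelism to 8 as a starting point.
    -- 'show advanced options' must be enabled before MAXDOP is visible.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism', 8;
    RECONFIGURE;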
Joe_Hell commented:
Thanks for the suggestions. Currently the action plan is to test disabling node interleaving on one of the nodes of the cluster, then make it the active node. This is to make better use of the hardware and NUMA. Thanks to Jonathan Kehayias (@SQLPoolBoy) for the suggestion. Like I said on Twitter, the 24-core single NUMA node was bugging me. Again, though, let me say I appreciate the suggestions above and you taking the time to respond.
Cyborg answered
It's very difficult to say whether or not to use HT for OLTP; you have to test your workload both ways and choose the appropriate option. Likewise, for OLTP environments people recommend setting MAXDOP to 1, but I have faced scenarios where setting MAXDOP to a higher value helped certain queries' performance (maybe because of the DB architecture). I recommend not setting this option at the server level (using sp_configure); the more sensible approach is to find the queries with CXPACKET waits and fine-tune them with a smaller MAXDOP as a query hint. The other side of this approach is that when you use query hints, you restrict SQL Server from choosing a better execution plan when your data distribution changes.
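A minimal sketch of the per-query approach described above. The table and query here are hypothetical; the OPTION (MAXDOP n) hint is the point:

    -- Hypothetical reporting query. OPTION (MAXDOP 2) caps parallelism for
    -- this statement only, leaving the server-wide setting untouched.
    SELECT CustomerID, SUM(TotalDue) AS TotalSales
    FROM dbo.SalesOrderHeader    -- hypothetical table name
    GROUP BY CustomerID
    OPTION (MAXDOP 2);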

Grant Fritchey answered
Determining MAXDOP is a lot of work, but even before doing that, I would examine the cost threshold for parallelism and set it to a much higher number. The default value of 5 is grossly low for most systems, OLTP or warehouse. I'd suggest bumping it up to something like 35, then reassessing how parallelism behaves. I would absolutely do this before I started adjusting MAXDOP.
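A minimal sketch of that change; 35 is Grant's suggested starting value above, not a universal constant:

    -- Raise the cost threshold for parallelism from the default of 5 to 35,
    -- so only genuinely expensive plans go parallel. Reassess afterward.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 35;
    RECONFIGURE;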

sql_handle answered
Joe, don't be too afraid of one large CPU group (and buffer pool/memory node) for SQL Server. There are fairly common workloads for which a single CPU group and memory node is more optimal than multiple groups.

When SQL Server has a single large CPU group, you want to evaluate memory allocation contention that can arise from allocations serialized at the "NUMA node" or CPU group level. CMEMTHREAD waits and spinlock contention would indicate this bottleneck. That contention can be relieved with trace flag 8048. Here's Rohit Nayak's post on debugging CMEMTHREAD waits if your system ever sees a gathering of them: http://blogs.msdn.com/b/psssql/archive/2012/12/20/how-it-works-cmemthread-and-debugging-them.aspx

Don't assume that SQL Server's attempts to provide lower memory latency will deliver higher performance for your environment. Maybe. But there's a good chance not. It's very easy to end up with an imbalance of work in a NUMA configuration. SQLCAT saw this back in 2009. They implemented a workaround that achieved better task distribution on the NUMA system by terminating and reconnecting whenever they connected to a busy scheduler while other schedulers were idle. By distributing CPU work more equitably, they achieved 25% higher bulk load throughput on that system. That's clever. But an interleaved-memory server already has most of that benefit baked in :) http://sqlcat.com/sqlcat/b/technicalnotes/archive/2008/04/09/resolving-scheduler-contention-for-concurrent-bulk-insert.aspx

If there's a CPU imbalance in a NUMA configuration, I'll lay odds there's also a memory utilization imbalance. Why? Because bpool inserts of database blocks in a NUMA configuration happen in the bpool node local to the task. In general, the NUMA node that is busiest for CPU will also have the busiest bpool. The comparison of PLE among busy and idling nodes can be pretty striking: tens of seconds for a busy NUMA node against 4000 or more seconds for a nearly idle NUMA node on the same system (I just saw this today while SSIS packages were running). A computed value like PLE can't build a rock-solid diagnosis by itself, but together with CPU busy per core, the bpool node database page count, or the bpool node free list count, it can be compelling. As with storage resources, massive imbalance across similarly provisioned memory resources very likely indicates a performance improvement opportunity.

So I'd trend the amount of disk IO for a heavy known workflow, as well as waits and spins. Capture the per-CPU busy numbers and the single memory node perfmon stats. Compare that to the results after going to a NUMA configuration. In some cases IO will markedly increase, as will CPU utilization, as SQL Server tries to manage a small, rapidly moving bpool. There might be inflation of query memory grant requests during bpool ramp-up, due to away lists ("away" and "taken away" will show up in DBCC MEMORYSTATUS while the SQL Server bpool is in its growth phase). Those factors may cancel out any benefit of SQL Server's attempt to give local memory access. Linchi Shea's testing indicates that the benefit delivered by local memory access depends greatly on the overall context: http://sqlblog.com/blogs/linchi_shea/archive/2012/01/30/performance-impact-the-cost-of-numa-remote-memory-access.aspx

Or maybe the system will benefit all around from a NUMA configuration. But I see a lot of people assume that to be the case without investigating.
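A minimal sketch of the per-node checks described above, using sys.dm_os_wait_stats and sys.dm_os_performance_counters (both present in SQL Server 2008 R2). The 'Buffer Node' object name below assumes a default instance; named instances appear as 'MSSQL$InstanceName:Buffer Node':

    -- CMEMTHREAD waits: a large share here can indicate the serialized
    -- memory-object contention discussed above.
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type = 'CMEMTHREAD';

    -- Page life expectancy and database page count per buffer node, to spot
    -- the busy-vs-idle bpool imbalance described above.
    SELECT instance_name AS buffer_node,
           counter_name,
           cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Node%'
      AND counter_name IN ('Page life expectancy', 'Database pages')
    ORDER BY instance_name, counter_name;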
