How to measure the performance improvement that NUMA brings?
I am testing how one of our stored procedure is working on `SQL Server Standard` vs `SQL Server Enterprise` editions. I have created two virtual machines and install a `SQL Server`(`SQL Server 2014`) and restore a database on each instance. Then I have generated some test data (large volume but the same on each database) and started testing if there is an execution time difference. I have been told that there should be `4x` better performance on the `Enterprise` edition when a large volume of data is used because there is [NUMA Aware Large Page Memory and Buffer Array Allocation]. So far, there is no such difference and the execution time is almost the same (a `-` or `+` second difference). I cannot say I understand completely what is `NUMA` and how it works, but I guess the hardware `NUMA` is on as the following query returns me `0` and `64`: SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks I am not interested in the software `Numa`. Could anyone tell if such `4x` optimization really exists and should I continue generated more data in order to see it? This is the `VM` set up: ![alt text] :
https://msdn.microsoft.com/en-us/library/cc645993.aspx : /storage/temp/2745-1.png
First off, you have to make sure you have your virtual machines set up properly in support of NUMA. I didn't see if you were using [VMWare] or [HyperV], but there are links to both on how to configure it. Next, if you really want to see NUMA run fast, avoid the hypervisors entirely. Unless you're dedicating cpu and memory to your SQL Server box, then because you're virtualized, it's bouncing between shared resources, which is likely to make any advantages of NUMA go away. It goes back to how you have configured your VM. I've never heard that you're going to see a flat 4x improvement from NUMA architecture. To test this, you need to be running queries that are going to hit memory more than anything else. You need to observe, not just execution time, but wait statistics. You want to know where the system is waiting on resources. If you're I/O bound, you're unlikely to see any benefits from the advanced memory architecture. Your tests and your monitoring have to capture that information. Now, I'm not an architecture expert. I'd suggest reading up on Jonathan Kehayias' information over at
SQLSkills.com for a lot more detail. :