We have been tinkering with a virtual to physical project for several months. The initial driver for the project was to extract the greatest performance possible from our infrastructure to coincide with a re-release of our bespoke Crm application. The existing Db is a virtualised (Esx 5.5) Ws2008 x64/Sql2008 Enterprise. We have built several versions now of a physical server both clustered and not clustered on the same hardware platform as the Esx host (Cisco Ucs B200 M2) with storage from the same Netapp Sas aggregate as the Vm, but have never been able get the same performance from the physical environment. The Vm was P2Veed about 5 years ago. At present the physical box is running 2012R2 datacenter/Sql 2014 Enterprise, but have seen exactly same results with 2008R2/Sql 2008R2 and Sql 2012. Storage i/o tests show us no real differences between the 2 servers getting to the Netapp. For performance measurement we have been using 3 queries (based on typical user interactions with the app) that we have running within a monitoring system. For all 3 queries the physical box performs much the poorer. Typically 50% or more in terms of time elapsed for the queries. One curiosity that has had me even more baffled during the last few months is a select * from a table that has ~1m rows in it. If I run the query from Ssms on the Vm it returns the rows in say 8secs. If I do the same from Ssms on the physical box it returns the rows in about 14secs. But if I connect to the physical box via Ssms on the Vm and run the query it too comes back in 8secs! How is it so? Really scratching my head particularly with the last point! Anyone ever seen anything similar? Cheers Damien.
@Grant and @Fatherjack have both provided some good information. Good stuff was also listed in the comments. But something that I have seen time and again cause this type of conundrum has yet to be mentioned. Check the power plan settings of the physical box. Also check the bios settings to ensure the "green" settings have all been disabled. I have seen the balanced power plan (including the power saving/green settings in bios) cause up to 67% performance degradation in many queries. You should have the power plan set to High performance. You should also ensure the bios is not set to enable the "green" / energy saving features. Here is an [article] that discusses that. :
Instead of trying to run queries from SSMS and get a simple measure, automate the testing. Run the scripts from some third party product or script, PowerShell for example. Call them 10, 50, 1000 times, but do it all the exact same way from this script or location against each of the two servers. While doing this, use Extended Events to capture the query metrics (rpc_complete and sql_batch_complete events) from each server. Compare the results collected by Extended Events. You want everything to be apples to apples on your test runs. In addition to capturing the query metrics, I'd suggest capturing the wait statistics before and after you run the tests. You can use sys.dm_os_wait_stats to get the waits before and after you run the tests. That will tell you what is slowing down the access on each system which will help to determine where the bottlenecks are.