I am running a stored procedure that has a lot of CTEs and does heavy aggregation over several tables with more than 100 million records each. It used to finish in about 3 hours, but lately it's taking more than 15 hours, and I am seeing a lot of TempDB disk contention.

TempDB was initially sized at 150 GB but has grown to 600 GB, of which 500 GB is free space. I have two RAID 5 drives: one holds the TempDB data file and the other holds the database data files. In sp_who2 I see 35 rows in Suspended status, all with the same SPID.

I currently have only one TempDB data file and 16 cores. I read somewhere that I would get better performance if I split TempDB into 8 or more equally sized data files, but my system admin told me it won't help since I would be putting all the files on the same drive. Will I get better performance if I split the data files on the same drive? Also, should I restart SQL Server to shrink the TempDB data file?
Don't shrink tempdb. It grew to that size to support your query, and shrinking it will just force it to grow again. Further, how did it grow? In 1 MB increments? If so, you might be looking at severe file fragmentation (that would be a reason to shrink it, but only so you could grow it back in larger chunks).

The fact that you are seeing lots of suspended tasks, especially a whole bunch sharing the same session, suggests you are bound on CPU and probably experiencing parallelism: those 35 rows with the same SPID in sp_who2 are almost certainly parallel workers for a single request. That could be caused by IO issues too, but from everything you're saying, I'd focus on traditional tuning mechanisms. Get the execution plan. See how the query is being resolved. Can you modify the code to increase performance? Do you need more or different indexes? Just traditional stuff.
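To confirm whether those suspended rows are parallel workers and what they're waiting on, you can look past sp_who2 at the DMVs. A rough sketch (the `123` is a placeholder for the SPID you saw; replace it with your actual session id):

```sql
-- What is the session actually waiting on?
-- Lots of CXPACKET/CXCONSUMER waits suggest parallelism;
-- PAGEIOLATCH_* suggests an IO bottleneck instead.
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time,
       r.cpu_time,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 123;  -- placeholder: the SPID from sp_who2
```

The dominant wait type will tell you whether to chase CPU/parallelism or IO before you touch tempdb layout.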
Splitting tempdb into multiple data files helps with allocation latch contention on the PFS, GAM and SGAM pages. I have typically had to deal with this on very busy systems with much smaller tempdb databases, not one that has grown to 600 GB. Following the best practices for tempdb would still be worthwhile here: your admin has a point from an IO perspective, since all the files would share one disk, but the recommendation is really about SQL Server internals and how it allocates pages in tempdb, not about spindles.

Shrinking tempdb by restarting the service won't make it perform better if it simply has to grow the files back to that size. What type of contention are you actually seeing on tempdb? You might also want to check whether Instant File Initialization is enabled, to avoid long waits while the data files grow.
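You can check for that specific kind of allocation contention directly. A sketch, assuming the standard tempdb database_id of 2 (the file path and sizes in the second statement are illustrative, not a recommendation):

```sql
-- Allocation-page latch waits in tempdb (database_id = 2).
-- In resource_description '2:1:1' is a PFS page, '2:1:2' a GAM, '2:1:3' an SGAM.
SELECT session_id,
       wait_type,
       wait_duration_ms,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';

-- If that shows contention, add equally sized files, e.g.:
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\tempdb2.ndf',  -- hypothetical path on your tempdb drive
          SIZE = 75GB,
          FILEGROWTH = 512MB);
```

If the query above returns nothing while your job is running, allocation contention isn't your problem and adding files on the same drive won't buy you much.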
> The initial TempDB size was set to 150GB but now it grew to 600 GB

I think that a lot of people are missing the fundamental problem here. A query that causes such an expansion of TempDB and takes 15 hours to run sounds like one huge monolithic catch-all query, and I'll just bet that at least one instance of the DISTINCT keyword was used. Rather than messing around with how to optimize TempDB on this one, which might buy you 0 to 60 minutes of savings on this large monolithic query, you need to spend some time understanding the underlying data and optimizing the query itself. Quickly isolate subsets of the information into temp tables and join on the temp tables, instead of trying to do it all in one pass and using DISTINCT to paper over the real problem of accidental many-to-many joins and other server-crippling problems. If you're trying to optimize TempDB on this one, you're working on the wrong problem.
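The divide-and-conquer pattern looks roughly like this. Purely illustrative: the table and column names are invented, since we haven't seen the actual procedure:

```sql
-- Instead of one giant query where DISTINCT hides a many-to-many blowup,
-- pre-aggregate the big table into a small temp table first...
SELECT CustomerID,
       SUM(Amount) AS TotalAmount
INTO #CustomerTotals
FROM dbo.Sales                 -- hypothetical 100M+ row table
WHERE SaleDate >= '20230101'
GROUP BY CustomerID;

-- ...index it so the joins that follow are cheap...
CREATE CLUSTERED INDEX IX_CustomerTotals ON #CustomerTotals (CustomerID);

-- ...then join the small result to the rest of the query.
SELECT c.CustomerName,
       t.TotalAmount
FROM dbo.Customers AS c
JOIN #CustomerTotals AS t
  ON t.CustomerID = c.CustomerID;
```

Each temp table materializes one row per key before the next join, so no later step can multiply rows and no DISTINCT is needed; each intermediate step also gets its own statistics and a much simpler plan than one monolithic chain of CTEs.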