I don't know if this counts but the first thing I ask is "What has changed?" and this speaks to the importance of gather good meaningful baseline metrics and being able to quantify what "poorly behaving SQL" really means.
Baseline Metric (for me anyways) means having CPU, Memory, Disk statistics gathered over time as well as being able to automatically run sets of use-case scenarios to help validate that SQL Server has performance issues.
Has performance been getting worse and worse over time gradually... or do you see a spike type of change that would lead you to believe that it is a specific change that is now causing your pain point.
Instincts may first be to optimize queries or dig into code but if nothing in the code has changed from one day to the next then 9 times out of 10 you're not really going to solve the root problem.
answered Oct 07 '09 at 01:25 PM
I first ask what is meant by performing slowly. Is it millisecond queries that are now taking seconds or hour long queries that now take days (don't laugh) or just something that takes longer than the implementer requires.
If it's something that takes a long time and you know what it is and it is running in isolation then just look at the query plan.
If it's long running queries then loojk at the disio (first then cpu) of all processes running (from sysprocesses) and see if a single process (or many) are hogging resources. If this identifies cuplprits then again look at query plans.
This has a good chance of identifying the problem and a solution - which is often one of recoding, indexing, preprocessing (maybe on another server), not running it.
answered Oct 07 '09 at 02:27 PM
Hopefully I have a baseline as Colin (BenchmarkIT) has pointed out. If I don't, like when I go to a new client, I first check the OS basics:
I want to see what the overall system is doing. If I see CPU is unusually high, I'll check what process it is. I've seen too many cases where the problem on a SQL Server box isn't actually SQL Server.
Once I've determine that it's not definitely something else, I'll start digging into SQL Server. I'll check for long running queries, blocking, the usual. With 2005/2008 I'll check the DMVs for indexing. I'll run server traces looking to see if there's unusual application activity. Case in point: a few years ago we had a cluster dropping and failing over. And it was big iron. A trace revealed what was happening. An application was spawning thousands of connections in less than a second. On a 32 bit OS, memory got depleted and SQL Server crashed. Turns out there was an infinite loop in a module that had gone into production the night before.
answered Oct 07 '09 at 02:28 PM
K. Brian Kelley
I don't have to get involved with new, unfamiliar, servers (well, not very often); so I am in the happy position that I only have my own applications to worry about.
We log every Sproc execution to a table - including a list of the parameters and, when the SProc completes successfully (i.e. no syntax / runtime deadlock / etc error], any Error return code, and its execution time.
We thus have a baseline for the execution time of all Sprocs. Furthermore any Sproc that starts executing outside its expected timespan gets a RECOMPILE automatically. If this doesn't bring it back to normal operating speed then it is listed on our alert list (along with loads of other stuff, mostly to do with error-returns and scenarios we have put a Watch on (because they should never occur, or we changed the code on a LIVE system and want to know when it gets executed so we can babysit it for a while).
I realise this is not "standard" but thought I would raise it in case anyone is interested to discuss further.
answered Oct 07 '09 at 03:16 PM