From what angle do you usually handle poorly behaving SQL Server? What is the first thing you check? How?
I don't know if this counts, but the first thing I ask is "What has changed?" This speaks to the importance of gathering good, meaningful baseline metrics and being able to quantify what "poorly behaving SQL Server" really means.
Baseline metrics (for me, anyway) mean having CPU, memory, and disk statistics gathered over time, as well as being able to automatically run sets of use-case scenarios to help validate that SQL Server has performance issues.
Has performance been getting gradually worse over time, or do you see a spike-type change that would lead you to believe a specific change is now causing your pain point?
Instincts may say to first optimize queries or dig into code, but if nothing in the code has changed from one day to the next, then nine times out of ten you're not really going to solve the root problem.
I first ask what is meant by "performing slowly." Is it millisecond queries that are now taking seconds, hour-long queries that now take days (don't laugh), or just something that takes longer than the implementer requires?
If it's something that takes a long time, you know what it is, and it runs in isolation, then just look at the query plan.
If it's long-running queries, then look at the disk I/O (first, then CPU) of all processes running (from sysprocesses) and see if a single process (or many) are hogging resources. If this identifies culprits, then again look at query plans.
This has a good chance of identifying the problem and a solution, which is often one of recoding, indexing, preprocessing (maybe on another server), or simply not running it.
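The sysprocesses check described above can be sketched as a quick query like this (the spid filter to skip system sessions is my own addition, not the poster's exact code):

```sql
-- Who is hogging disk I/O? Uses the legacy sysprocesses view.
SELECT spid, status, hostname, program_name, cmd,
       physical_io, cpu, blocked
FROM master.dbo.sysprocesses
WHERE spid > 50              -- skip system sessions
ORDER BY physical_io DESC;   -- change to ORDER BY cpu DESC for the CPU pass
```

Run it a few times in a row; a spid that stays at the top with climbing physical_io is your first candidate for a query-plan review.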
Hopefully I have a baseline as Colin (BenchmarkIT) has pointed out. If I don't, like when I go to a new client, I first check the OS basics:
I want to see what the overall system is doing. If I see CPU is unusually high, I'll check what process it is. I've seen too many cases where the problem on a SQL Server box isn't actually SQL Server.
Once I've determined that it's definitely not something else, I'll start digging into SQL Server. I'll check for long-running queries, blocking, the usual. With 2005/2008 I'll check the DMVs for indexing. I'll run server traces looking to see if there's unusual application activity. Case in point: a few years ago we had a cluster dropping and failing over, and it was big iron. A trace revealed what was happening: an application was spawning thousands of connections in less than a second. On a 32-bit OS, memory got depleted and SQL Server crashed. It turned out there was an infinite loop in a module that had gone into production the night before.
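For the 2005/2008 DMV indexing check mentioned above, a query along these lines surfaces missing-index candidates (treat the output as hints to investigate, not indexes to create blindly):

```sql
-- Rough missing-index candidates, highest estimated benefit first
SELECT TOP 10
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact DESC;
```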
Assuming it's a new machine, I just do the basics: run perfmon to collect counters and see what sort of load is on the system, wait states, and queues. I'll also set up a server-side trace to collect all the procedure calls I can. Then, with the data in hand, I'll start aggregating information through queries: identify what time of day the most queueing occurs, then determine which procedures are running during that time. Like Brian, identify the slowest-running queries. Also identify the most frequently called queries. Based on all this data, start making adjustments so that the slowest, most frequently called, or most costly queries become less so.
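Alongside the perfmon counters, a wait-stats snapshot is an easy first cut at where the queueing is happening (the excluded wait types below are just a few common benign ones, not an exhaustive list):

```sql
-- Top waits since the last restart (or since the stats were cleared)
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_BUFFER_FLUSH', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```

High resource_wait_ms on PAGEIOLATCH-type waits points at disk, CXPACKET at parallelism, LCK_M at blocking, and so on.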
I don't have to get involved with new, unfamiliar, servers (well, not very often); so I am in the happy position that I only have my own applications to worry about.
We log every Sproc execution to a table - including a list of the parameters and, when the Sproc completes successfully (i.e. no syntax / runtime / deadlock / etc. error), any error return code and its execution time.
We thus have a baseline for the execution time of all Sprocs. Furthermore, any Sproc that starts executing outside its expected timespan gets a RECOMPILE automatically. If this doesn't bring it back to normal operating speed, then it is listed on our alert list, along with loads of other stuff, mostly to do with error returns and scenarios we have put a Watch on (because they should never occur, or because we changed the code on a LIVE system and want to know when it gets executed so we can babysit it for a while).
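A minimal sketch of that logging pattern might look like this (the table and procedure names here are hypothetical, not the poster's actual schema):

```sql
-- Hypothetical log table for the per-execution logging described above
CREATE TABLE dbo.SprocLog (
    LogID      int IDENTITY(1,1) PRIMARY KEY,
    SprocName  sysname       NOT NULL,
    Params     nvarchar(max) NULL,
    ReturnCode int           NULL,
    DurationMs int           NULL,
    LoggedAt   datetime      NOT NULL DEFAULT GETDATE()
);
GO
-- Each procedure records its own execution on the way out
CREATE PROCEDURE dbo.MySproc
    @SomeParam int
AS
BEGIN
    DECLARE @Start datetime, @Ret int;
    SET @Start = GETDATE();
    SET @Ret = 0;

    -- ... the procedure's actual work goes here ...

    INSERT dbo.SprocLog (SprocName, Params, ReturnCode, DurationMs)
    VALUES ('dbo.MySproc',
            '@SomeParam=' + CONVERT(varchar(20), @SomeParam),
            @Ret,
            DATEDIFF(ms, @Start, GETDATE()));
    RETURN @Ret;
END;
```

Averaging DurationMs per SprocName over the log then gives the execution-time baseline the automatic RECOMPILE check compares against.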
I realise this is not "standard" but thought I would raise it in case anyone is interested to discuss further.
Assuming that no changes have gone into production recently...
I usually start by looking for which execution plan has suddenly gone wrong.
A quick peek at a query like this usually tells me which plan has gone nuts:
SELECT r.session_id,
       p.*,
       (SELECT text FROM sys.dm_exec_sql_text(r.sql_handle)) AS query_text
FROM sys.dm_exec_cached_plans AS p
JOIN sys.dm_exec_requests AS r
    ON p.plan_handle = r.plan_handle;
If nothing shows up there, I check the buffer cache and make sure that SQL didn't flush it for some reason. After that, I check the maintenance jobs and make sure nothing failed in the middle.
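One quick way to see whether the buffer cache took a hit is the Page Life Expectancy counter; a sudden drop suggests the cache was flushed or is under memory pressure:

```sql
-- Page Life Expectancy: how long pages survive in the buffer pool (seconds)
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';
```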
After that I check the disks and see if they are being hit hard. If that shows funky-looking data, I'll check the SAN itself to see what it shows.
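From inside SQL Server, the per-file I/O stall numbers give a first read on whether the disks are being hit hard (a generic sketch, not the poster's exact check):

```sql
-- Cumulative I/O stalls per database file since the last restart
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;
```

A high average stall per read or write on particular files tells you which LUNs to ask the SAN team about.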
Always look at execution plans like Mr Denny recommends.
Rarely is performance "all of a sudden" poor. I find it has been slowing down for quite a while and you have hit the tipping point where people start complaining. If it has suddenly gone sideways, I look at reindex jobs that have failed or bad statistics causing poor plans.
On the server side:
- Memory pressure: has your database outgrown the size of the memory?
- SQL I/O stats: high wait times can mean poorly configured disks or be related to memory pressure.
- CPU loading: are the CPUs constantly loaded up? Maybe you have outgrown them.
Growing databases can push you over the waterfall, making you think the problem is sudden when in fact it has been creeping up on you for a while.
Personally, I subscribe to the well-vetted and published methodology covered in the "Troubleshooting Performance Problems in SQL Server 2005" whitepaper. Basically: focus first on resource bottlenecks with wait-stats analysis and performance counter collections (focusing on memory, CPU, and I/O), then look at tempdb bottlenecks, followed by slow-running queries, looking for blocking, missing indexes/statistics, and bad query plans that lead to query rewrites.
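For the tempdb bottleneck step, one common quick check is allocation-page contention, which shows up as PAGELATCH waits on tempdb pages (database_id 2); this is a generic sketch of that check, not a quote from the whitepaper:

```sql
-- PAGELATCH contention on tempdb (database_id 2) pages right now
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';
```

Lots of sessions stacked up here usually means heavy tempdb object churn and is a cue to review tempdb file layout.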
So far this has never let me down. I'd say that at least three-quarters of the time I find a resource bottleneck on servers that I don't already know. If I know the server/environment, I always double-check that a resource issue doesn't exist before narrowing my search down to the query level. There's no point trying to tune a query if the hardware is the ultimate problem. I keep finding SQL Server running on desktop hardware in small businesses.