What is your performance troubleshooting methodology?

From what angle do you usually handle poorly behaving SQL Server? What is the first thing you check? How?

more ▼

asked Oct 07, 2009 at 12:40 PM in Default

avatar image

MladenPrajdic ♦
282 2 4 6

(comments are locked)
10|1200 characters needed characters left

8 answers: sort voted first

I don't know if this counts but the first thing I ask is "What has changed?" and this speaks to the importance of gather good meaningful baseline metrics and being able to quantify what "poorly behaving SQL" really means.

Baseline Metric (for me anyways) means having CPU, Memory, Disk statistics gathered over time as well as being able to automatically run sets of use-case scenarios to help validate that SQL Server has performance issues.

Has performance been getting worse and worse over time gradually... or do you see a spike type of change that would lead you to believe that it is a specific change that is now causing your pain point.

Instincts may first be to optimize queries or dig into code but if nothing in the code has changed from one day to the next then 9 times out of 10 you're not really going to solve the root problem.

more ▼

answered Oct 07, 2009 at 01:25 PM

avatar image

57 2 1

(comments are locked)
10|1200 characters needed characters left

I first ask what is meant by performing slowly. Is it millisecond queries that are now taking seconds or hour long queries that now take days (don't laugh) or just something that takes longer than the implementer requires.

If it's something that takes a long time and you know what it is and it is running in isolation then just look at the query plan.

If it's long running queries then loojk at the disio (first then cpu) of all processes running (from sysprocesses) and see if a single process (or many) are hogging resources. If this identifies cuplprits then again look at query plans.

This has a good chance of identifying the problem and a solution - which is often one of recoding, indexing, preprocessing (maybe on another server), not running it.

more ▼

answered Oct 07, 2009 at 02:27 PM

avatar image


(comments are locked)
10|1200 characters needed characters left

Hopefully I have a baseline as Colin (BenchmarkIT) has pointed out. If I don't, like when I go to a new client, I first check the OS basics:

  • CPU
  • Disk
  • Memory

I want to see what the overall system is doing. If I see CPU is unusually high, I'll check what process it is. I've seen too many cases where the problem on a SQL Server box isn't actually SQL Server.

Once I've determine that it's not definitely something else, I'll start digging into SQL Server. I'll check for long running queries, blocking, the usual. With 2005/2008 I'll check the DMVs for indexing. I'll run server traces looking to see if there's unusual application activity. Case in point: a few years ago we had a cluster dropping and failing over. And it was big iron. A trace revealed what was happening. An application was spawning thousands of connections in less than a second. On a 32 bit OS, memory got depleted and SQL Server crashed. Turns out there was an infinite loop in a module that had gone into production the night before.

more ▼

answered Oct 07, 2009 at 02:28 PM

avatar image

K. Brian Kelley
1k 1 4 4

(comments are locked)
10|1200 characters needed characters left

I don't have to get involved with new, unfamiliar, servers (well, not very often); so I am in the happy position that I only have my own applications to worry about.

We log every Sproc execution to a table - including a list of the parameters and, when the SProc completes successfully (i.e. no syntax / runtime deadlock / etc error], any Error return code, and its execution time.

We thus have a baseline for the execution time of all Sprocs. Furthermore any Sproc that starts executing outside its expected timespan gets a RECOMPILE automatically. If this doesn't bring it back to normal operating speed then it is listed on our alert list (along with loads of other stuff, mostly to do with error-returns and scenarios we have put a Watch on (because they should never occur, or we changed the code on a LIVE system and want to know when it gets executed so we can babysit it for a while).

I realise this is not "standard" but thought I would raise it in case anyone is interested to discuss further.

more ▼

answered Oct 07, 2009 at 03:16 PM

avatar image

Kristen ♦
2.2k 7 11 14

(comments are locked)
10|1200 characters needed characters left

Assuming it's a new machine, I just do the basics, run perfmon to collect counters to see what sort of load is on the system, wait states, and queues. I'll also set up a server side trace to collect all the procedure calls I can. Then, with the data in hand, I'll start aggregating information through queries. Identify what time of day when the most queueing is occurring and then determining which procedures are running during that time. Like Brian, identify the slowest running queries. Also, identify the most frequently called queries. Based on all this data, start making adjustments to take the slowest, or most frequently called, or the most costly to the system, less so.

more ▼

answered Oct 07, 2009 at 02:43 PM

avatar image

Grant Fritchey 1

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Oct 07, 2009 at 12:40 PM

Seen: 2255 times

Last Updated: Nov 11, 2009 at 05:56 AM

Copyright 2018 Redgate Software. Privacy Policy