|
It's a typical scenario, you have a client / database with a performance issue, and there isn't much information available about what might be wrong apart from 'it's going slowly'. Here is what I tend to do:
After that, I would start visiting the DMVs (for example Where would you start, given a problem like that? Do you find that you have a set of 'starter scripts' that give you some data that you can begin to work with? The reason I ask is that I had exactly this scenario a couple of weeks back, and I thought I would ask you guys just to make sure I'm not going about it a completely numpty way!
(comments are locked)
|
|
Your outline is pretty accurate to how I'd do it too. Address the server & the physical architecture right at the start. This is seldom the source of the problem, so you should be able to quickly determine whether or not things are set up right. Fix what's wrong and don't look back. After that... things get interesting. I usually take one of two approaches. Approach #1, gather query metrics. You can set up a server-side trace or you can look at the DMV's like you've already mentioned. Figure out what's called most frequently, what runs the longest, what uses the most cpu, what uses the most i/o, and, which queries are on more than one of these lists. With that in hand, you can begin tuning. Gathering query metrics like this has the added benefit of showing you how the queries are called. For example, we had an app that called every query twice. It was just bad code, but if we weren't looking at a trace, we'd have just thought it was active. I've also seen instances where a lookup table is called hundreds of times from one connection because the developers didn't cache data... All this type of thing is what you can get from the trace. The other approach, and it's pretty popular, is to look at the wait states. What are you waiting on? Where are the queues too long? Identify these locations, the associate those waits to procedures and begin to tune that way. Finally, it's always best to talk to the business users. Where do they feel pain. Maybe you think, based on good data, that a given procedure needs to be tuned because it's called hundreds of times an hour, but the users are more acutely aware of a single report that runs once a day or something. Making them happy is as important as a more rigourous approach. With a system already in production, mature or not, you can't really establish a baseline. The baseline is today. The trick is to not waste your time tuning stuff that's not important, either to the business, or to the overall performance metrics. Epically good answer... Do you mind if I drop you an email with some questions on a semi-related topic?
Apr 30 '10 at 09:12 AM
Matt Whitfield ♦♦
Thanks, it's not like I wrote a book on this or anything... oh wait, I did... Kidding. Of course you can ask. I'll try to help.
Apr 30 '10 at 09:19 AM
Grant Fritchey ♦♦
Brilliant - I'll drop you an email when I get five minutes in between looking after chickens and kittens (don't ask!)
Apr 30 '10 at 10:15 AM
Matt Whitfield ♦♦
+1 to Matt for stating that chicken_importance > Grant_importance > kitten_importance. Can you tell its Friday?
Apr 30 '10 at 11:25 AM
Fatherjack ♦♦
Don't worry, there's no ORDER BY clause when you're married to the female equivalent of dr. Doolittle
Apr 30 '10 at 04:34 PM
Matt Whitfield ♦♦
(comments are locked)
|
|
I would hope (and its a big hope) that there would be some benchmark of what normal is so that we could determine how slow it actually is, it may be that its not far off and therefore not worth investigating at the moment. When you're coming into a system, you almost never have a baseline. I've never seen one anyway. Other than that, you've got as good an approach here as any I've seen. I'd be curious what your scripts look like. I actually don't have any magic ones any more. I just focus on collecting data & then munge that to figure out what to fix.
Apr 30 '10 at 09:21 AM
Grant Fritchey ♦♦
@Grant, Thanks. I figured its not common. We actually have some systems here that we baseline monthly, the UI is timed (only to the nearest s) to do standard user processes. We use this to decide if a "its going slow again" call are justified and also to compare new application versions - we get at least 2/year from our suppliers. Most often historically its been AV, IIS or an index frag issue. Happy to share the scripts ... email ...?
Apr 30 '10 at 09:31 AM
Fatherjack ♦♦
grantedd at gmail dot com. It's public but I don't like posting it unobfuscated.
Apr 30 '10 at 11:43 AM
Grant Fritchey ♦♦
Thanks, got it, feel free to delete now. Only actually meant did you want to share them over email, would have gone to scarydba to get it .. !
May 01 '10 at 05:20 AM
Fatherjack ♦♦
@Fatherjack - I'd be interested in seeing your toolbox scripts too actually! I think you should have my email? I'll drop you a line just in case.
May 01 '10 at 12:23 PM
Matt Whitfield ♦♦
(comments are locked)
|


Matt, just about to send an email and I dont have your address - should I use Atlantis Interactive ?
@Fatherjack - yep, mattw [at] atlantis-interactive.co.uk is me :)