Cost-Benefit Analysis for SQL Server Storage upgrade
I'm currently working in an environment that has issues with its storage provision, but lacks the management desire to actually upgrade / replace the storage infrastructure. Part of the problem is that there generally isn't a perceived performance issue with the storage (but I know it should be quicker), it's that the storage provision itself is becoming increasingly fragmented and thinly provisioned. Can anyone point me in the direction of a cost-benefit analysis for this sort of thing? Or just give me some potential benefits (besides freeing up an amount of time from the DBAs & storage guys for trouble-shooting) and other considerations so I can write my own? I know we should be looking at upgrading to SQL2008, but for various reasons that's not an option right now.
I'm going to be blunt here - at first glance, it sounds like there's not really a business problem. You have to define business problems that need to be solved with spending money on a new SAN. New SANs are staggeringly expensive, and when you walk into your boss's office, he's going to be looking at that SAN in relation to your salary. Here's the problems you mentioned: **"There generally isn't a perceived performance issue with the storage (but I know it should be quicker)"** - Why should it be quicker? How quick is quick enough? You can always buy faster storage, so why is the new one going to be better? Are you trying to meet some imaginary metrics set by best practices? The real right answer is, "When the business users are satisfied with the speed, then it's fast enough." **"the storage provision itself is becoming increasingly fragmented"** - How are you measuring that? What's the impact to you? Today's SANs are constantly fragmented with the way their provisioning, snapshots, and expansion work. **"and thinly provisioned."** - If you're in danger of running out of space in the next X months, that's a business metric to take to your management. Show them the percent that you're using today, show your rate of growth, and show when you project to run out of space. You can get these numbers from your backup software by trending the amount of backed up data averaged by day. The guys holding the purse strings get requests for money every day. Would you like a raise? Would you like a better desktop computer? Would the vendors like to get paid faster? Would management like to acquire a competitor? There's all kinds of things they need money for, and you need to start explaining real business problems that the additional SAN space would solve. Otherwise, this question sounds like, "Waaaah, I want a new toy!" (I'd rather you hear this from me than your manager.)
You need to sort out what the problem is and then fix it, if its the storage then so be it. Brent Ozar has some great content on his site for benchmarking systems and showing where the bottlenecks might be. Other than that is a load of questions for you to consider and hopefully the answers will form your submission: What makes you think the storage is the problem? Is it a capacity issue rather than sheer IOPS performance? What is the poor performance costing - honestly - in actual £££'s per day/week/year? What would a change cost and what benefit would you see? This is a tricky one, stating what the benefits will be ahead of time, especially if you cant demonstrate the cost caused by not changing. What are the problems you are currently having to handle?
You say you have storage admins, which is a good thing! They should have some sort of cost breakdown for the SAN storage - £ per GB per year or something similar. These sort of figures should be a good starting point. From the £ per GB, you should be able to get to £ per IOP too, so you can have metrics for raw storage costs and usage costs of the SAN. You also think the storage can go faster, you will need to prove this. SAN XXX from vendor YYY will have published figures for performance, if that is not available, you can work it out through the performance figures for the hard-drives (incl. RAID performance boost or lag) and also the bottlenecks for the SAN fabric (hard drive interface, HBA speed etc.). You then run SLQIO on your LUNS with the behaviour you expect to have on the different LUNs (Log file sequential writes/reads or Data file random). This will quickly (inside 30 minutes) give you some clear idea of your storage capabilities. It may be difficult getting reliable readings if the SAN guys have done thin-provisioning on your SQL LUNs (they love doing this awful, awful thing!). Be careful when hitting the SAN with SQLIO though, it can knock some of them for six - they will have a good burst at the start because they will most likely use the cache. If you let SQLIO run long enough to overrun cache, you can see your SAN drop off a cliff performance wise. Take a look at the Brent Ozar article on [SAN Best Practices], it is brilliant (especially the video tutorial for SQLIO). Getting any changes made in a SAN environment will be difficult, I had to kick and scream at my last job to get any sort of head-way. Good luck to you! :
Based on what you all advised, but taking particular heed of @Brent Ozar's advice, I slightly adjusted the focus of my monitoring and reporting, and wrapped everything up in a handy report for the manager's manager's manager a month later. The report included: * information on the capacity of the storage environment, * the rates of growth of the databases * information from the `fn_virtualfilestats` function traced over time as well as more general information about the server infrastructure (numbers of SQL2000 & 2005 servers, for example) and lists of databases that appeared to be no longer required. I presented this as the final product of my contract, and made recommendations for the storage environment to be beefed up, as well as various consolidation / migration options. Boss^3 seemed to take it all on board, and didn't react badly. I left, and went off to my next contract. I had a phone call about four months later asking if I would be able to go back and do some work "now that we've got the new budget in place and the storage issues largely addressed", which is nice. Unfortunately, I wasn't able to take them up on the offer. Anyway, just thought you might like to know how it all went...
Given that I know your expertise, you've probably thought of this, but remember to make sure storage is really your bottleneck before you spend money for faster storage. Faster drives won't help much if the problem is that your network is too clogged up or that other VMs/applications running on the server are eating all the RAM. And, to paraphrase Jeff Moden, when you have performance issues, check the code first.