
Disaster Recovery Best Practices - What do you do?

We all know we need backups. For those systems that are more critical than others, we know to replicate, mirror, or log ship. We also know we need to validate our backups by doing regular restores. That is all well and fine, but what happens when you lose the server? Just being able to restore your user database isn't enough. What do others do about users, jobs, DTS, and SSIS? Do you have regular jobs that script all the user objects out? What is the industry standard? These thoughts were triggered for me this weekend by reading "DBA Survivor" by Thomas LaRock and a blog post by Sean McCown. I'm curious to hear others' thoughts and how you go about putting a real recovery plan in place.

How do you handle SQL 2000, 2005, and 2008 differently from one another, if at all?

--Additional info after my original question

Sounds like I asked a good question. So far there are three responses, by @CirqueDeSQLeil, @Matt Whitfield, and @Fatherjack, and all seem to be doing their own thing for recovery plans. In my current situation I have full backups of everything, and on most of my systems I have mirroring set up. What I don't always have are my users, SQL Agent jobs, DTS, and SSIS packages scripted out. I do have master and msdb backed up, so I can always restore those and then export the jobs and script out the users, but in disaster mode, if I had to restore my user databases to another server, I think it would be a lot better to just be able to go to a scripts folder and grab the scripts for all my jobs, users, etc. With that being said, for those who agree: should I use T-SQL, PowerShell, or something else? For those who don't agree, tell me why, and how you would handle recovering a user database at 3 AM when you get a call from the NOC saying the server is up in flames and all you have is a full backup and some t-logs.

I appreciate everyone's feedback. I am thinking of incorporating all this into a nice blog post and possibly a presentation at my local PASS chapter, then maybe one day at a SQL Saturday. I am thinking of a title: "So you inherited a database, now what do you do?"
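
To make the "T-SQL, PowerShell, or what" part concrete, here is a minimal PowerShell/SMO sketch of the scripts-folder idea (not from the original post): it writes every Agent job and login out as a .sql file. The server name and output path are placeholders, and plain SMO scripting does not capture SQL login password hashes (Microsoft's sp_help_revlogin covers that), so treat it as a starting point rather than a finished tool.

    # Sketch: script out Agent jobs and logins to a dated folder with SMO.
    # "MYSERVER" and C:\DR\Scripts are placeholders - point them at your own environment.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | Out-Null

    $serverName = "MYSERVER"
    $outDir     = "C:\DR\Scripts\$(Get-Date -Format yyyyMMdd)"
    New-Item -ItemType Directory -Path "$outDir\Jobs", "$outDir\Logins" -Force | Out-Null

    $server = New-Object Microsoft.SqlServer.Management.Smo.Server $serverName

    # One .sql file per SQL Agent job.
    foreach ($job in $server.JobServer.Jobs) {
        $job.Script() | Out-File (Join-Path "$outDir\Jobs" ($job.Name + ".sql"))
    }

    # One .sql file per login (skipping the internal ## logins).
    foreach ($login in $server.Logins | Where-Object { $_.Name -notlike "##*" }) {
        $login.Script() | Out-File (Join-Path "$outDir\Logins" ($login.Name.Replace("\", "_") + ".sql"))
    }

Scheduled from an Agent job or Windows Task Scheduler, that gives you a scripts folder to grab at 3 AM without touching msdb first.
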
asked Oct 04, 2010 at 11:36 AM by Tim


8 answers

DR is a huge deal. Let's see, starting with the easy stuff: we have regular backups that are run, and tested, in an automated fashion, so that we know we can restore them. Depending on the system, there are also regular log backups for point-in-time recovery, if needed. That's the easy part. The fun stuff: our SAN is mirrored to two off-site locations. It's in a building separate from our corporate offices with a HUGE fire suppression system, cooling system, multiple power connections, and two separate diesel generators in the event of a power loss. Depending on the system, we've also got other types of virtual mirroring running as part of the HA system. There are fully documented DR procedures, and we do run-throughs of all operations, at least once a year, from the documentation.

Other things, we have all database code checked into source control, versioned, labeled, and we can use that to recreate databases if things go really south.

UPDATED: We also back up our system databases and test restores as part of validating security. Testing, testing, testing, documentation, testing, training, documentation, testing, testing, testing... You get the point. But that's how to make sure you know what to do, how to do it, and that everything will work.
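
Not part of Grant's answer, but since the point-in-time path is exactly the 3 AM "full backup and some t-logs" scenario from the question, here is a hedged PowerShell/SMO sketch of what that restore looks like when scripted. Server, database, paths, and the stop time are all placeholders.

    # Sketch: point-in-time restore from a full backup plus a log backup.
    # SMO's Restore class lives in the SmoExtended assembly on 2008.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo")         | Out-Null
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SmoExtended") | Out-Null

    $server = New-Object Microsoft.SqlServer.Management.Smo.Server "MYSERVER"

    # 1. Restore the full backup, leaving the database in NORECOVERY so logs can be applied.
    $full = New-Object Microsoft.SqlServer.Management.Smo.Restore
    $full.Database = "MyUserDb"
    $full.Devices.AddDevice("\\backupshare\MyUserDb_full.bak", [Microsoft.SqlServer.Management.Smo.DeviceType]::File)
    $full.NoRecovery = $true
    $full.ReplaceDatabase = $true
    $full.SqlRestore($server)

    # 2. Apply the log backup, stopping just before the disaster, and bring the database online.
    $log = New-Object Microsoft.SqlServer.Management.Smo.Restore
    $log.Database = "MyUserDb"
    $log.Action = [Microsoft.SqlServer.Management.Smo.RestoreActionType]::Log
    $log.Devices.AddDevice("\\backupshare\MyUserDb_log.trn", [Microsoft.SqlServer.Management.Smo.DeviceType]::File)
    $log.ToPointInTime = "2010-10-04 02:55:00"
    $log.NoRecovery = $false
    $log.SqlRestore($server)

The equivalent RESTORE DATABASE ... WITH NORECOVERY followed by RESTORE LOG ... WITH STOPAT in plain T-SQL works just as well; the win is having it written down and rehearsed before the 3 AM call.
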
answered Oct 04, 2010 at 12:11 PM by Grant Fritchey ♦♦

Oh, and our backups are backed up and shipped to a completely different off-site location from either of our two DR locations... my company kind of likes its data.
Oct 04, 2010 at 12:14 PM Grant Fritchey ♦♦

Thanks Grant. We have a similar DR-type site. Multiple power grids feeding it, redundant generators, redundant data providers, all that awesome networking goodness. The primary reason for my question is what we should be doing outside of full backups, log backups, mirroring, log shipping, etc. It is one thing to be able to recover a user database up to a narrow point in time, but if you have a complete server loss, what about all your SQL jobs, users, SSIS packages, DTS, etc.? If you have a DR server where you have been mirroring just the user databases, and you haven't already been syncing users and scripting out SQL job changes, SSIS changes, etc., you could find yourself having to restore msdb to another server and trying to recover your packages. What is the best practice, and what is realistic, to protect yourself from this type of scenario?

You can thank SQLRockstar for all this. I started reading his book and realized OH CRAP, I have inherited some potential headaches. I might ought to do something about this.
Oct 04, 2010 at 12:23 PM Tim
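
On the "recover your packages out of msdb" point: if SSIS packages are stored in the server rather than on the file system, dtutil (which ships with SSIS 2005/2008) can copy them back out to .dtsx files, and the same command works as a regular export job. A rough sketch, with package, server, and path names as placeholders (and quoting that may need adjusting for your shell):

    # Sketch: export one SSIS package stored in msdb to a .dtsx file with dtutil.
    # msdb.dbo.sysssispackages (2008) / sysdtspackages90 (2005) lists what is stored in the server.
    & dtutil /SQL "NightlyLoad" /SourceServer "MYSERVER" /COPY "FILE;C:\DR\SSIS\NightlyLoad.dtsx" /Quiet

SQL 2000 DTS packages are a different animal: they live in msdb.dbo.sysdtspackages, and the simplest routes are saving them as structured storage files from Enterprise Manager or relying on the msdb backup, which contains them.
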
As far as SSIS goes, we check that into source control too. And that is backed up, moved off site, blah, blah, blah, so if we have to, we can go there to get it. Our SSIS servers are treated a little differently than our SQL Servers (since they are a little different) but by & large, we're still counting on the same thing. A lot more stuff is being done with virtuals now, even in production, and we're seeing that whole snapshot thing becoming a part of the DR strategy.
Oct 04, 2010 at 12:34 PM Grant Fritchey ♦♦
Beyond the nightly backups, I do nightly restores onto a different server to test the nightly backups. In addition, those backups are stored in a grid that is replicated to a DR site, so I should, at a minimum, be able to access the backups. I am also in the process of setting up DR for SSRS. We have jobs that script out user access on a regular basis, and SSIS, DTS, and RDL files are all stored in TFS.
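
For anyone wanting to automate that nightly restore-test idea, here is a hedged PowerShell/SMO sketch (not from the answer above, just one way to script it): restore last night's backup onto a test instance with relocated files, then run CHECKDB against it. Instance names, paths, and the logical file names are assumptions; the real logical names come from RESTORE FILELISTONLY.

    # Sketch: restore a production backup onto a test server and sanity-check it.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo")         | Out-Null
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SmoExtended") | Out-Null

    $testServer = New-Object Microsoft.SqlServer.Management.Smo.Server "TESTSERVER"

    $restore = New-Object Microsoft.SqlServer.Management.Smo.Restore
    $restore.Database = "MyUserDb_RestoreTest"
    $restore.Devices.AddDevice("\\backupshare\MyUserDb_full.bak", [Microsoft.SqlServer.Management.Smo.DeviceType]::File)
    $restore.ReplaceDatabase = $true

    # Relocate the data and log files to the test box's drives ("MyUserDb" / "MyUserDb_log" are assumed logical names).
    $restore.RelocateFiles.Add((New-Object Microsoft.SqlServer.Management.Smo.RelocateFile("MyUserDb", "D:\TestRestores\MyUserDb_RestoreTest.mdf"))) | Out-Null
    $restore.RelocateFiles.Add((New-Object Microsoft.SqlServer.Management.Smo.RelocateFile("MyUserDb_log", "D:\TestRestores\MyUserDb_RestoreTest.ldf"))) | Out-Null

    $restore.SqlRestore($testServer)

    # Run CHECKDB against the freshly restored copy.
    $testServer.Databases.Refresh()
    $testServer.Databases["MyUserDb_RestoreTest"].ExecuteNonQuery("DBCC CHECKDB WITH NO_INFOMSGS")

Dropping the test copy afterwards and logging the outcome somewhere visible turns this from a script into an actual backup-validation process.
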
answered Oct 04, 2010 at 11:42 AM by CirqueDeSQLeil

Thanks. Are those scripts applied to all servers in your environment?
Oct 04, 2010 at 12:20 PM Tim
All prod servers. Then it is pick and choose in dev and QA.
Oct 04, 2010 at 12:31 PM CirqueDeSQLeil

Good stuff from everyone here already, I'll just add my take...

As well as ensuring the system databases are regularly backed up and verified (master for security credentials, msdb for jobs and possibly DTS packages, model for... well, if you've made changes to model for your environment, you might want to keep them), the single most important piece for me is documentation about the server.

Each server that gets provisioned is documented so that in the event of having to recreate it (whether in an emergency or not), we have all the settings in place:

  • Server collation
  • Location of data files / log files / tempdb / system databases
  • Default security settings
  • etc., etc.

all the kinds of things that would give you a complete picture of the server.

A few years ago, I took a team of people (DBAs & infrastructure guys) over to Dell labs in Ireland, armed only with copies of the server docs and recent backups/files/scripts to see if we could get a complete copy of the production server up and running on some test servers. It was certainly one way of testing to see what was and wasn't missing!
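
Not from Kev's post, but here is a hedged PowerShell/SMO sketch of gathering that kind of server documentation automatically; the server name, output share, and the particular property list are assumptions rather than a complete inventory.

    # Sketch: dump the "rebuild this server" basics to a text file.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | Out-Null

    $server = New-Object Microsoft.SqlServer.Management.Smo.Server "MYSERVER"

    $doc = @()
    $doc += "Server:            " + $server.Name
    $doc += "Version / edition: " + $server.VersionString + " " + $server.Edition
    $doc += "Server collation:  " + $server.Collation
    $doc += "Authentication:    " + $server.Settings.LoginMode
    # DefaultFile/DefaultLog come back empty when the instance is using its installation defaults.
    $doc += "Default data path: " + $server.Settings.DefaultFile
    $doc += "Default log path:  " + $server.Settings.DefaultLog
    $doc += "Service account:   " + $server.ServiceAccount
    $doc += ""
    $doc += "Databases (name, primary file path, recovery model):"
    foreach ($db in $server.Databases) {
        $doc += "  " + $db.Name + "  " + $db.PrimaryFilePath + "  " + $db.RecoveryModel
    }

    $doc | Out-File "\\docshare\ServerDocs\MYSERVER.txt"

It does not replace the written run-book, but it keeps the factual parts of it from going stale.
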
answered Oct 04, 2010 at 12:23 PM by Kev Riley ♦♦


I think this is actually a much bigger question, because backing up SQL Server databases is really very simple, as is the mirroring, log shipping, etc. It's the services around them that make it difficult...

We used to make all our software such that configuration was stored in a predictable directory, and then this configuration was replicated using NT file replication. We used a specific script on a schedule to back up the IIS metabase from the main site to the DR site (I think we did that once every 4 hours, or we had a process to do it manually after any config change was accepted). However, I'm not sure how the whole Active Directory / network approach was handled - usually my brain would start to fry shortly after the word 'VLAN' was mentioned!

As for what's industry standard, I have no idea. I just know what I did.
answered Oct 04, 2010 at 11:50 AM by Matt Whitfield ♦♦

mmmmm fried brains!! +1
Oct 04, 2010 at 12:17 PM Fatherjack ♦♦
Agreed, backing up SQL is pretty easy, and all the networking, AD, applications, web interfaces, DNS, etc. play a huge role in any recovery. My main question here was what to do other than regular backups, log shipping, and mirroring. If someone had to fail over a SQL database from one server to another and all they had was the user database, they wouldn't have the full picture. What about SQL jobs, users, DTS and SSIS packages, SSRS, etc.? They may have the data, but users may not be able to access it, jobs won't be there to import data, and so on. How are folks handling everything outside of "backup database xyz to disk"?
Oct 04, 2010 at 12:17 PM Tim

In our domain we make as much use as possible of Windows groups and users for security, which removes some of the headache of rebuilding security - i.e. no passwords to store/transfer. As you say, a script of these is taken so it can be applied in whole or in part to a 'new' server. Where a database is mirrored, we have already configured security, just to be ready.
Backups are stored on a network share so will be available to any server coming online.
Our SSIS packages are stored in the file system so can be transferred with the database(s) as required.
Jobs get scripted to a directory.

Keep in mind that you would also have the job details in the system database backups (i.e. users, logins, jobs, steps, schedules, etc.) if you were really desperate or had missed something.
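
To round that out (not from the answer itself), here is a small sketch of sweeping the system databases to a share with SMO, with the server and share names as placeholders; msdb is where jobs, schedules, and server-stored SSIS packages actually live.

    # Sketch: back up master, msdb, and model to a network share with SMO.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo")         | Out-Null
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SmoExtended") | Out-Null

    $server = New-Object Microsoft.SqlServer.Management.Smo.Server "MYSERVER"

    foreach ($dbName in "master", "msdb", "model") {
        $backup = New-Object Microsoft.SqlServer.Management.Smo.Backup
        $backup.Action   = [Microsoft.SqlServer.Management.Smo.BackupActionType]::Database
        $backup.Database = $dbName
        $backup.Devices.AddDevice("\\backupshare\SystemDbs\${dbName}_$(Get-Date -Format yyyyMMdd).bak", [Microsoft.SqlServer.Management.Smo.DeviceType]::File)
        $backup.Initialize = $true   # overwrite rather than append to the backup file
        $backup.SqlBackup($server)
    }

Having these alongside the scripted-out jobs and logins means the system databases are a fallback rather than the only copy.
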
answered Oct 04, 2010 at 11:51 AM by Fatherjack ♦♦

Windows groups do make things so much easier, but I can't always get away with that - too many crappy applications that don't support it. That is a good idea to always have the SSIS packages backed up to the file system as well. For non-Windows users, do you regularly script those users out? Do you do the same for the SQL jobs?
Oct 04, 2010 at 12:19 PM Tim
It doesn't happen that often, to be honest, as we don't have much churn in that detail; permissions are set and that is, essentially, that. Same with jobs - not much changes after a job is created. Didn't mention it at first, but we also have the SSIS/SSMS project files on the file store that we could rebuild jobs from...
Oct 04, 2010 at 12:34 PM Fatherjack ♦♦