We all know we need backups. For those systems that are more critical than others we know to replicate, mirror, or log ship. We also know we need to validate our backups by doing regular restores. That is all well and fine, but what happens when you lose the server. Just being able to restore your user database isn't enough. What do others do about users, jobs, dts, ssis? Do you have regular jobs that script all the user objects out? What is the industry standard? These thoughts were triggered for me this weekend by reading "DBA Survivor" by Thomas LaRock and reading a blog by Sean McCown. Curious to hear others thoughts and how you are achieving having a real recovery plan.
How do you handle SQL 2000, 2005, or 2008 differently than the others if at all.
--Additional info after my original question
Sounds like I asked a good question. So far 3 responses by @CirqueDeSQLeil, @Matt Whitfield, and @Fatherjack. All seem to be doing their own thing for recovery plans. In my current situation I have full backups of everything, and on most of my systems I have mirroring setup. What I don't always have is my users, SQL jobs, DTS, and SSIS packages scripted out. I do have master and msdb backed up so I can always do a restore of those and then export the jobs and script out users but in disaster mode I think it would be a lot better if I had to restore my user db's to another server to just access a scripts folder and grab the scripts for all my jobs, users, etc. With that being said, for those that agree should I use T-SQL, Powershell, or what. For those that don't agree, tell me why and how you would handle recovering a user database at 3 AM when you get a call from the NOC saying the server is up in flames and all you have is a full backup and some t-logs.
I appreciate everyones feedback. I am thinking of incorporating all this into a nice blog and possibly a presentation at my local PASS Chapter, then maybe one day at a SQL Saturday. I am thinking of a title, "So you inherited a Database, now what do you do?"
DR is a huge deal. Let's see, starting with the easy stuff, we have regular backups that are run, and tested, in an automated fashion. This is so that we know we can restore the backups. Depending on the system, there are also regular log backups for a point in time recovery, if needed. That's the easy part. The fun stuff, our SAN is mirrored to two off-site locations. It's in a building seperate from our corporate offices with a HUGE fire suppression system, cooling system, multiple power connections and two seperate diesel generators in the event of a power loss. Depending on the system we've also got other types of virtual mirroring running as part of the HA system. There are fully documented DR procedures and we do run throughs of all operations, at least once a year, from the documentation.
Other things, we have all database code checked into source control, versioned, labeled, and we can use that to recreate databases if things go really south.
UPDATED: We also backup our system databases and test restores as part of validating security. Testing, testing, testing, documentation, testing, training, documentation, testing, testing, testing... You get the point. But that's how to make sure you know what to do, how to do it, and that everything will work.
Beyond the nightly backups, I do nightly restores onto a different server. This is to test the nightly backups. In addition, those nightly backups are also stored in a grid that is replicated to a DR site. I should be able to at a minimum access the backups. I am in process of also setting up DR for SSRS. We have jobs that script out user access on a regular basis and SSIS, DTS and RDL files are all stored in TFS.
answered Oct 04 '10 at 11:42 AM
Good stuff from everyone here already, I'll just add my take...
As well as ensuring the system databases are regularly backed up and verified (master for security credentials, msdb for jobs, poss DTS packages, model for ...well if you've made changes to model for you environment, you might want to keep them), the biggest most single important piece for me is documentation about the server.
Each server that gets provisioned is documented so that in the event of having to recreate it (whether in an emergency or not), we have all the settings in place:
all the kinds of things that would give you a complete picture of the server.
A few years ago, I took a team of people (DBAs & infrastructure guys) over to Dell labs in Ireland, armed only with copies of the server docs and recent backups/files/scripts to see if we could get a complete copy of the production server up and running on some test servers. It was certainly one way of testing to see what was and wasn't missing!
answered Oct 04 '10 at 12:23 PM
Kev Riley ♦♦
I think this is a much bigger question - actually. Because backing up SQL Server databases is really very simple, as is the mirroring, log shipping etc. But the services around them is what makes it difficult...
We used to make all our software such that configuration was stored in a predictable directory, and then this configuration was replicated using NT file replication. We used a specific script on a schedule to backup the IIS Metabase from the main site to the DR site (I think we did that once every 4 hours, or we had a process to do it manually after any config change was accepted). However, I'm not sure how the whole active directory / network approach was handled - usually my brain would start to fry shortly after the word 'VLAN' was mentioned!
As for what's industry standard, I have no idea. I just know what I did.
answered Oct 04 '10 at 11:50 AM
Matt Whitfield ♦♦
In our domain we make as much use as possible of windows groups and users for security so that removes some of the headache of rebuilding security - ie no passwords to store/transfer. As you say a script of these is taken so it can be applied in whole or part to a 'new' server. Where a database is mirrored we have already configured security, just to be ready.
Keep in mind that you would also have the job details in system database backups (ie users, logins, jobs, steps, schedules etc etc) if you were really desperate or had missed something.