So you get a call at any time of day; let's not pretend it is always 3:00 AM. It's the typical call: users can't connect, the server is slow, and so on. I want to put together a general checklist for my DBAs, and especially the on-call DBA, to go through any time they get called for a critical system-down situation.

Some examples:

- SQL Server log: failed logins, SQL errors
- Windows Event Log: stopped services, system errors, alerts
- Check for blocking (sp_WhoIsActive)
- Performance counters: CPU, memory, disk
- Page Life Expectancy: did something just flush the cache, so that it is now down to seconds?
- and so forth

I would love real-world examples of how some of these things saved your neck.
I use Brent Ozar's checklist as a template. [ozar template] I add a couple of steps for virtualization issues.

I once got a call about a SQL Server (on VMware) that had stopped working. I could not connect to the server over the network, so I launched the console, but I could not find any network errors on the server. SQL Server was up and running. I almost thought it was a prank. I picked up the phone and called Mr. Virtualization, and what do you know: my server had accidentally been moved to a host where the network was not configured. So I moved it to another host, and it instantly started working. I updated my checklist with "VMware host connectivity steps".
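
For what it's worth, the "can I reach it over the network at all?" step is easy to script so the on-call DBA can run it from a workstation before logging in to the console. The sketch below is only an illustration, not part of the checklist mentioned above: the function name `can_reach` is made up, and it assumes the instance listens on the default TCP port 1433 (named instances often use a different port, so adjust accordingly).

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: 1433 is the SQL Server default port; pass the real port
    for instances that listen elsewhere.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

If `can_reach("your-server-name")` returns False from a client machine while SQL Server looks healthy on the console, that points at the network path (or, in a case like mine, the virtualization host) rather than at SQL Server itself.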