Hi all -
We are trying to set up database mirroring for all of our databases on SQL 2005 SP3 Enterprise, fully patched. We have 6 small databases, one pretty gigantic archive database (168GB), and then 3 mission-critical production databases. The production databases are quite large: 68GB, 21GB, and 5GB. These databases are growing daily as we run a very busy eCommerce enterprise that simply CANNOT ever be down.
Our main DB machine handles our load almost effortlessly, but we are looking for a disaster recovery model that will keep us up if it ever flat-out fails, AND so that our data center guys can apply updates etc. during waking hours instead of in the middle of the night. Besides, even being offline in the middle of the night is unacceptable to our enterprise.
We have chosen to use SQL 2005 database mirroring in high-safety (synchronous) mode with a witness server for automatic failover. To facilitate the setup, we bought a machine identical to our primary server in every way: CPU, memory, disk configuration, motherboard, everything. The witness is a far less capable machine, but it is still very solid (it was our main DB machine at one time). These three machines sit in the same rack and are connected over a gigabit LAN.
Anyway, we have been setting up the first 7 (less critical) databases for mirroring and testing manual failover. Everything has worked perfectly, without a hitch, even for the 168GB database. So, with great confidence, we started mirroring our first mission-critical database. This led to a disastrous situation (we were totally offline for almost a full hour) that we have since recovered from using SET PARTNER OFF and RESTORE DATABASE WITH RECOVERY, but I am very interested in hearing expert speculation on how this situation could have happened, and especially how to prevent it from happening again.
The 68GB database was set up for mirroring and mirrored successfully. This process was slightly different than the first 7 DBs, in that:
1.) The initial data and log restore on the mirror machine took almost 2 hours, so when the mirroring session was established the databases were in "Synchronizing" state for about 15 mins while the mirror caught up with the principal. This database was very busy on the principal while it was being restored on the mirror. However, this went fine; eventually both DBs said "Synchronized."
2.) We were unable to add the witness server to the quorum until after the mirror caught up. This seemed strange, but when we added it everything was fine.
3.) The principal was undergoing its nightly full backups, plus log backups every 15 mins, for almost 24 hours, during which I'm sure the transaction log was truncated on the principal. But it stayed synchronized with the mirror, and I'm pretty sure that's how mirroring is supposed to work (correct me if I'm wrong).
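For reference, the setup sequence we followed for each database was along these lines (the database name, backup paths, and endpoint addresses below are generic placeholders, not our real ones):

```sql
-- On the principal: take a full backup and a log backup
BACKUP DATABASE OurBigDB TO DISK = N'D:\MirrorSeed\OurBigDB.bak';
BACKUP LOG OurBigDB TO DISK = N'D:\MirrorSeed\OurBigDB.trn';

-- On the mirror: restore both WITH NORECOVERY so the database
-- stays in a restoring state, ready to join the mirroring session
RESTORE DATABASE OurBigDB FROM DISK = N'D:\MirrorSeed\OurBigDB.bak'
    WITH NORECOVERY;
RESTORE LOG OurBigDB FROM DISK = N'D:\MirrorSeed\OurBigDB.trn'
    WITH NORECOVERY;

-- Establish the session: mirror first, then principal, then witness
ALTER DATABASE OurBigDB
    SET PARTNER = N'TCP://principal.example.com:5022';  -- run on the mirror
ALTER DATABASE OurBigDB
    SET PARTNER = N'TCP://mirror.example.com:5022';     -- run on the principal
ALTER DATABASE OurBigDB
    SET WITNESS = N'TCP://witness.example.com:5022';    -- run on the principal
```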
This morning, during a fairly slow period transaction-wise, we tried to test a manual failover of that 68GB mission-critical database with SET PARTNER FAILOVER, exactly as we had done effortlessly with the first 7 DBs (including the huge 168GB one), and all hell broke loose. Both DBs said (at various times) "In Recovery" and "Restoring", and then SELECT * FROM sys.database_mirroring (on the principal) showed the partner as DISCONNECTED while the same query (on the mirror) showed all NULLs for that DB. Then they BOTH showed all NULLs; it was a constant state of insanity.
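In case it helps, the failover command and the state query we kept running were essentially these (the database name is a placeholder):

```sql
-- On the principal: initiate a manual failover
-- (only valid in high-safety mode with the session synchronized)
ALTER DATABASE OurBigDB SET PARTNER FAILOVER;

-- On either partner: check the mirroring state of all mirrored databases
SELECT DB_NAME(database_id)        AS db,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_witness_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```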
Attempts to FORCE_SERVICE_ALLOW_DATA_LOSS on the principal were answered with a message stating that the "mirror is busy" and that we should try the command again later. All this while our tech support guys were answering calls as quickly as they could. It was, as you can imagine, extremely frustrating.
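For the record, the command we were attempting was the one below. (As I understand it from Books Online, forced service is meant to be run on the MIRROR server when the principal is lost, so part of my question is whether issuing it from the wrong side contributed to the mess — please correct me if I have that wrong.)

```sql
-- Forced service with possible data loss: intended to be run on the
-- MIRROR server when the principal is unavailable
ALTER DATABASE OurBigDB SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
```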
Nearly an hour later, we were successful in issuing RESTORE WITH RECOVERY on the original principal database, and we were back in business at last.
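The commands that finally got us back online were essentially these, run on the original principal (database name is a placeholder):

```sql
-- Break the mirroring session entirely for this database
ALTER DATABASE OurBigDB SET PARTNER OFF;

-- Bring the database out of the restoring state and online;
-- no backup file is needed, this just runs recovery
RESTORE DATABASE OurBigDB WITH RECOVERY;
```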
This must never happen to us again. We NEED database mirroring, but we have to figure out what happened here so we can prevent it. I would greatly appreciate any speculation, ideas, or things to check from you experts.
I am happy to answer any questions or requests for additional information.
Thanks in advance for your input!