I have an active/passive W2K8 (64) cluster pair, running SQL05 Standard. Shared storage is on a HP EVA SAN (FC).
I recently expanded the filesystem on the active node for a database, adding a drive designation. The shared storage drives are designated as F:, I:, J:, L: and X:, with SQL filesystems on the first 4 and X: used for a backup destination.
Last night, as part of a validation process (the passive node had been offline for maintenance), I moved the SQL instance to the other cluster node. The database in question immediately moved to Suspect status.
Review of the system logs showed that the database would not load because the file "K:\SQLDATA\whatever.ndf" could not be found. (Note that we do not have a K: drive designation.)
A review of the J: storage drive showed zero contents -- nothing -- this is where "whatever.ndf" should have been.
Hmm, I thought. Problem with the server. I'll just move SQL back to the other server and figure out what's wrong..
Still no database. Suspect. Uh-oh. "Whatever.ndf" had gone into the bit bucket.
I finally decided to just restore from the backup (which had been taken immediately before the validation test), so nothing was lost but a few hours of sleep.
The question: (1) Why did the passive node think the whatever.ndf files were supposed to go to drive "K:", when this drive didn't exist as a resource on the active node?
(2) How can I get the cluster nodes "re-syncd" so that failover can be accomplished?
I don't know that there wasn't a "K:" drive as a cluster resource at some time in the past, but I do know that this drive did not exist on the original cluster at the time of resource move.
Answer by Steinar ·
In a cluster you can only have data/log files on cluster drives, so it is not possible to place files on network drives or local drives.
And the info about where to find the files of a database is saved in master. Is it possible that someone changed it manually? OR it may be something about driveletter/mountpoints in the cluster perhaps?
Answer by Kev Riley ·
(1) Any chance that K: could have been a mapped network drive? If it was mapped on node1 then it would have been local to that node only; failing over to node2 where there was no such mapping would then cause the issue. If node1 was then restarted but the network drive hadn't been mapped permanently, then K: would have disappeared forever
Just a random thought....
(2) not sure what you mean by 're-sync' the nodes? The nodes just run the SQL engine, the data should be shared on cluster resources in your SAN