question

user-981 (google) avatar image
user-981 (google) asked

Troubleshooting failover cluster problem in W2K8 / SQL05

I have an active/passive W2K8 (64) cluster pair, running SQL05 Standard. Shared storage is on a HP EVA SAN (FC).

I recently expanded the filesystem on the active node for a database, adding a drive designation. The shared storage drives are designated as F:, I:, J:, L: and X:, with SQL filesystems on the first 4 and X: used for a backup destination.

Last night, as part of a validation process (the passive node had been offline for maintenance), I moved the SQL instance to the other cluster node. The database in question immediately moved to Suspect status.

Review of the system logs showed that the database would not load because the file "K:\SQLDATA\whatever.ndf" could not be found. (Note that we do not have a K: drive designation.)

A review of the J: storage drive showed zero contents -- nothing -- this is where "whatever.ndf" should have been.

Hmm, I thought. Problem with the server. I'll just move SQL back to the other server and figure out what's wrong..

Still no database. Suspect. Uh-oh. "Whatever.ndf" had gone into the bit bucket.

I finally decided to just restore from the backup (which had been taken immediately before the validation test), so nothing was lost but a few hours of sleep.

The question: (1) Why did the passive node think the whatever.ndf files were supposed to go to drive "K:", when this drive didn't exist as a resource on the active node?

(2) How can I get the cluster nodes "re-syncd" so that failover can be accomplished?

I don't know that there wasn't a "K:" drive as a cluster resource at some time in the past, but I do know that this drive did not exist on the original cluster at the time of resource move.

sql-server-2005clusterfailover
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Steinar avatar image
Steinar answered

In a cluster you can only have data/log files on cluster drives, so it is not possible to place files on network drives or local drives.

And the info about where to find the files of a database is saved in master. Is it possible that someone changed it manually? OR it may be something about driveletter/mountpoints in the cluster perhaps?

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered

(1) Any chance that K: could have been a mapped network drive? If it was mapped on node1 then it would have been local to that node only; failing over to node2 where there was no such mapping would then cause the issue. If node1 was then restarted but the network drive hadn't been mapped permanently, then K: would have disappeared forever

Just a random thought....

(2) not sure what you mean by 're-sync' the nodes? The nodes just run the SQL engine, the data should be shared on cluster resources in your SAN

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.