question

Sharma avatar image
Sharma asked

MS SQL 2005: Torn Page Detection V/S Checksum - Database Corruption

Page verify option "CHECKSUM" is introduce in MS SQL2005 and it overlap the Torn Page Detection but mine concern is – Is Verify option only help to deduct the database corruption or potentially help to prevent database corruption in some way? Is this option help to stop writing corrupted data in I/O system? If these option help to prevent database corruption then which option is more strong and why?
sql-server-2005corruptionpage
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Usman Butt avatar image
Usman Butt answered
I guess it is pretty much clear in BOL. > Torn page protection, introduced in > SQL Server 2000, is primarily a way of > detecting page corruptions due to > power failures. For example, an > unexpected power failure may leave > only part of a page written to disk. > When torn page protection is used, a > 2-bit signature is placed at the end > of each 512-byte sector in the page > (after having copied the original two > bits into the page header). The > signature alternates between binary 01 > and 10 with every write, so it is > always possible to tell when only a > portion of the sectors made it to > disk: if a bit is in the wrong state > when the page is later read, the page > was written incorrectly and a torn > page is detected. Torn page detection > uses minimal resources; however, it > does not detect all errors caused by > disk hardware failures. > > Checksum protection, introduced in SQL > Server 2005, provides stronger data > integrity checking. A checksum is > calculated for the data in each page > that is written, and stored in the > page header. Whenever a page with a > stored checksum is read from disk, the > database engine recalculates the > checksum for the data in the page and > raises error 824 if the new checksum > is different from the stored checksum. > Checksum protection can catch more > errors than torn page protection > because it is affected by every byte > of the page, however, it is moderately > resource intensive. When checksum is > enabled, errors caused by power > failures and flawed hardware or > firmware can be detected any time the > buffer manager reads a page from disk. The above quotes make it clear that these are just to detect any corruption, they do not prevent it. Prevention of data corruption should be handled by your environment. This is where huge amount of money is spent for Data Centers with features like Uninterrupted Power supply having servers with RAIDs implemenation etc. As far as which option is safer, CHECKSUM is the clear choice.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

WilliamD avatar image
WilliamD answered
@Usmann Butt's explanation is good, and the description in BOL is a clear explanation of what the two options actually do. However, what you need to be aware of is that corruption of data can happen on multiple layers and, given a long enough time scale, will happen with any system. There are mechanisms (RAID, USVs, "clever" file systems etc.) to prevent corruption, or alleviate it, but you have to make sure that you have the ability to recover from data loss (these prevention mechanisms can fail too!). This is where a clear and tested backup and restore strategy is required. This runs outside the scope of just backing up and restoring a database, you also need to consider the system as a whole (O/S, SQL Server service etc.). Please don't think that ticking the box for page checksumming somehow magically relieves you of the task of having a data recovery strategy.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

SmithSean avatar image
SmithSean answered

Torn Page Detection writes a bit for every 512 bytes in the page. This allows you to detect when a page was not successfully written to disk, but does not tell you if the data stored in those 512 bytes is actually correct as a couple of bytes may have been written incorrectly.

Checksum, on the other hand, calculates a checksum value as the final thing before passing it to the IO system to be written to disk. This guarantees that SQL Server had no part in corrupting the page. When SQL Server reads it back, if a single bit is different, it will be caught, and a checksum error (824) will be generated.

Torn Page Detection is the more lightweight of the two, but Checksum is safer, and its overhead is still small enough that it's the better option.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.