question

Oleg avatar image
Oleg asked

Data purity

Recently our DBA, working on the bells and whistles associated with the looming upgrade from SQL Server 2005 to 2008 R2, has discovered an interesting problem we happen to have with some of our data in one of the databases. I am sure that the problem is caused by one of the migration processes of the past as the original data (about 15 years ago) started its life in AllBase database owned by HP and then was migrated to SQL Server 2000 -> SQL Server 2005 After turning trace flag 3604 we run dbcc checkdb with data\_purity option, and this spells out a rather huge log detailing all affected records. Then we pick one table/column combination and run dbcc checktable to get all affected pages and slots. Then we run dbcc page for the suspect page to see the dump of the data. If there is corrupted data then it usually goes like this: - the column is decimal(14,0) - the data is readable correctly by both select statement and the front ends despite the checkdb verdict - the memory dump of the suppect slot of the page consistently shows 02 instead of 00 in one byte to the left of the useful numeric information (rightmost 5 bytes if the number is small enough). For example, the 6 bytes of the number -3000 should be **00**01B80B0000, but they come as **02**01B80B0000 meaning that the byte located to the left of the significant 5 bytes shows its next to the rightmost bit set when it should not be. This does not change the way the data is displayed, but certainly makes dbcc checkdb complain. Following Microsoft guidelines, we should [disable triggers if any and then] issue a faux update for the record in question [and enable the triggers if any], something like this: update the_schema.suspect_table set suspect_column = suspect_column * 1 where suspect_key = some_value; This works and gets the job done. The real question is about how to do it efficiently? Sure we can manually sift through a couple of gigs worth of log dump, extract the useful information from it (tables, columns, keys) and then issue appropriate faux updates checking the database after the updates are completed to ensure that checkdb now runs without finding any data\_purity problems, but it would be nice to find out whether there is any existing tool which can do the examining part. This task is rather crucial to be completed before we upgrade to SQL Server 2008 R2 because the rumor has it that once the data is pure, it will remain pure for good. Oleg
sql-server-2005dbccdata_purity
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Matt Whitfield avatar image
Matt Whitfield answered
That's an interesting one, isn't it? Have you thought about running a combination of DBCC IND and DBCC PAGE in order to extract the data content for each clustered index, then scan through using a C# app? I did this for encrypted object decryption in my IDE / schema synchroniser - and it was nowhere near as bad as I thought it would be. I am sure if you sat down and started to write it, it would take you less time to write the app than it would to actually run it... Let me know if you want any help with decoding the DBCC output in C# (I'm sure you don't need my help anyway!)
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@Matt Whitfield This is an excellent pointer, thank you very much for your help. When I was told about the problem this morning (sort of welcome back to reality from the holidays), it was pretty shocking initially, because usually, when the data is corrupted, the task of restoring the original data is not trivial, but in my case while the data is "corrupted" the values are still readable simply because the corrupted bytes are the ones which are ignored by the engine. Here is the sample of the data:
select 
    cast(0x000E000000B80B0000 as decimal(14, 0)) correct,
    cast(0x000E000200B80B0000 as decimal(14, 0)) corrupted
See that deuce in the 4th byte reading from left to right? Looks weird because it should be 00 not 02 there, but runs correctly, i.e.
correct   corrupted
--------  ----------
-3000     -3000
I am sure that the problem is caused by the original conversion from Allbase. I am glad that the data would be safe and pure in the end :)
0 Likes 0 ·
What happens if you drop and rebuild all the indexes? or drop them and rebuild them on the upgraded version?
0 Likes 0 ·
@Scot Hauder After reading pretty convincing arguments by Paul Randal on [this forum][1] I just would not dare to do it :) Seriously though, I did not try it, but the columns in question are not a part of any index of the suspect tables. [1]: http://www.sqlservercentral.com/Forums/Topic881729-391-1.aspx
0 Likes 0 ·
Sure, if the columns are not part of the index then rebuilding would not be helpful.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.