question

megamanblue avatar image
megamanblue asked

What happens when you restore a SQL Server 2005 database ?

Hi All,
I have recently resolved an issue which I will describe below which has led me to ask "What happens when you restore a SQL Server 2005 database ?" Scenario:
Users call to say that web application X is returning no data. During investigation the following SQL is run with the following result: **select * from table_a

Msg 7105, Level 22, State 6, Line 1
The Database ID 5, Page (1:81463), slot 53 for LOB data type node does not exist. This is usually caused by transactions that can read uncommitted data on a data page. Run DBCC CHECKTABLE.** So I do as I am told and run the following SQL with the following result: **DBCC CHECKTABLE('table__a')
DBCC results for 'table__a'.
Msg 8928, Level 16, State 1, Line 1
Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 137729312358400 (type In-row data): Page (1:75023) could not be processed. See other errors for details. Msg 8941, Level 16, State 2, Line 1 Table error: Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 137729312358400 (type In-row data), page (1:75023). Test (sorted [i].offset <= m_freeData) failed. Slot 1, offset 0xffff is invalid. Msg 8941, Level 16, State 2, Line 1 Table error: Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 137729312358400 (type In-row data), page (1:75023). Test (sorted [i].offset <= m_freeData) failed. Slot 1, offset 0xffff is invalid. Msg 8929, Level 16, State 1, Line 1 Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 137729312358400 (type In-row data): Errors found in off-row data with ID 75574870016 owned by data record identified by RID = (1:80165:3) Msg 8964, Level 16, State 1, Line 1 Table error: Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 71913848373575680 (type LOB data). The off-row data node at page (1:75161), slot 43, text ID 69459640320 is not referenced. Msg 8964, Level 16, State 1, Line 1 Table error: Object ID 2101582525, index ID 0, partition ID 137729312358400, alloc unit ID 71913848373575680 (type LOB data). The off-row data node at page (1:75161), slot 45, text ID 69459705856 is not referenced.** Now I prepare to run some tests before applying a fix to the production environment, so I take the most recent backup of the database and restore it to our test SQL Server 2005 environment for this application. Before I begin troubleshooting I run the following SQL on the test environment and get the following result: **DBCC CHECKTABLE('table_a') DBCC results for 'incidents_main'. There are 47782 rows in 4683 pages for object "incidents_main". DBCC execution completed. If DBCC printed error messages, contact your system administrator.** Reasonably chuffed with my good fortune but also disappointed. I do not understand what happened during the restore process to address the problem. Can anyone explain this to me ? May thanks in advance, Marlon PS If you have found this page looking for issues relating to the LOB data-type you may want to try: http://support.microsoft.com/kb/961648/ and http://support.microsoft.com/kb/962209/
sql-server-2005backup-restoredbcc
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

megamanblue avatar image megamanblue commented ·
Hi Guys, thanks for the response. I think my question may have been a tad vague, the backup "should" have contained the corruption as the database had been in this state for several days. The problem became apparent on a friday and was not address until the following tuesday. I specifically took the latest backup (there is a full database backup once a day) to the test environment because I wanted the corruption there to come up with a suitable method of fixing. What I could not and still can not get my head around is how the backup and restore process fixed the issue with this table ?
0 Likes 0 ·
megamanblue avatar image megamanblue commented ·
Sorry for ignoring your question steve but no we are not running read uncommitted data.
0 Likes 0 ·
Tim avatar image Tim commented ·
How do you know when the corruption happened? Did you have a DBCC checkdb job that runs each day that detected the error? I to am stumped as to how a corrupted database could be fixed by restoring a backup of the corrupted database. In my experience I have not been that lucky. :)
0 Likes 0 ·
megamanblue avatar image megamanblue commented ·
Trad, I am perplexed also. I thought this would be the best place to see if anyone else had experienced a similar situation or breakdown the backup/restore process in a way that this would make sense. The corruption became apparent after the users started complaining about a web application that was not working as intended. Through troubleshooting, I turned my attention to the database and tried to run some some queries against the relevant tables. The results triggered the first error message in my original post. This was late on the Friday so I went home (low priority app). On the Monday, I first of all verified that the problem was still present and began setting up my test environment by taking the previous night's backup and restoring to the test environment. Running the DBCC CHECKTABLE command returned no consistency errors ?? At this point, I decided all I knew about SQL Server was redundant and went looking for work as a street sweeper :D
0 Likes 0 ·
Blackhawk-17 avatar image
Blackhawk-17 answered
What Page Verify option is your dB running with? Do you perform your backups with the CHECKSUM option? **EDIT** This is just a guess as to what I believe may be happening in this case. DBCC tells you that some LOB data exists but there is no associated RID. A backup is taken but doesn't grab the non-referenced LOB. A restore no longer has the error. For this to happen the record (or the LOB portion) associated with the LOB was deleted but not the LOB itself. I'm basing this on: > Test (sorted [i].offset failed. Slot 1, offset 0xffff is > invalid. This means that the reference is bad since it points to free space.
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

megamanblue avatar image megamanblue commented ·
@Blackhawk-17 Page verify = TornPageDetection CHECKSUM option is off Is the latter a potential explaination ? This is fascinating. I suppose the next logical question, assuming your hypothesis is correct is, has my data integrity been comprimised as a result ? How would you approach finding out ?
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
What do you get if you run the following: SELECT * FROM msdb.dbo.suspect_pages
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
@megamanblue - If the row and the LOB are both gone there *should* be no integrity issue. Of course this depends on the relationships in your dB etc. As for determining deeper I'm afraid we're heading into Paul Randall territory.
0 Likes 0 ·
megamanblue avatar image megamanblue commented ·
@Blackhawk-17 0 rows from SELECT * FROM msdb.dbo.suspect_pages @All Thanks for all your contribs :D
0 Likes 0 ·
Steve Jones Editor avatar image
Steve Jones Editor answered
It is possible that you have corruption in your production server that is not in the backup. I would first run dbcc checkdb for an entire check of the database. Next, are you running as read uncommitted? That's a bad idea, in that you can read data that is loaded in a transaction, but not committed. If it rolls back, then you have presented data that isn't necessarily valid.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Blackhawk-17 avatar image Blackhawk-17 commented ·
@megamanblue - The NOLOCK hint is also read uncommitted... are queries from the web app coded this way? Are your query options set to READ UNCOMMITTED in SSMS?
0 Likes 0 ·
megamanblue avatar image megamanblue commented ·
@Blackhawk-17 Thanks for joining the discussion. In SSMS the transaction isolation level is set to READ COMMITTED. An extract of the query executed by the web app is below as you can see no hints: SELECT COUNT(*) AS num_records FROM table_a WHERE ((rep_approved like 'REJECT')) This was returning 0 which we knew to be incorrect and prompted the investigation. The actual method of resolution for this problem was: 1.Restrict database access 2.Backup the database 3.Delete the database 4.Recreate the database from the backup taken This coincided with what I had found during the set up of the test environment.
0 Likes 0 ·
Tim avatar image
Tim answered
Restoring the database does just that, it restores the database back to the point in time the backup was taken assuming you are restoring a full backup. The good news is your corruption happened after you last backup. Most of the time these types of errors happen is due to hardware issues, possibly with your disk. You could try running the dbcc repair command then use a compare utility to see what data loss did occur. If it happens to be older data you could at least try to manually recover any data that had been recently applied or if you have been taking tlog backups you can restore you last full backup then apply the recent transaction logs.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.