Hi all, I run an accounting package with SS 2008R2 as the DB. Today we started having an odd issue withe the accounting package - it hangs on any attempts to insert into one of the tables. I run check DB every other day and do DB index maintenance weekly (rebuild or reorganize or update statistics, depending on fragmentation and page count). I cloned the live DB into a test DB to play with and attempt more aggressive diagnostics as well as evaluate repairs and guess what - the DB cloned fine (backup DB and tlog, restore DB and apply tlog) and the cloned DB works perfectly. My solution is likely to be drop the live DB and restore it but I am uneasy not understanding what went wrong here. Any ideas on what failed and how I can automatically detect it moving forward?
In investigating this I did track down the sessions involved and noted that two session were involved: 1) a select statement reading the row to be updated 2) an update statement for the row.
The second session was waiting for the first to complete. The first session was hung on async_network_io.
I am aware that async_network_io usually means the client has not finished reading a result set but in this case the result set is a single row, and in testing I was the only user. As part of testing after hours I disconnected all remote sessions and ran the same update on the server itself so I know that the single row being retrieved across the network was not the issue.
I ultimately resolved the issue by doing an application level backup and restore. For whatever reason doing a DB+Log backup and restore into a second DB in the same instance gave me a working environment but doing a DB+Log backup and restore into the primary DB (same instance) did not change anything, however, using the accounting packages full-export full-import back into the primary DB did work. I still don't know what the cause or problem was so I'm unhappy with the resolution. I am happy that everyone is working as usual today. I am still very much interested in debugging this though my ability to extract new data is limited since I can't reproduce the problem using the backups I took.
answered Jul 08, 2011 at 09:43 AM
If you can post the actual error then it will help a lot but this sounds to me like a locking issue. The test system will not have the same users logged in etc and so the locking is not occurring.
I would start off by running sp_whoisactive to see the TSQL being executed at the time of the error and work on from there.
This is not a database consistency issue - DBCC checks the integrity of the data, it is much more like a programmatic issue
answered Jul 08, 2011 at 04:03 AM