2000ms disk response time suddenly when reorganizing indexes
We have a vendor database with vendor configured maintenance plans but they use the built in rebuild index, check integrity, reorganize etc. It is configured to rebuild indexes weekly, reorganize on weekdays. This has been running fine. I have had my eye on the server because the response times on the SAN disks that the databases are on have consistently over 40ms response times. Well suddenly yesterday the response times spiked to over 2000ms, the user databases were pretty much unusable, and the only messages in the SQL server log were regarding IO taking longer than 15 seconds to complete. The reorganize index task was running. When the reorganize indexes task finished the issue was resolved... but it had taken 12 hours when it usually takes less than 2. The next night it happened again. The Avg Disk Read/Write Sec spiked between 50 and 100 in perfmon. Any Ideas how a reorganize index operation could suddenly cause this behavior? An integrity check hasnt been done since the first time the issue occured. I was thinking possibly a corrupt index? SQL 2008 R2
Has the data volume suddenly increased by a large magnitude? Was the SAN doing something at the same time as the reorg was? Have you check on the wait stats to see what SQL Server might be waiting on specifically? I recommend Paul Randal's "Tell Me where It Hurts" blog as a reference.
http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/ If you want to post back what the results of Paul's script is, that might give us a better clue as to where the problem is.
I am thinking this is more of a disk issue than anything SQL would be doing. 2000ms to read/write to the disk is the issue. Much longer latency from the disk would cause the index job to take longer as well. I would be getting your storage folks involved. Are they doing any storage migrations, recently migrated you to another storage array or pool of disk, things like that. Run something like Crystal Benchmark or SQLIO to stress the disk off hours to show it is having latency issues.