Mirroring significantly degrades ETL process performance
I have an accounting system which we recently mirrored to our DR centre. Before this the users had an ETL process that extracts key data for analysis and takes about 3 or 4 minutes to complete. Since mirroring has been implemented this has increased to more than 150 minutes. Nothing seems to be stressed - the network is fine, the SQL Servers are fine, CPU doesnt peak, disk access (read/write/transfers per sec) on both servers is active but not at all high. I know the ETL process uses cursors. When running there is a peak of activity (Send Rate 1800KB/s New Trans 1800KB/s)) on the Primary followed immediately by the corresponding activity on the mirror (Restore Rate 1800KB/s as the process starts and then the send rate drops to ~ 50KB/s and New Trans rate of ~ 50KB/s and the mirror has a Restore Rate of the same ~50KB/s. If I suspend mirroring before the process starts the process takes the normal 3 or 4 minutes and then I resume mirroring and it catches up in about 6 minutes. The Send Rate rises to 7-800KB/s and the Restore Rate matches is at 800KB/s. Mirroring is Sync as it is on SQL Server 2005 Std Edition. I have noticed that Log Flushes and Log Waits rise when mirroring is active and the ETL process is executing. No other metrics seem to rise noticeably. Anyone have any ideas why and how I might be able to resolve this please? I am almost at the point of creating a method for the users to suspend and then resume the mirroring themselves.
Where I generally see this scenario is when the transactions do processing in the same database; work tables and the like. If this can be offloaded to another dB and only the final results applied you will enhance performance. The issue is 2PC and that is why the throughput is reduced... the primary is waiting on acknowledgements of many small transactions and this time adds up. Check for ETL prep work going on in the Primary. **EDIT** ---------- Your other option is to move to Log Shipping rather than the suspend/resume route. It requires manual failover but the contention is probably removed and you don't have to worry about others forgetting steps that could compromise your recovery objectives. Either way you are in a tricky spot.
Too bad it is the 3rd party process. But as Kev Riley rightly said, this could be dangerous. The matter could get more serious, if a GUI is to be made to suspend and resume (could pile a heck of log records/ bigger send queue with the possibilities like if someone forgot to resume it in time). So to me, the suspend and resume should be starting and ending points of the ETL process respectively. Moreover, one cannot be sure that if this process is taking 3/4 minutes now, then the execution time will remain the same especially when CURSORS are used. Believe me, I had a terrible time while dealing with a job having multiple cursors. So the ten minutes cushion may rise to 30 minutes in a bad case and the data loss could be bigger in magnitude. Which may not be allowed according to the DR policy. I do not know whether it is possible or not, but the best would be to involve all the stakeholders and if a change in 3rd party code can do a better job, then it should be the course of action. I know this is an awkward situation, but suspending/resuming process should be the last resort IMHO. And if is implemented, it should be monitored carefully and documented fully in DR policy to inform all the stakeholder about the risks involved. BTW, I would be very interested to know how it is going to be documented. If you can please share some basic info here OR on my email, I would be very thankful to you.