pcloadletter asked

Distributed Always On maintenance

We are seeing a strange issue: behavior I would not expect given our configuration.

Our setup is a distributed Always On availability group spanning two data centers.

DC1 and DC2 each have two SQL Server nodes that replicate synchronously with each other.

Between the two data centers, replication is asynchronous.

DC1 <==> DC2 Distributed Async

DC1

SQL1 <==> SQL2 Sync (read-only secondary)

DC2

SQL3 <==> SQL4 Sync (read-only secondary)

The issue happens when we patch or restart SQL4, the secondary SQL Server in the secondary data center. When the server comes back up after the restart, we see very high I/O on all four servers, which causes latency and timeouts in our application.

The question is: why would restarting a secondary server on the far side of an asynchronous link cause high I/O not only on the primary node but on every node in the cluster?

All I can see in our monitoring application is that all servers had high I/O, with the primary reporting high checkpoint writes.

PARALLEL_REDO_TRAN_TURN goes up from baseline and PARALLEL_REDO_WORKER_WAIT_WORK goes down from baseline, but I don't see any other significant wait types.

This has happened multiple times so we know it's not coincidental. Any thoughts?

Tags: always-on, maintenance, patching, distributed

0 Answers
