question

GPO avatar image
GPO asked

Synchronizing large tables on different servers

Hi Gurus I'm sure this is a common problem. I have several tables on different servers that I need to keep synchronised once a day. A typical table (call it T1) looks like this: ~70 million rows, growing by 50,000 a day, 10 cols, 30GB. The tables that I can access are in a daily restored backup. So T1 is restored daily onto Server1. I need to maintain a copy of T1 (and some others) on Server2. The database is too big to be doing a restore of last night's backup onto that server. Truncating the Server2 tables and copying all the rows over every day, via SSIS takes too long, and it seems a bit silly to copy 69.95 million unchanged rows for the sake of 50,000 changes. What is the most efficient way of detecting the differences between Server1 T1 and Server2 T1, without copying the entire table(s) across? I don't have reliable modified_date fields to work with. I could create a second table (T2) on Server1, then I could - get the differences between Server1 T1 and Server1 T2 - SSIS the differences to a staging table on Server2 - MERGE the differences into Server2 T1 - MERGE the differences between Server1 T1 and Server1 T2 into Server1 T2. ...but that effectively creates ANOTHER copy of the data, taking up a lot of disk space. I can't help wondering whether there's a better way.
mergeetlchange-data-capture
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Cyborg avatar image
Cyborg answered
This is just raw Idea... One is with [SCD - Slowly Changing Dimension][1] - Here you can compare destination data with the source. Here you have the option to insert the missing rows to destination table, Update the difference to destination. To delete rows in destination, you should create a separate transformation. Another way is with triggers, Create triggers that fires for Insert\Update\Delete on your source table, this triggers should insert rows to a staging table which has columns - Unique key of source table, a flag column for the type of change(Insert\Delete\Update) and the date (getdate()). Create an SSIS package that reads data from this staging table to find the changed data for the given day, get those rows from your source table and apply the changes to the destination table. Your staging table should not have any duplicate Key column, for Inserts on source table, insert a row. For updates and Deletes changes update staging table columns DML Flag and Date column if the rows already exists, else Insert new row with correct DML flag. [1]: http://msdn.microsoft.com/en-us/library/ms141715.aspx
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

GPO avatar image GPO commented ·
Many thanks for your interest @cyborg: SCD - Surely for that to work I'm going to end up comparing tens of millions of rows that live on two different servers. I'll experiment with it for small numbers of rows and scale up from there to see if it cuts the mustard. Somehow I don't think Microsoft had 60 million row "dimension" tables in mind when they designed it (please flame me everybody if scds this big are more commonplace than I think they are)! Kind of stretches the definition of what a dimension is:-) Triggers - not an option. Our data is a daily restore of a backup. The real "source" table is owned by the vendor and they aint playing ball. The other thing that's killing me is that while the vendor does have [date_modified] columns on some tables but they've made some fairly fundamental mistakes in the past and I just don't trust the application to update the [date_modified] column in 100.00% of cases where a change is made.
0 Likes 0 ·
GPO avatar image GPO commented ·
@Grant Fritchley: I would like to get the acceptance status that you added, removed from this answer. Can you help?
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.