I am going to take this one element at a time. First, I have 8 text files that I have imported into SQL Server as separate tables. For the sake of simplicity, I have titled them 21, 31, 41, 51, 24, 34, 44, and 54. I am trying to merge those 8 into 1 table, which would be titled industry. The reason behind this process is that I have duplicates across the various tables. Items in 21 are duplicated in 41. The duplicates do not exist within the tables themselves, and they are not wholly contained in 21, 31, or any one table. My question for this time is how I begin the MERGE statement. For now, I will pare it down to merging two tables into one. The primary key is a compound one composed of nine fields.

MERGE Industry AS TARGET
USING industry21 AS SOURCE
ON (TARGET.PrimaryKey1 = SOURCE.PrimaryKey1)

First, how do I specify the 8 tables in the second line, or do I have to do this one table at a time? Second, I have 9 primary key columns. Would I specify all 9 in the third line? If so, would it look like below?

ON (TARGET.PrimaryKey1 = SOURCE.PrimaryKey1 AND TARGET.PrimaryKey2 = SOURCE.PrimaryKey2, and so on)
I will borrow the logic from @Usman Butt's answer because it provides all the necessary points and the logic for a complete solution. The source tables have an unusually large number of key columns in their respective composite primary key, so the join condition looks pretty awkward, but there is not much that can be done about this unless the tables are properly re-designed. In order to remove duplicates (duplicates across all columns, that is), UNION may be used in place of the more desirable UNION ALL.

It is still not known whether a situation exists where the data for the same 9-column-wide primary key exists in different source tables, i.e. the first 9 column values are the same while the other columns' values (10 through 22) are different. This scenario needs to be discussed. It does not add much complexity to the solution, but a rule is needed to determine which of the rows is the winner. In any case, here is a script which will disclose whether such records exist (I use industry21 through industry54 as the source table names, following the naming in the question; replace them with the actual names):

select Col01, Col02, Col03, Col04, Col05, Col06, Col07, Col08, Col09
from
(
    select * from industry21 union
    select * from industry31 union
    select * from industry41 union
    select * from industry51 union
    select * from industry24 union
    select * from industry34 union
    select * from industry44 union
    select * from industry54
) t
group by Col01, Col02, Col03, Col04, Col05, Col06, Col07, Col08, Col09
having count(1) > 1;
go

If the script above produces any records, then duplicate records which match by key but differ in the other columns do exist. This possible problem aside, the MERGE statement using the logic from @Usman Butt's answer, complemented with the actual actions to take, is below. Please note that the UPDATE part includes a check to ensure that only the rows which truly need to be updated are subjected to the update. Otherwise, all matching destination rows will be updated even when their respective values in columns 10 through 22 are the same.
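If the detection script does report key duplicates, one common way to pick a winner is to rank the candidate rows per key and keep only the first. The sketch below assumes the same industry21..industry54 table names and Col01..Col09 key columns used above, and assumes a hypothetical rule where a lower per-table preference number wins the tie; both the names and the preference order would need to be adjusted to the real data:

```sql
-- Rank the rows for each 9-column key by source-table preference
-- and keep only the winning row per key.
with ranked as
(
    select u.*,
           row_number() over
           (
               partition by Col01, Col02, Col03, Col04, Col05,
                            Col06, Col07, Col08, Col09
               order by pref  -- pref: which table wins when keys collide
           ) as rn
    from
    (
        select 1 as pref, * from industry21 union all
        select 2 as pref, * from industry31 union all
        select 3 as pref, * from industry41 union all
        select 4 as pref, * from industry51 union all
        select 5 as pref, * from industry24 union all
        select 6 as pref, * from industry34 union all
        select 7 as pref, * from industry44 union all
        select 8 as pref, * from industry54
    ) u
)
select * from ranked where rn = 1;
```

The result of this query could then be used as the src derived table in the MERGE below, in place of the plain UNION.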
There are other ways to detect whether the matching rows will cause a faux update (for example, BINARY_CHECKSUM), but I opted not to use them. Of course it goes without saying that updating columns 1 through 9 makes no sense because those are key columns. In other words, if any of the key columns in any of the source tables is updated by some thoughtless process, then such an update simply translates to a lack of match between source and destination, meaning that the destination (industry) row will be deleted and the updated row from the source inserted. This is normal behaviour because, generally, nobody ever updates key columns in any SQL Server database tables, right? If a key column needs to be updated, one would restate such an update as a delete plus an insert anyway, so this is what the MERGE will mimic.

Here is the script, which will work as expected after the column names that I used are replaced with their actual counterparts (please note that because TARGET and SOURCE are keywords in the T-SQL language, I avoid using these words as aliases, naming the target t and the source src):

merge industry as t
using
(
    -- Remove the rows which are duplicates across all columns by
    -- using union instead of union all. It is cheesy, but it works.
    select * from industry21 union
    select * from industry31 union
    select * from industry41 union
    select * from industry51 union
    select * from industry24 union
    select * from industry34 union
    select * from industry44 union
    select * from industry54
) src
-- this is very unfortunate, but needs to be done due to the fact that
-- there are 9 columns in the composite primary key of every table
on  t.Col01 = src.Col01 and t.Col02 = src.Col02 and t.Col03 = src.Col03
and t.Col04 = src.Col04 and t.Col05 = src.Col05 and t.Col06 = src.Col06
and t.Col07 = src.Col07 and t.Col08 = src.Col08 and t.Col09 = src.Col09
when matched
    /* comment out these conditions if faux updates are acceptable */
    and
    (
        t.Col10 <> src.Col10 or t.Col11 <> src.Col11 or t.Col12 <> src.Col12 or
        t.Col13 <> src.Col13 or t.Col14 <> src.Col14 or t.Col15 <> src.Col15 or
        t.Col16 <> src.Col16 or t.Col17 <> src.Col17 or t.Col18 <> src.Col18 or
        t.Col19 <> src.Col19 or t.Col20 <> src.Col20 or t.Col21 <> src.Col21 or
        t.Col22 <> src.Col22
    )
then update set
    Col10 = src.Col10, Col11 = src.Col11, Col12 = src.Col12, Col13 = src.Col13,
    Col14 = src.Col14, Col15 = src.Col15, Col16 = src.Col16, Col17 = src.Col17,
    Col18 = src.Col18, Col19 = src.Col19, Col20 = src.Col20, Col21 = src.Col21,
    Col22 = src.Col22
when not matched by target then insert
(
    Col01, Col02, Col03, Col04, Col05, Col06, Col07, Col08, Col09, Col10,
    Col11, Col12, Col13, Col14, Col15, Col16, Col17, Col18, Col19, Col20,
    Col21, Col22
)
values
(
    src.Col01, src.Col02, src.Col03, src.Col04, src.Col05, src.Col06,
    src.Col07, src.Col08, src.Col09, src.Col10, src.Col11, src.Col12,
    src.Col13, src.Col14, src.Col15, src.Col16, src.Col17, src.Col18,
    src.Col19, src.Col20, src.Col21, src.Col22
)
when not matched by source then delete;

Hope this helps.

Oleg
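One caveat with the <> comparisons in the WHEN MATCHED clause above: <> evaluates to UNKNOWN when either side is NULL, so a row where a nullable column changed to or from NULL would silently be skipped. If any of columns 10 through 22 are nullable, a NULL-safe sketch of the same check (assuming the same Col10..Col22 column names) replaces the OR chain with an EXISTS over EXCEPT, since EXCEPT treats NULLs as equal:

```sql
when matched and exists
(
    select t.Col10, t.Col11, t.Col12, t.Col13, t.Col14, t.Col15, t.Col16,
           t.Col17, t.Col18, t.Col19, t.Col20, t.Col21, t.Col22
    except
    select src.Col10, src.Col11, src.Col12, src.Col13, src.Col14, src.Col15,
           src.Col16, src.Col17, src.Col18, src.Col19, src.Col20, src.Col21,
           src.Col22
)
```

If none of the non-key columns allow NULLs, the original OR chain is equivalent and arguably easier to read.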
If you already have temporary structures, then it is better to do all the data deduplication/merging on the temporary structures before hitting the base table. This strategy gives you various advantages. For example, you know which file has preference, so you could keep only that file's data for a duplicate key to be merged into the base table; the rest of the files' duplicate data could be deleted or marked as inactive in the temporary structures. If the major chunk of the data is duplicated, then you would be processing less data for the base table merge. Hence, fewer CPU cycles, less processing time, etc. could be achieved.

I would be tempted to run the process something like the following:

- Merge/deduplicate all data in the temporary tables, preferably by deleting the duplicate data or marking it as inactive/deleted.
- If the data is not huge, you could also merge/deduplicate the data into a single temporary table, so that only one MERGE statement would be required.
- If the data is huge, then I would use 8 MERGE statements. Yes, I know that won't sound good to you, but when working with huge data it is always best to process it in batches. You would have more granular control, better logging, the option to commit/rollback a certain file, etc.
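The per-file batching idea can be sketched as one transaction per staging table, so that a failure in one file's merge rolls back only that file and can be logged before moving on. This is only an outline, assuming a hypothetical staging table industry21 and abbreviated column lists; the full MERGE from elsewhere in this thread would go in the middle:

```sql
begin try
    begin transaction;

    -- the complete MERGE for this one file goes here, e.g.:
    -- merge industry as t
    -- using industry21 as src
    -- on t.Col01 = src.Col01 and /* ... */ t.Col09 = src.Col09
    -- when not matched by target then insert (/* ... */) values (/* ... */);

    commit transaction;  -- this file's rows are now permanent
end try
begin catch
    if @@trancount > 0
        rollback transaction;  -- undo only this file's merge
    -- record which file failed and why, then continue with the next file
    print error_message();
end catch
```

Repeating this block (or driving it from a loop over the eight table names) gives the granular control, logging, and per-file commit/rollback described above.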
- But after the merging/deduplication, if you still want to go with all the tables in one MERGE statement, then as @Lukasz8519 already pointed out, you could use UNION ALL over all the temporary tables to make a single source.

EDIT: Added all composite primary key columns.

MERGE Industry AS TARGET
USING
(
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table1
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table2
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table3
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table4
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table5
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table6
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table7
    UNION ALL
    SELECT PrimaryKeyCol, OtherPrimaryColumns, OtherColumns FROM Table8
) AS SOURCE
ON (TARGET.PrimaryKeyCol  = SOURCE.PrimaryKeyCol
AND TARGET.PrimaryKeyCol2 = SOURCE.PrimaryKeyCol2
AND TARGET.PrimaryKeyCol3 = SOURCE.PrimaryKeyCol3
AND TARGET.PrimaryKeyCol4 = SOURCE.PrimaryKeyCol4
AND TARGET.PrimaryKeyCol5 = SOURCE.PrimaryKeyCol5
AND TARGET.PrimaryKeyCol6 = SOURCE.PrimaryKeyCol6
AND TARGET.PrimaryKeyCol7 = SOURCE.PrimaryKeyCol7
AND TARGET.PrimaryKeyCol8 = SOURCE.PrimaryKeyCol8
AND TARGET.PrimaryKeyCol9 = SOURCE.PrimaryKeyCol9)

If you are sure that in each file the whole data row would be a complete duplicate and no preference is required, then you could simply use **UNION** instead of **UNION ALL** to remove the duplicate rows. So you see, there could be more than one solution, or multiple combinations; it all depends upon your data and what suits you best. Hope it helps.