Using T-SQL Merge to Handle Type 2 Slowly Changing Dimensions
The typical pattern for using T-SQL Merge for Type 2 SCD columns is: Merge SomeTable as Target Using Sourcetable as Source On Source.Field(s) = Target.Field(s) When Matched and Target.RowIsCurrent = 'Y' And (detect differences in Type 2 Fields) Then Update..... This is a generalization of course, and I've omitted several steps for brevity. I'm only showing the Expiration portion of the merge. However, since I'm using a Type 2 Merge statement for both my inserting of completely new records (not shown) and handling the expiration and new records for the Type 2 changes, why can't I move the RowIsCurrent portion into the On Clause? Therefore, we would have: Merge SomeTable as Target Using Sourcetable as Source On Source.Field(s) = Target.Field(s) and Target.RowIsCurrent = 'Y' When Matched And (detect differences in Type 2 Fields) Then Update..... Is there something I'm missing here? In my own testing, the performance is the same until the number of expired records starts to overtake the number of current rows in the Dimension. Then the performance of this method just gets better and better. Also, it solves another issue I've had which is non-continuous entities in my Dimension. Suppose that there are entities in the Dimension that should be expired and have NO current record? With the traditional method, the Merge statement ignores this and does NOT insert a new record. However, this new method is only looking for Current Rows to begin with. If it doesn't find one, it inserts a new one as if it's an entirely new entity to the Dimension. Am I missing something here? I don't know why everyone isn't doing this.
> *Why can't I move the `RowIsCurrent` portion into the ON Clause?* The TechNet article [Optimizing MERGE Statement Performance] explicitly warns about this: ![JOIN Best Practices] There are further details in the TechNet article [Inserting, Updating, and Deleting Data by Using MERGE]: ![Another warning] Both articles explain how the syntax of `MERGE` is expected to be used, and why changing the `ON` clause or using a `CTE` can result in incorrect semantics. Briefly, these techniques affect the type of initial join, and can produce incorrect results when `NOT MATCHED` clauses are present. The safest way to use `MERGE` is to follow the intended syntax carefully; great care and advanced query plan analysis skills are needed to deviate from that practice. For a detailed worked example in a similar scenario, [see my answer] to a question on
http://technet.microsoft.com/en-us/library/cc879317(v=sql.105).aspx : /storage/temp/1023-ask.jpg :
http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx : /storage/temp/1024-ask.jpg :
I agree that the expired target rows should not be evaluated - they have no value in the operation (they're not logically part of the "target"). As to the issue you mentioned - imho type 2 merge statements need to have NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET clauses to handle both situations.