x

Using T-SQL Merge to Handle Type 2 Slowly Changing Dimensions

The typical pattern for using T-SQL Merge for Type 2 SCD columns is:

 Merge SomeTable as Target
 Using Sourcetable as Source
 On Source.Field(s) = Target.Field(s)
 When Matched and Target.RowIsCurrent = 'Y'
 And (detect differences in Type 2 Fields)
 Then Update.....

 

This is a generalization of course, and I've omitted several steps for brevity. I'm only showing the Expiration portion of the merge. However, since I'm using a Type 2 Merge statement for both my inserting of completely new records (not shown) and handling the expiration and new records for the Type 2 changes, why can't I move the RowIsCurrent portion into the On Clause? Therefore, we would have:

 Merge SomeTable as Target
 Using Sourcetable as Source
 On Source.Field(s) = Target.Field(s) and Target.RowIsCurrent = 'Y'
 When Matched And (detect differences in Type 2 Fields)
 Then Update.....

Is there something I'm missing here? In my own testing, the performance is the same until the number of expired records starts to overtake the number of current rows in the Dimension. Then the performance of this method just gets better and better.

Also, it solves another issue I've had which is non-continuous entities in my Dimension. Suppose that there are entities in the Dimension that should be expired and have NO current record? With the traditional method, the Merge statement ignores this and does NOT insert a new record. However, this new method is only looking for Current Rows to begin with. If it doesn't find one, it inserts a new one as if it's an entirely new entity to the Dimension.

Am I missing something here? I don't know why everyone isn't doing this.

more ▼

asked Jul 29, 2013 at 02:57 PM in Default

avatar image

DavidStein
72 5 5 9

(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

Why can't I move the RowIsCurrent portion into the ON Clause?

The TechNet article Optimizing MERGE Statement Performance explicitly warns about this:

JOIN Best Practices

There are further details in the TechNet article Inserting, Updating, and Deleting Data by Using MERGE:

Another warning

Both articles explain how the syntax of MERGE is expected to be used, and why changing the ON clause or using a CTE can result in incorrect semantics. Briefly, these techniques affect the type of initial join, and can produce incorrect results when NOT MATCHED clauses are present. The safest way to use MERGE is to follow the intended syntax carefully; great care and advanced query plan analysis skills are needed to deviate from that practice.

For a detailed worked example in a similar scenario, see my answer to a question on dba.stackexchange.com

ask.jpg (172.0 kB)
ask.jpg (68.2 kB)
more ▼

answered Jul 30, 2013 at 12:39 AM

avatar image

SQL Kiwi
1.2k 1 3 6

(comments are locked)
10|1200 characters needed characters left

I agree that the expired target rows should not be evaluated - they have no value in the operation (they're not logically part of the "target"). As to the issue you mentioned - imho type 2 merge statements need to have NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET clauses to handle both situations.

more ▼

answered Jul 29, 2013 at 03:36 PM

avatar image

tlaqua
0

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x2159
x428
x53

asked: Jul 29, 2013 at 02:57 PM

Seen: 1887 times

Last Updated: Jul 30, 2013 at 12:39 AM

Copyright 2017 Redgate Software. Privacy Policy