How do I update the flag in a slowly changing dimension?

Hi, I have created an SSIS package using the Slowly Changing Dimension transformation. I have an IsActive (Y/N) flag in both the source and destination tables. The package executes fine, and since I am using a Type 2 SCD it adds a new record for every change. Each new record gets Y in the IsActive flag, but the old version of that record should be set to N.

Can somebody tell me how to update my SSIS package so that it sets the flag to N on the old record when the record changes?


asked Jan 20, 2012 at 12:46 PM in Default

coolnickhere


1 answer:

You should have the Historical Attribute Inserts Output connected to an OLE DB Command that runs an UPDATE setting IsActive to N on the row where IsActive is Y and the business key equals the business key coming through this output. You can then union this output with the New Output, which contains all new records, and insert those rows into the table with IsActive = Y.

The Historical Attribute Inserts Output contains the records that are updates to existing rows for SCD Type 2. So, as mentioned above, you update the current record for the incoming business key to non-current, and then insert the new data flagged as current.

By default the SCD wizard takes care of the scenario described above. If you chose a single field for detecting the current record, it is exactly this scenario. If you have validity-date fields and chose those in the wizard, the solution is the same. If you want to use both a validity range and a current-record flag, simply pick one method or the other in the SCD wizard, and then edit the statement in the OLE DB Command on the Historical Attribute Inserts Output path so it sets IsCurrent to false or ValidTo to the current date, ending the record's validity.
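The statement inside that OLE DB Command is a simple parameterized UPDATE. A minimal sketch, assuming a hypothetical dimension dbo.DimCustomer with a business key column CustomerBK (substitute your own names); the ? marker is mapped to the business key column flowing down the Historical Attribute Inserts Output:

```sql
-- Expire the currently active version of the incoming business key.
UPDATE dbo.DimCustomer
SET    IsActive = N'N'
WHERE  CustomerBK = ?        -- business key from the data flow
  AND  IsActive   = N'Y';    -- touch only the current row
```

Because the OLE DB Command runs once per row of the output, this expires exactly the old versions of the rows that are about to be re-inserted as current.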
answered Jan 20, 2012 at 01:56 PM

Pavel Pawlowski

Nice answer Pavel. I tend to prefer rolling my own SCD functionality with MERGE instead of using these components. What do you do in practice?
Jan 22, 2012 at 08:19 AM Scot Hauder

I'm also using my own implementation for SCD, as the default SCD component is very slow with higher numbers of records and sends an unnecessary query to the database for each incoming row.

Myself, I'm using Lookups and hash values. The first Lookup, with full cache, loads the surrogate key, business key, hash, and inferred-member indicator.

For the incoming rows the hash is calculated and then the lookup is done. If a record doesn't exist, it is inserted. If it exists, a Conditional Split detects a change by comparing the hash value of the existing data with that of the incoming data. If the hash values do not match, or the IsInferred flag of the existing data is set to true, an UPDATE is sent to the database. For SCD2 this is an UPDATE that invalidates the current record, followed by an insert of the new record.
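The row hash for change detection can be computed on the database side with HASHBYTES; a sketch under assumed names (stg.Customer with tracked columns Name, City, Phone):

```sql
-- Hash the SCD2-tracked columns into one comparable value.
-- The N'|' delimiter keeps ('ab','c') and ('a','bc') from colliding;
-- ISNULL keeps a NULL column from nulling out the whole hash input.
SELECT CustomerBK,
       HASHBYTES('SHA1',
                 ISNULL(Name,  N'') + N'|' +
                 ISNULL(City,  N'') + N'|' +
                 ISNULL(Phone, N'')) AS RowHash
FROM stg.Customer;
```

Storing this RowHash on the dimension row lets the Conditional Split compare one value instead of every tracked column.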

Generally I use the OLE DB Command for updates in the data flow, unless heavy update activity is expected on the dimension. If heavy updates are expected, I insert into an intermediate table and use a MERGE command to invalidate current records and insert new ones for SCD2, or an UPDATE command for SCD1.
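One common way to do the SCD2 invalidate-then-insert in a single batch is the MERGE-with-OUTPUT pattern. A sketch, assuming hypothetical staging and dimension tables (stg.Customer, dbo.DimCustomer) carrying a RowHash column:

```sql
-- Expire changed rows via MERGE, then insert the new versions
-- from the rows that MERGE reports as updated.
INSERT INTO dbo.DimCustomer
        (CustomerBK, Name, City, RowHash, IsCurrent, ValidFrom, ValidTo)
SELECT   CustomerBK, Name, City, RowHash, 1, GETDATE(), NULL
FROM (
    MERGE dbo.DimCustomer AS tgt
    USING stg.Customer    AS src
       ON tgt.CustomerBK = src.CustomerBK
      AND tgt.IsCurrent  = 1
    WHEN NOT MATCHED THEN                             -- brand-new business key
        INSERT (CustomerBK, Name, City, RowHash, IsCurrent, ValidFrom, ValidTo)
        VALUES (src.CustomerBK, src.Name, src.City, src.RowHash,
                1, GETDATE(), NULL)
    WHEN MATCHED AND tgt.RowHash <> src.RowHash THEN  -- changed: expire old row
        UPDATE SET IsCurrent = 0, ValidTo = GETDATE()
    OUTPUT $action, src.CustomerBK, src.Name, src.City, src.RowHash
) AS changes (MergeAction, CustomerBK, Name, City, RowHash)
WHERE changes.MergeAction = N'UPDATE';                -- re-insert replacements only
```

New keys are inserted directly by the MERGE; changed keys are expired by the UPDATE branch and then re-inserted as current rows by the outer INSERT over the OUTPUT stream.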

Processing with the Lookup transformation in SSIS allows whatever processing is necessary in the SSIS layer, with high performance, great logging possibilities, and configuration flexibility.

The MERGE command provides the best performance, but it is not so flexible when there are multiple databases (staging, DW, logging DB, etc.). Especially when, for whatever reason, you need to substitute one of them with another (a different name), you have to alter all the MERGE commands that work across databases. Sometimes linked servers are also necessary if those databases are on different instances.

When using pure SSIS processing you are very flexible in terms of configuration and database location. A well-designed ETL will give good performance (of course not the speed of the MERGE statement). I've used the pure MERGE solution for a few multi-million-record dimensions, but as I mentioned, where it is not critical I prefer the SSIS way for the logging and configuration flexibility.

When filling fact tables and looking up the surrogate keys, I'm again using Lookups with full cache (or with a Cache Connection Manager when a dimension is role-playing and used multiple times) for the primary lookup. If a value is not found (an inferred member, i.e. an early-arriving fact), I have another Lookup on the No Match output, this one with partial cache and a custom command executing a stored procedure that creates the inferred member and returns the generated surrogate key. This is a very fast solution for inferred members, and it is safe even when multiple lookups for that dimension run in parallel.
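A stored procedure for that inferred-member branch can be sketched roughly as follows (all names are illustrative, not from the original post). The key points are that it creates the stub row only if one is still missing, and returns the surrogate key either way; the UPDLOCK/HOLDLOCK hints serialize the check-and-insert so parallel lookups cannot create duplicate stubs:

```sql
CREATE PROCEDURE dbo.GetInferredCustomerKey
    @CustomerBK int
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRAN;
    -- Create a stub (inferred) member only if none exists yet.
    IF NOT EXISTS (SELECT 1
                   FROM dbo.DimCustomer WITH (UPDLOCK, HOLDLOCK)
                   WHERE CustomerBK = @CustomerBK AND IsCurrent = 1)
        INSERT INTO dbo.DimCustomer
                (CustomerBK, IsInferred, IsCurrent, ValidFrom)
        VALUES  (@CustomerBK, 1, 1, GETDATE());
    COMMIT;

    -- Return the surrogate key to the partial-cache Lookup.
    SELECT CustomerSK
    FROM   dbo.DimCustomer
    WHERE  CustomerBK = @CustomerBK AND IsCurrent = 1;
END
```

The partial-cache Lookup's custom query calls this procedure with the business key, and a later SCD2 pass overwrites the stub's attributes when the real dimension row arrives.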
Jan 22, 2012 at 10:47 AM Pavel Pawlowski
Thanks Pavel, I do much of what you do, but not at quite the same scale. Full cache / Cache Connection Manager is definitely a plus for SSIS.
Jan 22, 2012 at 11:06 AM Scot Hauder
@Pavel Pawlowski @Scot Hauder +1 to both. These are indeed key implementations and deserve more visibility than being buried in comments.
Jan 23, 2012 at 01:22 AM Usman Butt

