x

Finding Deleted Records. Use Multiple Stored Procs?

I'm working on a Data Warehouse implementation using SSIS 2008. The source system is in SQL 2000.

The code to find deleted records and insert them into a deleted records table is as follows:

INSERT INTO MDA.DeleteRecords (TableID, RecordID)
   SELECT Deltd.TableID TablID, NUM.NumberList
     FROM MDA.Numbers NUM
          LEFT OUTER JOIN dbo.slcdpm SLC
             ON NUM.NumberList = SLC.identity_column
          LEFT OUTER JOIN (SELECT DR.TableID, DR.RecordID, DR.DateDeleted
                             FROM    MDA.DeleteTables DT
                                  INNER JOIN
                                     MDA.DeleteRecords DR
                                  ON DR.TableID = DT.TableID
                            WHERE DT.TableName = 'slcdpm') Deltd
             ON Deltd.RecordID = NUM.NumberList
    WHERE     SLC.identity_column IS NULL
          AND NUM.NumberList <= IDENT_CURRENT ('slcdpm')
          AND Deltd.TableID IS NULL

Basically I am using Left Outer Join (where null) twice to find records that exist in a numbers table, but don't exist in the source table, nor are already listed as deleted in the DeleteRecords table.

I'd love to call this whole thing from one stored procedure and just pass in the source table name. However, I know you shouldn't do that and no performance benefit will be gained by using a SP to run dynamic SQL.

I have approximately 300 tables which will have to be checked this way.

Should I actually create 300 separate stored procedures, one for each table?

Use SQL that is dynamically generated in SSIS to pull newly deleted records per source table?

I'd love to use delete triggers, but that won't be allowed on this project.

Any suggestions?
more ▼

asked Jun 20, 2011 at 08:35 AM in Default

DavidStein gravatar image

DavidStein
62 5 5 7

(comments are locked)
10|1200 characters needed characters left

1 answer: sort newest

You wouldn't get a performance benefit, but if you're going to use a DeletedRecords table, this could be a case where dynamic SQL could make sense.

If it's not too late, though, I would recommend looking at type-2 slowly changing dimensions and [Todd McDermid's SSIS plug-in to manage Kimball-style warehouse dimensions][1]. The type-2 setup, by including effective and expiration dates, lets you handle deleted records without recourse to a separate table, so your fact tables can still maintain foreign key references to dimensions and guarantee referential integrity, something you could not do with a main table and a deleted table.

But again, if that's out of the question (which would be a shame--the setup listed above is problematic, in my opinion), using dynamic SQL to create a deleted record loader and passing in a table name wouldn't strike me as a bad idea. It's a really bad idea if end users had some level of control, but the rules are a bit looser for ETL implementations, and when you know that your warehouse is going to get loaded in one way and by one load process (so you don't have a number of people entering data online, or three totally separate mechanisms for populating the warehouse tables, etc.).

[1]: http://dimensionmergescd.codeplex.com/
more ▼

answered Jun 20, 2011 at 08:53 AM

Kevin Feasel gravatar image

Kevin Feasel
6.1k 3 5 11

The Deleted Records I'm trying to track are in the source OLTP, not the destination. The destination does have SCD support.
Jun 20, 2011 at 09:39 AM DavidStein

In that case, is there a particular reason to track them in a Deleted Records table on the OLTP side? If it is solely to assist your warehouse processing, you can tell a record has been deleted if an active row for that record exists in the warehouse dimension but does not exist in the source system, so you usually wouldn't need a deleted records table to tell the ETL process that a record is, in fact, deleted--just being missing should do the trick. That would be considerably safer and easier than adding in a relatively complicated method to find deletions and put them in source system tables.

McDermid's tool (which I plug every chance I get, just because it's so much better than the built-in SSIS SCD wizard) also makes it very simple to implement this: it includes an output for deleted records (which it finds by using the method I described above), or you could fold deletes into standard SCD2 expiration.
Jun 20, 2011 at 09:55 AM Kevin Feasel

I am doing it this way because I can't be sure that the source and destination are on different servers and I don't want to send thousands of records across the network to check if they have been marked as deleted in the destination if I don't have to.

Also, I've done it this way so if Triggers become allowed, I can use the exact same table with them. That way I won't have to do the expensive searches for missing records.
Jun 20, 2011 at 11:27 AM DavidStein
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x987
x476
x414
x38

asked: Jun 20, 2011 at 08:35 AM

Seen: 1972 times

Last Updated: Jun 20, 2011 at 08:35 AM