question

DavidStein avatar image
DavidStein asked

Finding Deleted Records. Use Multiple Stored Procs?

I'm working on a Data Warehouse implementation using SSIS 2008. The source system is in SQL 2000. The code to find deleted records and insert them into a deleted records table is as follows: INSERT INTO MDA.DeleteRecords (TableID, RecordID) SELECT Deltd.TableID TablID, NUM.NumberList FROM MDA.Numbers NUM LEFT OUTER JOIN dbo.slcdpm SLC ON NUM.NumberList = SLC.identity_column LEFT OUTER JOIN (SELECT DR.TableID, DR.RecordID, DR.DateDeleted FROM MDA.DeleteTables DT INNER JOIN MDA.DeleteRecords DR ON DR.TableID = DT.TableID WHERE DT.TableName = 'slcdpm') Deltd ON Deltd.RecordID = NUM.NumberList WHERE SLC.identity_column IS NULL AND NUM.NumberList <= IDENT_CURRENT ('slcdpm') AND Deltd.TableID IS NULL Basically I am using Left Outer Join (where null) twice to find records that exist in a numbers table, but don't exist in the source table, nor are already listed as deleted in the DeleteRecords table. I'd love to call this whole thing from one stored procedure and just pass in the source table name. However, I know you shouldn't do that and no performance benefit will be gained by using a SP to run dynamic SQL. I have approximately 300 tables which will have to be checked this way. Should I actually create 300 separate stored procedures, one for each table? Use SQL that is dynamically generated in SSIS to pull newly deleted records per source table? I'd love to use delete triggers, but that won't be allowed on this project. Any suggestions?
t-sqlsql-server-2000stored-proceduresdynamic-sql
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Kevin Feasel avatar image
Kevin Feasel answered
You wouldn't get a performance benefit, but if you're going to use a DeletedRecords table, this could be a case where dynamic SQL could make sense. If it's not too late, though, I would recommend looking at type-2 slowly changing dimensions and [Todd McDermid's SSIS plug-in to manage Kimball-style warehouse dimensions][1]. The type-2 setup, by including effective and expiration dates, lets you handle deleted records without recourse to a separate table, so your fact tables can still maintain foreign key references to dimensions and guarantee referential integrity, something you could not do with a main table and a deleted table. But again, if that's out of the question (which would be a shame--the setup listed above is problematic, in my opinion), using dynamic SQL to create a deleted record loader and passing in a table name wouldn't strike me as a bad idea. It's a really bad idea if end users had some level of control, but the rules are a bit looser for ETL implementations, and when you know that your warehouse is going to get loaded in one way and by one load process (so you don't have a number of people entering data online, or three totally separate mechanisms for populating the warehouse tables, etc.). [1]: http://dimensionmergescd.codeplex.com/
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

DavidStein avatar image DavidStein commented ·
The Deleted Records I'm trying to track are in the source OLTP, not the destination. The destination does have SCD support.
0 Likes 0 ·
Kevin Feasel avatar image Kevin Feasel commented ·
In that case, is there a particular reason to track them in a Deleted Records table on the OLTP side? If it is solely to assist your warehouse processing, you can tell a record has been deleted if an active row for that record exists in the warehouse dimension but does not exist in the source system, so you usually wouldn't need a deleted records table to tell the ETL process that a record is, in fact, deleted--just being missing should do the trick. That would be considerably safer and easier than adding in a relatively complicated method to find deletions and put them in source system tables. McDermid's tool (which I plug every chance I get, just because it's so much better than the built-in SSIS SCD wizard) also makes it very simple to implement this: it includes an output for deleted records (which it finds by using the method I described above), or you could fold deletes into standard SCD2 expiration.
0 Likes 0 ·
DavidStein avatar image DavidStein commented ·
I am doing it this way because I can't be sure that the source and destination are on different servers and I don't want to send thousands of records across the network to check if they have been marked as deleted in the destination if I don't have to. Also, I've done it this way so if Triggers become allowed, I can use the exact same table with them. That way I won't have to do the expensive searches for missing records.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.