x

Finding Deleted Records. Use Multiple Stored Procs?

I'm working on a Data Warehouse implementation using SSIS 2008. The source system is in SQL 2000.

The code to find deleted records and insert them into a deleted records table is as follows:

 INSERT INTO MDA.DeleteRecords (TableID, RecordID)
    SELECT Deltd.TableID TablID, NUM.NumberList
      FROM MDA.Numbers NUM
           LEFT OUTER JOIN dbo.slcdpm SLC
              ON NUM.NumberList = SLC.identity_column
           LEFT OUTER JOIN (SELECT DR.TableID, DR.RecordID, DR.DateDeleted
                              FROM    MDA.DeleteTables DT
                                   INNER JOIN
                                      MDA.DeleteRecords DR
                                   ON DR.TableID = DT.TableID
                             WHERE DT.TableName = 'slcdpm') Deltd
              ON Deltd.RecordID = NUM.NumberList
     WHERE     SLC.identity_column IS NULL
           AND NUM.NumberList <= IDENT_CURRENT ('slcdpm')
           AND Deltd.TableID IS NULL

Basically I am using Left Outer Join (where null) twice to find records that exist in a numbers table, but don't exist in the source table, nor are already listed as deleted in the DeleteRecords table.

I'd love to call this whole thing from one stored procedure and just pass in the source table name. However, I know you shouldn't do that and no performance benefit will be gained by using a SP to run dynamic SQL.

I have approximately 300 tables which will have to be checked this way.

Should I actually create 300 separate stored procedures, one for each table?

Use SQL that is dynamically generated in SSIS to pull newly deleted records per source table?

I'd love to use delete triggers, but that won't be allowed on this project.

Any suggestions?

more ▼

asked Jun 20, 2011 at 08:35 AM in Default

avatar image

DavidStein
72 5 5 9

(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

You wouldn't get a performance benefit, but if you're going to use a DeletedRecords table, this could be a case where dynamic SQL could make sense.

If it's not too late, though, I would recommend looking at type-2 slowly changing dimensions and Todd McDermid's SSIS plug-in to manage Kimball-style warehouse dimensions. The type-2 setup, by including effective and expiration dates, lets you handle deleted records without recourse to a separate table, so your fact tables can still maintain foreign key references to dimensions and guarantee referential integrity, something you could not do with a main table and a deleted table.

But again, if that's out of the question (which would be a shame--the setup listed above is problematic, in my opinion), using dynamic SQL to create a deleted record loader and passing in a table name wouldn't strike me as a bad idea. It's a really bad idea if end users had some level of control, but the rules are a bit looser for ETL implementations, and when you know that your warehouse is going to get loaded in one way and by one load process (so you don't have a number of people entering data online, or three totally separate mechanisms for populating the warehouse tables, etc.).

more ▼

answered Jun 20, 2011 at 08:53 AM

avatar image

Kevin Feasel
6.2k 4 7 15

The Deleted Records I'm trying to track are in the source OLTP, not the destination. The destination does have SCD support.

Jun 20, 2011 at 09:39 AM DavidStein

In that case, is there a particular reason to track them in a Deleted Records table on the OLTP side? If it is solely to assist your warehouse processing, you can tell a record has been deleted if an active row for that record exists in the warehouse dimension but does not exist in the source system, so you usually wouldn't need a deleted records table to tell the ETL process that a record is, in fact, deleted--just being missing should do the trick. That would be considerably safer and easier than adding in a relatively complicated method to find deletions and put them in source system tables.

McDermid's tool (which I plug every chance I get, just because it's so much better than the built-in SSIS SCD wizard) also makes it very simple to implement this: it includes an output for deleted records (which it finds by using the method I described above), or you could fold deletes into standard SCD2 expiration.

Jun 20, 2011 at 09:55 AM Kevin Feasel

I am doing it this way because I can't be sure that the source and destination are on different servers and I don't want to send thousands of records across the network to check if they have been marked as deleted in the destination if I don't have to.

Also, I've done it this way so if Triggers become allowed, I can use the exact same table with them. That way I won't have to do the expensive searches for missing records.

Jun 20, 2011 at 11:27 AM DavidStein
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1066
x501
x455
x57

asked: Jun 20, 2011 at 08:35 AM

Seen: 2340 times

Last Updated: Jun 20, 2011 at 08:35 AM

Copyright 2016 Redgate Software. Privacy Policy