I have a table that has duplicate registrants. This was a result of my poor design when I was new to SQL. When a new registrant is added to my table, a RegistrationID is assigned to them. It is an identity column. I now have many duplicate records that are the same person, but with a different RegistrationID. I also have a table with registrationid and trainingid in it so that I have a relationship between the training table and the registration table. I need to get ride of the duplicates in my registration table. I am not sure how to go about this, especially since the table is in production. I did some research and here is what I came up with: --SELECT Distinct FirstName, LastName, Address1 //create another table and move the values to it --INTO TempTb FROM TrainingRegistration --GROUP BY FirstName, LastName, Address1 --having COUNT(*) > 1 --GO --SELECT * FROM TempTb //check to see the result --where FirstName <> '' and LastName <> '' --GO --SELECT * FROM TrainingRegistration //check the original table --DELETE TrainingRegistration WHERE FirstName and LastName IN (SELECT FirstName FROM TempTb)//Delete all rows from the original table that also reside in the duplicate table --INSERT TrainingRegistration SELECT * FROM TempTb //move the rows back to the original table --DROP TABLE TableTb //Drop the Temp table Any guidance with this would really help me.
No that delete isn't going to work. Are you using SQL 2005+? If so you can use row_number() partitioning function to filter out those rows that are not the 1st occurance of the FirstName, LastName, Address1 combination. From that return the RegistrationIDs that you can then delete select RegistrationID from ( select RegistrationID, row_number()over(partition by FirstName, LastName, Address1 order by RegistrationID) as rownum from @TrainingRegistration ) a where rownum > 1
Here's how I normally handle duplicates... The below query creates some DemoData to work with. It then uses a CTE to add a row number to the data - if this row number is more than one then it's a dupicate. So I then remove all records where the row number is more than one. Try it out with the demo data and hopefully you can see what it does. One thing to note is that this will always keep the lowest ID when it finds a duplicate. If this table is in production I don't know if that's acceptable for you! Cheers IF OBJECT_ID('tempdb..#DemoData') IS NOT NULL DROP TABLE #DemoData CREATE TABLE #DemoData ( Id INT IDENTITY(1,1) , FirstName VARCHAR(100) , LastName VARCHAR(100) , Address1 VARCHAR(100) ) INSERT INTO [#DemoData] VALUES ( 'John', 'Smith', '1 The Street' ) --duplicate INSERT INTO [#DemoData] VALUES ( 'John', 'Smith', '1 The Street' ) INSERT INTO [#DemoData] VALUES ( 'James', 'Lewis', '1 The Road' ) INSERT INTO [#DemoData] VALUES ( 'Fred', 'Bloggs', '1 The Avenue' ) INSERT INTO [#DemoData] VALUES ( 'Peter', 'Smith', '1 The Close' ) --another duplicate INSERT INTO [#DemoData] VALUES ( 'Peter', 'Smith', '1 The Close' ); WITH FindDups AS ( SELECT [a].[FirstName] , [a].[LastName] , [a].[Address1] , ROW_NUMBER() OVER(PARTITION BY [a].[FirstName], [a].[LastName], [a].[Address1] ORDER BY id) AS DupIdentifier FROM #DemoData a ) DELETE FROM [FindDups] WHERE [FindDups].[DupIdentifier] != 1 SELECT * FROM [#DemoData] AS dd