marka92 asked

Remove duplicate rows from a table

I have a table that has duplicate registrants. This was a result of my poor design when I was new to SQL. When a new registrant is added to my table, a RegistrationID is assigned to them. It is an identity column. I now have many duplicate records that are the same person, but with a different RegistrationID. I also have a table with registrationid and trainingid in it so that I have a relationship between the training table and the registration table. I need to get rid of the duplicates in my registration table. I am not sure how to go about this, especially since the table is in production. I did some research and here is what I came up with:

--SELECT Distinct FirstName, LastName, Address1    //create another table and move the values to it
--INTO TempTb FROM TrainingRegistration
--GROUP BY FirstName, LastName, Address1
--having COUNT(*) > 1
--GO
--SELECT * FROM TempTb                             //check to see the result
--where FirstName <> '' and LastName <> ''
--GO
--SELECT * FROM TrainingRegistration               //check the original table
--DELETE TrainingRegistration WHERE FirstName and LastName IN (SELECT FirstName FROM TempTb)   //Delete all rows from the original table that also reside in the duplicate table
--INSERT TrainingRegistration SELECT * FROM TempTb //move the rows back to the original table
--DROP TABLE TableTb                               //Drop the Temp table

Any guidance with this would really help me.
duplicates

Kev Riley answered
No, that delete isn't going to work. Are you using SQL 2005+? If so, you can use the row_number() partitioning function to filter out those rows that are not the first occurrence of the FirstName, LastName, Address1 combination. From that, return the RegistrationIDs that you can then delete:

select RegistrationID
from (
    select RegistrationID,
           row_number() over (partition by FirstName, LastName, Address1 order by RegistrationID) as rownum
    from TrainingRegistration
) a
where rownum > 1
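For example, a minimal sketch of the delete built on that query (table and column names are taken from the question; verify the SELECT on its own returns only the rows you expect before running it):

delete from TrainingRegistration
where RegistrationID in
(
    select RegistrationID
    from (
        select RegistrationID,
               row_number() over (partition by FirstName, LastName, Address1
                                  order by RegistrationID) as rownum
        from TrainingRegistration
    ) a
    where rownum > 1   -- rownum = 1 is the copy that is kept
)

Also note that the question mentions a link table keyed on registrationid; if any of the duplicate IDs appear there, those rows may need to be re-pointed to the surviving RegistrationID first, or the delete could fail on a foreign key.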
7 comments

marka92 commented ·
When you say "No, that delete isn't going to work," are you referring to my post? I am using SQL 2008. I ran this:

select RegistrationID
from (
    select RegistrationID,
           row_number() over (partition by FirstName, LastName, Address1 order by RegistrationID) as rownum
    from TrainingRegistration
) a
where rownum > 1

It returned 227 rows. Are you saying that these are the ones I can delete? Is there a way to copy the table and try it on a test table first?
Kev Riley ♦♦ commented ·
Yes, I was referring to the delete in your question. Sure, you can try this on a test table. Use

select * into TrainingRegistration_TEST from TrainingRegistration

and then run that code against the test version.
marka92 commented ·
I made a copy of my database and I'm ready to try your suggestion now. I just delete the 227 RegistrationIDs that the query returns, right?
Kev Riley ♦♦ commented ·
If you are still wary, wrap the whole query in a transaction, query the table at the end to check the results, and then roll back the transaction. Once you are happy, change the rollback to a commit.
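For reference, a minimal sketch of that pattern against the test copy (assuming the TrainingRegistration_TEST table created above; the delete shown is the row_number() one from the answer):

begin transaction

    delete from TrainingRegistration_TEST
    where RegistrationID in
    (
        select RegistrationID
        from (
            select RegistrationID,
                   row_number() over (partition by FirstName, LastName, Address1
                                      order by RegistrationID) as rownum
            from TrainingRegistration_TEST
        ) a
        where rownum > 1
    )

    -- check that only the duplicates are gone
    select * from TrainingRegistration_TEST

rollback transaction  -- change to COMMIT TRANSACTION once you are happy with the result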
marka92 commented ·
I'm sorry, I'm confused, Kev. After I copy my data to my test table using

select * into TrainingRegistration_TEST from TrainingRegistration

I then run this:

SELECT RegistrationID
from (
    select RegistrationID,
           row_number() over (partition by FirstName, LastName, Address1 order by RegistrationID) as rownum
    from TrainingRegistration
) a
where rownum > 1

and delete the rows returned? How do I delete? Do I just add DELETE to the front of the query instead of SELECT? You have been very helpful, brother, thank you.
mjharper answered
Here's how I normally handle duplicates... The query below creates some demo data to work with. It then uses a CTE to add a row number to the data; if this row number is more than one then it's a duplicate. So I then remove all records where the row number is more than one. Try it out with the demo data and hopefully you can see what it does. One thing to note is that this will always keep the lowest Id when it finds a duplicate. If this table is in production I don't know if that's acceptable for you! Cheers

IF OBJECT_ID('tempdb..#DemoData') IS NOT NULL
    DROP TABLE #DemoData

CREATE TABLE #DemoData
(
    Id        INT IDENTITY(1,1),
    FirstName VARCHAR(100),
    LastName  VARCHAR(100),
    Address1  VARCHAR(100)
)

INSERT INTO [#DemoData] VALUES ( 'John', 'Smith', '1 The Street' )
--duplicate
INSERT INTO [#DemoData] VALUES ( 'John', 'Smith', '1 The Street' )
INSERT INTO [#DemoData] VALUES ( 'James', 'Lewis', '1 The Road' )
INSERT INTO [#DemoData] VALUES ( 'Fred', 'Bloggs', '1 The Avenue' )
INSERT INTO [#DemoData] VALUES ( 'Peter', 'Smith', '1 The Close' )
--another duplicate
INSERT INTO [#DemoData] VALUES ( 'Peter', 'Smith', '1 The Close' );

WITH FindDups AS
(
    SELECT [a].[FirstName],
           [a].[LastName],
           [a].[Address1],
           ROW_NUMBER() OVER (PARTITION BY [a].[FirstName], [a].[LastName], [a].[Address1] ORDER BY id) AS DupIdentifier
    FROM #DemoData a
)
DELETE FROM [FindDups]
WHERE [FindDups].[DupIdentifier] != 1

SELECT * FROM [#DemoData] AS dd
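If keeping the lowest Id is not what you want, the ORDER BY inside the ROW_NUMBER() controls which copy survives; for example, ORDER BY Id DESC would keep the highest Id instead. A sketch against the same demo table:

; WITH FindDups AS
(
    SELECT a.FirstName,
           a.LastName,
           a.Address1,
           ROW_NUMBER() OVER (PARTITION BY a.FirstName, a.LastName, a.Address1
                              ORDER BY a.Id DESC) AS DupIdentifier  -- DESC: the highest Id becomes row 1 and is kept
    FROM #DemoData a
)
DELETE FROM FindDups
WHERE DupIdentifier != 1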
2 comments

marka92 commented ·
Instead of manually entering all of my values (I have 228), how else do you get the values into your demo table when you have a lot?
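(For reference, one way to avoid typing the rows by hand is to bulk-copy just the relevant columns from the real table into the demo table; a sketch assuming the column names from the question:)

insert into #DemoData (FirstName, LastName, Address1)
select FirstName, LastName, Address1
from TrainingRegistration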
marka92 commented ·
Disregard my last post, your solution is the same as Kev's. Thank you for getting me started.
