question

jhowe avatar image
jhowe asked

identify and remove duplicates help!

hello I have the following sample rows below (my table is much bigger) what is the simplest way to structure a query to find the duplicates in this table? Obviously the ID's are all different but any row with more than one entry for V and D is considered a duplicate. So as you can see 1023, 1024 has one row each plus two dupes... I need a query that will identify and remove these... please help! I've tried using queries with having count > 1 etc. but can't seem to get the results i want... oh and sorry for the formatting! can't workout how to get it all lined up! ;WITH [Count] AS ( SELECT * , row_number() OVER ( PARTITION BY V, D ORDER BY V ) AS [RowCount] FROM dbo.ViewMembership ) --Delete from cte where rowcnt >1 ID V D DT S 37 4 1023 3 0 85 4 1023 3 0 138 4 1023 3 0 165 4 1024 3 0 112 4 1024 3 0 64 4 1024 3 0 81 4 1025 3 0 33 4 1025 3 0 134 4 1025 3 0 133 4 1026 3 0
sql-server-2008sql-server-2008-r2
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered
;with cte As( Select *, row_number() over(partition by v, d order by id) as rowcnt from yourtable ) Delete from cte where rowcnt >1 This will create a result set with a unique row number for each combination of the columns v and d, and delete the duplicate records if you want to test it first, you can use a select instead of the delete statement. ;with cte As( Select *, row_number() over(partition by v, d order by id) as rowcnt from yourtable ) SELECT * FROM cte WHERE rowcnt >1 Your issue with duplicates can be prevented in the future if you add an unique constraint.
9 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
If you remove the DELETE statement, from my sample, then you need to add a select from the common table expression instead
1 Like 1 ·
jhowe avatar image jhowe commented ·
right i've tried the query out, had to make sure i partitioned by the right columns and it's worked perfectly... a bit staggering when you've never used CTE's, OVER or PARTITION before, but once you get your head round it, it's not too complicated! Thanks so much for your help...
1 Like 1 ·
Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
+1 (and I would give an extra +1 for probably writing the solution on a tiny, tiny portable device if AskSSC allowed it). But I'm guessing you really mean ROW_NUMBER, not ROW_COUNT (?)
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
Thanks, it was written on my phone in the car, waiting in the parking lot. :) and of course it should be row_number, thanks.
0 Likes 0 ·
jhowe avatar image jhowe commented ·
Hi this seems like a very simple/elegant solution but am getting a syntax error "Incorrect syntax near ')'." for some reason? I'm trying to run the select without the delete first, can't see any reason why that would break it? I've pasted your modded code in my original question... thanks for your help.
0 Likes 0 ·
Show more comments
Mark17 avatar image
Mark17 answered
I think this will work: Basically, select the minimum id, each of the columns, and group by each column as a derived table. This will give you the rows without the duplicates. Then from there you delete everything where the id is not in that derived table. You can test it with this: CREATE TABLE #test ( id INT IDENTITY(1,1), v VARCHAR(50), d VARCHAR(50), dt VARCHAR(50), s VARCHAR(50) ) INSERT INTO #test SELECT '4', '1023', '3', '0' INSERT INTO #test SELECT '4', '1023', '3', '0' INSERT INTO #test SELECT '4', '1023', '3', '0' INSERT INTO #test SELECT '4', '1024', '3', '0' INSERT INTO #test SELECT '4', '1024', '3', '0' INSERT INTO #test SELECT '4', '1024', '3', '0' INSERT INTO #test SELECT '4', '1025', '3', '0' INSERT INTO #test SELECT '4', '1025', '3', '0' INSERT INTO #test SELECT '4', '1025', '3', '0' INSERT INTO #test SELECT '4', '1026', '3', '0' DELETE FROM #test WHERE id NOT IN (SELECT id FROM ( SELECT MIN(id) AS id, v, d, dt, s FROM #test GROUP BY v, d, dt, s ) AS RowsToKeep )
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.