x

How do I delete every other duplicate in my table?

I have tried different techniques, but I can't figure it out. I really don't want to manually fix over 3000 duplicates. Here is the SQL code to detect them.

SELECT MyTable.[URL], MyTable.[ID], MyTable.[Email],                     
MyTable.[Name], MyTable.[Position], MyTable.[City]                    
FROM MyTable                    
WHERE (((MyTable.[URL]) In (SELECT [URL]                     
                        FROM [MyTable] As Tmp                     
                        GROUP BY [URL] HAVING Count(*)>1 )))                    
ORDER BY MyTable.[URL];                    

Does anybody have any clue?

Thank you, Mike

more ▼

asked Oct 11, 2009 at 05:19 PM in Default

avatar image

Mike 6
1 1 1 2

(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

Something like this should do the trick.

DELETE a            
FROM MyTable a            
JOIN (SELECT MIN(Id) id, Url            
FROM MyTable            
GROUP BY Url) b ON a.Url = b.Url            
    AND a.Id <> b.Id            
more ▼

answered Oct 11, 2009 at 05:26 PM

avatar image

mrdenny
928 2 5

(comments are locked)
10|1200 characters needed characters left

Unless I have misunderstood your example, this seems pretty straightforward. If the records are exact duplicates (meaning every field matches), then you have it easy.

Here are the steps:
--export 1 row per duplicate to another table
--note if your rows are not exact duplicates (every field the same) you will need to alter this or send more information, if you need help adjusting the query

SELECT DISTINCT * INTO [dbo].[MyTable_temp] FROM [dbo].[MyTable]            

--eliminate all the rows in the original table

TRUNCATE TABLE [dbo].[MyTable]            

--copy back the rows from the other table

INSERT INTO [dbo].[MyTable] SELECT * FROM [dbo].[MyTable_temp]            

--drop the other table

DROP TABLE [dbo].[MyTable_temp]            

--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)

ALTER TABLE [dbo].[MyTable]            
ADD CONSTRAINT [IX_MyTable] UNIQUE NONCLUSTERED (            
    URL )            

BE CAREFUL: I have had people tell me to get rid of duplicates when the records actually varied in some other field. If you have duplicate entries for the URL but other fields are not the same then the fix will be more involved. Verify you get the same number for each of the following queries.

--count distinct values on the URL field

SELECT COUNT(DISTINCT [URL]) FROM [dbo].[MyTable]            

--count distinct values on multiple fields... notice the GROUP BY

SELECT COUNT(*) FROM (            
    SELECT NULL AS [CountMe] --[URL],[ID],[Email],[Name],[Position],[City]            
    FROM [dbo].[MyTable]            
    GROUP BY [URL],[ID],[Email],[Name],[Position],[City]            
) AS t            

--if you want an end-to-end example of this, here you go

--create the test table

CREATE TABLE [dbo].[MyTable] (            
    [URL] varchar(128) NOT NULL,            
    [ID] int NOT NULL,            
    [Email] varchar(128) NULL,            
    [Name] varchar(60) NULL,            
    [Position] varchar(30) NULL,            
    [City] varchar(30) NULL )            

--insert some rows and some duplicates

INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.456.com', 2, 'me@456.com', 'me', 'Support', 'Madison')            
INSERT INTO [dbo].[MyTable] VALUES ('www.456.com', 2, 'me@456.com', 'me', 'Support', 'Madison')            

--export the 1 row per duplicate to another table

SELECT DISTINCT * INTO [dbo].[MyTable_temp] FROM [dbo].[MyTable]            

--eliminate all the rows in the original table

TRUNCATE TABLE [dbo].[MyTable]            

--copy back the rows from the other table

INSERT INTO [dbo].[MyTable] SELECT * FROM [dbo].[MyTable_temp]            

--drop the other table

DROP TABLE [dbo].[MyTable_temp]            

--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)

ALTER TABLE [dbo].[MyTable]            
ADD CONSTRAINT [IX_MyTable] UNIQUE NONCLUSTERED (            
    URL )            

--verify the records in the table

SELECT * FROM [dbo].[MyTable]            

--clean up from our test

DROP TABLE [dbo].[MyTable]            

more ▼

answered Oct 12, 2009 at 07:25 PM

avatar image

Benjamin
320 3 5 8

(comments are locked)
10|1200 characters needed characters left

Which version of SQL Server are you using?

more ▼

answered Oct 12, 2009 at 03:10 AM

avatar image

Madhivanan
1.1k 2 5 9

(comments are locked)
10|1200 characters needed characters left

DELETE MyTable FROM MyTable a JOIN ( SELECT URL, ID = MIN(ID) FROM MyTable GROUP BY URL ) b ON a.URL = b.URL AND a.ID <> b.ID

more ▼

answered Oct 12, 2009 at 04:31 AM

avatar image

nigelrivett
92

(comments are locked)
10|1200 characters needed characters left

I'm going to 2nd mrdenny's answer. It's short and elegant. I usually went the route involving table variables to store distinct data temporarily and later dumping it back to the destination..

more ▼

answered Oct 13, 2009 at 11:09 AM

avatar image

Rajib Bahar
238 3 6 11

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x13

asked: Oct 11, 2009 at 05:19 PM

Seen: 2002 times

Last Updated: Nov 11, 2009 at 06:16 AM

Copyright 2016 Redgate Software. Privacy Policy