x

How do I delete every other duplicate in my table?

I have tried different techniques, but I can't figure it out. I really don't want to manually fix over 3000 duplicates. Here is the SQL code to detect them.

SELECT MyTable.[URL], MyTable.[ID], MyTable.[Email],                     
MyTable.[Name], MyTable.[Position], MyTable.[City]                    
FROM MyTable                    
WHERE (((MyTable.[URL]) In (SELECT [URL]                     
                        FROM [MyTable] As Tmp                     
                        GROUP BY [URL] HAVING Count(*)>1 )))                    
ORDER BY MyTable.[URL];                    

Does anybody have any clue?

Thank you,
Mike

more ▼

asked Oct 11 '09 at 05:19 PM in Default

Mike 6 gravatar image

Mike 6
1 1 1 1

(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

Something like this should do the trick.

DELETE a
FROM MyTable a
JOIN (SELECT MIN(Id) id, Url
FROM MyTable
GROUP BY Url) b ON a.Url = b.Url
AND a.Id <> b.Id
more ▼

answered Oct 11 '09 at 05:26 PM

mrdenny gravatar image

mrdenny
908 3

I keep getting syntax error in From clause.

I have the following table structure:
ID
Email
Name
Position
City
URL

I'm not sure if that matters or not. All our case sensitive.
Oct 11 '09 at 05:44 PM Mike 6
Which column where you trying to use to deturmin if the rows are duplicates?
Oct 12 '09 at 04:23 AM mrdenny
(comments are locked)
10|1200 characters needed characters left

Which version of SQL Server are you using?

more ▼

answered Oct 12 '09 at 03:10 AM

Madhivanan gravatar image

Madhivanan
1.1k 1 3 6

(comments are locked)
10|1200 characters needed characters left

DELETE MyTable
FROM MyTable a
JOIN
(
SELECT URL, ID = MIN(ID)
FROM MyTable
GROUP BY URL
) b
ON a.URL = b.URL
AND a.ID <> b.ID

more ▼

answered Oct 12 '09 at 04:31 AM

nigelrivett gravatar image

nigelrivett
92

(comments are locked)
10|1200 characters needed characters left

Unless I have misunderstood your example, this seems pretty straightforward. If the records are exact duplicates (meaning every field matches), then you have it easy.

Here are the steps:
--export 1 row per duplicate to another table
--note if your rows are not exact duplicates (every field the same) you will need to alter this or send more information, if you need help adjusting the query

SELECT DISTINCT * INTO [dbo].[MyTable_temp] FROM [dbo].[MyTable]            

--eliminate all the rows in the original table

TRUNCATE TABLE [dbo].[MyTable]            

--copy back the rows from the other table

INSERT INTO [dbo].[MyTable] SELECT * FROM [dbo].[MyTable_temp]            

--drop the other table

DROP TABLE [dbo].[MyTable_temp]            

--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)

ALTER TABLE [dbo].[MyTable]            
ADD CONSTRAINT [IX_MyTable] UNIQUE NONCLUSTERED (            
    URL )            

BE CAREFUL: I have had people tell me to get rid of duplicates when the records actually varied in some other field. If you have duplicate entries for the URL but other fields are not the same then the fix will be more involved. Verify you get the same number for each of the following queries.

--count distinct values on the URL field

SELECT COUNT(DISTINCT [URL]) FROM [dbo].[MyTable]            

--count distinct values on multiple fields... notice the GROUP BY

SELECT COUNT(*) FROM (            
    SELECT NULL AS [CountMe] --[URL],[ID],[Email],[Name],[Position],[City]            
    FROM [dbo].[MyTable]            
    GROUP BY [URL],[ID],[Email],[Name],[Position],[City]            
) AS t            

--if you want an end-to-end example of this, here you go

--create the test table

CREATE TABLE [dbo].[MyTable] (            
    [URL] varchar(128) NOT NULL,            
    [ID] int NOT NULL,            
    [Email] varchar(128) NULL,            
    [Name] varchar(60) NULL,            
    [Position] varchar(30) NULL,            
    [City] varchar(30) NULL )            

--insert some rows and some duplicates

INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.123.com', 1, 'abc@123.com', 'abc', 'Manager', 'Milwaukee')            
INSERT INTO [dbo].[MyTable] VALUES ('www.456.com', 2, 'me@456.com', 'me', 'Support', 'Madison')            
INSERT INTO [dbo].[MyTable] VALUES ('www.456.com', 2, 'me@456.com', 'me', 'Support', 'Madison')            

--export the 1 row per duplicate to another table

SELECT DISTINCT * INTO [dbo].[MyTable_temp] FROM [dbo].[MyTable]            

--eliminate all the rows in the original table

TRUNCATE TABLE [dbo].[MyTable]            

--copy back the rows from the other table

INSERT INTO [dbo].[MyTable] SELECT * FROM [dbo].[MyTable_temp]            

--drop the other table

DROP TABLE [dbo].[MyTable_temp]            

--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)

ALTER TABLE [dbo].[MyTable]            
ADD CONSTRAINT [IX_MyTable] UNIQUE NONCLUSTERED (            
    URL )            

--verify the records in the table

SELECT * FROM [dbo].[MyTable]            

--clean up from our test

DROP TABLE [dbo].[MyTable]
more ▼

answered Oct 12 '09 at 07:25 PM

Benjamin gravatar image

Benjamin
320 2 3 4

This example assumes that there's no ID value on the table which @Mike said that there was. Also running that on a table with a few hundred million records would kill the system and take the application down (or at least in a funky state) for quite some time while it runs.
Oct 12 '09 at 09:05 PM mrdenny
Denny, I'm having trouble finding where Mike mentioned there is a unique value in the table. Perhaps he replied to you privately? In any case, your answer is the best one and what I would use if there is a unique value. I was presenting a last-ditch de-duplication process where there is no unique value. I assume the application would be acting sporadically anyway in this case. Also, I thought the table was only a few thousand rows. Would would you say if there are no unique fields: do you temporarily add an identity field and remove after using the DELETE JOIN as above. This is another.
Oct 15 '09 at 06:19 PM Benjamin
(comments are locked)
10|1200 characters needed characters left

I'm going to 2nd mrdenny's answer. It's short and elegant. I usually went the route involving table variables to store distinct data temporarily and later dumping it back to the destination..

more ▼

answered Oct 13 '09 at 11:09 AM

Rajib Bahar gravatar image

Rajib Bahar
238 3 3 7

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x10

asked: Oct 11 '09 at 05:19 PM

Seen: 1523 times

Last Updated: Nov 11 '09 at 06:16 AM