
finding ........................

Can anyone help me find the duplicates in every column of a table?


asked May 18, 2010 at 06:22 AM in Default


Avi

Can you provide a sample of your data?

May 18, 2010 at 07:00 AM sp_lock

Can you provide the name of the column you want to check for duplicates?

May 19, 2010 at 07:48 AM Leo

3 answers

This is a very vague question that could get a really complicated solution. If you simply want to locate duplicate values column by column, then you will need to use something like:

USE [adventureworks]
GO
CREATE TABLE Myduplicates
(
    IDCol INT IDENTITY,
    ColA  VARCHAR(20),
    ColB  VARCHAR(10),
    ColC  INT
)
GO
-- UNION ALL keeps every row; a plain UNION would silently drop any fully identical rows
INSERT INTO [dbo].[Myduplicates] ([ColA], [ColB], [ColC])
SELECT 'Larry',   'Curly', 10 UNION ALL
SELECT 'Larry',   'Moe',   20 UNION ALL
SELECT 'Curly',   'Larry', 30 UNION ALL
SELECT 'Moe',     'Curly', 10 UNION ALL
SELECT 'Zeppo',   'Harpo', 10 UNION ALL
SELECT 'Chico',   'Zeppo', 30 UNION ALL
SELECT 'Groucho', 'Zeppo', 20
GO

SELECT [m].[ColA], COUNT(IDCol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[ColA]
HAVING COUNT(IDCol) > 1
ORDER BY [duplicate count] DESC;

SELECT [m].[ColB], COUNT(IDCol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[ColB]
HAVING COUNT(IDCol) > 1
ORDER BY [duplicate count] DESC;

SELECT [m].[ColC], COUNT(IDCol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[ColC]
HAVING COUNT(IDCol) > 1
ORDER BY [duplicate count] DESC;

SELECT [m].[IDCol] AS [IDs that need review for ColA duplicates]
FROM [dbo].[Myduplicates] AS m
INNER JOIN (SELECT [d].[ColA], COUNT(IDCol) AS [duplicate count]
            FROM [dbo].[Myduplicates] AS d
            GROUP BY [d].[ColA]
            HAVING COUNT(IDCol) > 1) AS s1
    ON [m].[ColA] = [s1].[ColA];

SELECT [m].[IDCol] AS [IDs that need review for ColB duplicates]
FROM [dbo].[Myduplicates] AS m
INNER JOIN (SELECT [d].[ColB], COUNT(IDCol) AS [duplicate count]
            FROM [dbo].[Myduplicates] AS d
            GROUP BY [d].[ColB]
            HAVING COUNT(IDCol) > 1) AS s1
    ON [m].[ColB] = [s1].[ColB];

GO
DROP TABLE Myduplicates
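The three counting queries above differ only in the column name, so for a table with many columns (the question asks about every column) they can be generated in a loop. A minimal sketch, assuming an in-memory SQLite database as a stand-in for SQL Server and the same sample data as the INSERT above; `duplicate_counts` is a hypothetical helper name. Column names cannot be passed as query parameters, so they come from a fixed list rather than user input.

```python
import sqlite3

# Same sample data as the Myduplicates INSERT above (SQLite stand-in for SQL Server).
ROWS = [
    ("Larry", "Curly", 10),
    ("Larry", "Moe", 20),
    ("Curly", "Larry", 30),
    ("Moe", "Curly", 10),
    ("Zeppo", "Harpo", 10),
    ("Chico", "Zeppo", 30),
    ("Groucho", "Zeppo", 20),
]
COLUMNS = ["ColA", "ColB", "ColC"]  # fixed whitelist: never build SQL from user input


def duplicate_counts(conn, table, columns):
    """Return {column: [(value, count), ...]} for values occurring more than once."""
    result = {}
    for col in columns:
        # One GROUP BY ... HAVING COUNT(*) > 1 query per column, as in the answer.
        sql = (
            f"SELECT {col}, COUNT(*) AS dup_count FROM {table} "
            f"GROUP BY {col} HAVING COUNT(*) > 1 "
            f"ORDER BY dup_count DESC, {col}"
        )
        result[col] = conn.execute(sql).fetchall()
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Myduplicates ("
    "IDCol INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT, ColC INTEGER)"
)
conn.executemany("INSERT INTO Myduplicates (ColA, ColB, ColC) VALUES (?, ?, ?)", ROWS)

counts = duplicate_counts(conn, "Myduplicates", COLUMNS)
for col, dups in counts.items():
    print(col, dups)
```

In SQL Server the same idea would usually be done with dynamic SQL built from the table's column list; the loop here only illustrates the pattern.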

Resolving the duplicates will be a whole new piece of work.
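The "IDs that need review" queries join each row back to a grouped subquery. A possible alternative, sketched below under the same SQLite stand-in assumption, is a windowed `COUNT(*) OVER (PARTITION BY col)`, which attaches the group size to every row without a self-join (SQL Server has supported `PARTITION BY` on aggregates since 2005). It covers all three columns, including ColC, which the answer does not list. `ids_to_review` is a hypothetical helper; window functions need SQLite 3.25 or newer. The IDs depend on insertion order, so they may differ from the T-SQL script's IDENTITY values.

```python
import sqlite3

ROWS = [
    ("Larry", "Curly", 10),
    ("Larry", "Moe", 20),
    ("Curly", "Larry", 30),
    ("Moe", "Curly", 10),
    ("Zeppo", "Harpo", 10),
    ("Chico", "Zeppo", 30),
    ("Groucho", "Zeppo", 20),
]
COLUMNS = ["ColA", "ColB", "ColC"]  # fixed whitelist of column names


def ids_to_review(conn, table, columns):
    """Return {column: sorted IDCol values whose value in that column is repeated}."""
    result = {}
    for col in columns:
        # COUNT(*) OVER (PARTITION BY col) gives every row the size of its group.
        sql = (
            f"SELECT IDCol FROM ("
            f"SELECT IDCol, COUNT(*) OVER (PARTITION BY {col}) AS dup_count "
            f"FROM {table}"
            f") AS t WHERE dup_count > 1 ORDER BY IDCol"
        )
        result[col] = [row[0] for row in conn.execute(sql)]
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Myduplicates ("
    "IDCol INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT, ColC INTEGER)"
)
conn.executemany("INSERT INTO Myduplicates (ColA, ColB, ColC) VALUES (?, ?, ?)", ROWS)

ids = ids_to_review(conn, "Myduplicates", COLUMNS)
for col, id_list in ids.items():
    print(col, id_list)
```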


answered May 18, 2010 at 07:43 AM


Fatherjack ♦♦

Stellar effort, considering...

May 18, 2010 at 08:06 AM Kev Riley ♦♦

There are a number of ways to solve this using T-SQL. The best ones these days seem to revolve around ROW_NUMBER(). The key is to understand the basic concept: you need a method to uniquely identify each row, then a way to mark the rows that repeat a value already seen, and then a mechanism to remove those duplicates. While this sounds like three steps, you should be able to do all of it in a single query.
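The three steps can be sketched as one DELETE statement. This is a minimal illustration, assuming SQLite (3.25 or newer, for window functions) as a stand-in for SQL Server and a hypothetical single-column table `items`. SQLite's built-in `rowid` plays the role of the unique row identifier; in SQL Server you would use a key or IDENTITY column, or delete through a CTE as in the Simple-Talk snippet in the comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (data TEXT)")  # hypothetical example table
conn.executemany(
    "INSERT INTO items (data) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# One statement, three steps:
#   1. identify each row uniquely -> SQLite's rowid (aliased as rid)
#   2. mark duplicates            -> ROW_NUMBER() restarts at 1 for each value of data
#   3. remove them                -> delete every row numbered 2 or higher
conn.execute(
    """
    DELETE FROM items
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY data ORDER BY rowid) AS nr
            FROM items
        )
        WHERE nr > 1
    )
    """
)
remaining = conn.execute("SELECT data FROM items ORDER BY data").fetchall()
print(remaining)  # [('a',), ('b',), ('c',)]
```

Ordering by `rowid` inside `OVER` keeps the first-inserted copy of each value; any other ordering column would choose a different survivor.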


answered May 18, 2010 at 08:38 AM


Grant Fritchey ♦♦

Grant, do you have any examples of the ROW_NUMBER() option, please? I started off thinking that way but then decided I would want to see all the rows to choose which one to call the duplicate, i.e. rows where the ROW_NUMBER() values are 1 to n, not 2 to n. That led me to the nested-query solution. J

May 18, 2010 at 09:38 AM Fatherjack ♦♦

This is from Simple-Talk...
WITH numbered AS
(
    SELECT data,
           row_number() OVER (PARTITION BY data ORDER BY data) AS nr
    FROM @duplicateTable4
)
DELETE FROM numbered
WHERE nr > 1

May 18, 2010 at 09:45 AM Grant Fritchey ♦♦

Right, I see. I was thinking that the values in other columns might justify keeping the row where nr = 3, so rows 1, 2 and 4 would get deleted by the application... Thanks.

May 18, 2010 at 10:05 AM Fatherjack ♦♦

If you have to get into making judgement calls, there's really no way to automate it. Generally you have to define a mechanism for identifying what is a duplicate and then eliminate the extras. If that mechanism is "let me look at it" then...

May 18, 2010 at 10:25 AM Grant Fritchey ♦♦
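When the rule for which copy to keep can be stated (for example "keep the most recently updated row"), it can go into the `ORDER BY` of the `OVER` clause, so the preferred row gets nr = 1. Listing every member of a duplicate group with its number (1 to n, not just 2 to n) also supports the review step Fatherjack describes. A minimal sketch in SQLite (3.25+), assuming a hypothetical `contacts` table where `email` defines a duplicate and an ISO-formatted `updated` date picks the survivor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email defines a duplicate, updated decides which copy survives.
conn.execute("CREATE TABLE contacts (email TEXT, name TEXT, updated TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("ann@example.com", "Ann", "2010-01-05"),
        ("ann@example.com", "Ann Smith", "2010-04-30"),
        ("ann@example.com", "A. Smith", "2010-02-11"),
        ("bob@example.com", "Bob", "2010-03-01"),
    ],
)

# Newest row first within each email, so the row to keep is numbered 1;
# group_size lets us show only rows that actually have duplicates.
NUMBERED = """
    SELECT rowid AS rid, email, name, updated,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY updated DESC) AS nr,
           COUNT(*) OVER (PARTITION BY email) AS group_size
    FROM contacts
"""

# Review step: every member of each duplicate group, numbered 1..n.
review = conn.execute(
    f"SELECT email, name, nr FROM ({NUMBERED}) WHERE group_size > 1 ORDER BY email, nr"
).fetchall()
for row in review:
    print(row)

# Removal step: keep nr = 1 (the newest copy) and delete the rest.
conn.execute(
    f"DELETE FROM contacts WHERE rowid IN (SELECT rid FROM ({NUMBERED}) WHERE nr > 1)"
)
kept = conn.execute("SELECT email, name FROM contacts ORDER BY email").fetchall()
print(kept)
```

If no such rule exists, the review query is as far as automation goes, which matches Grant's point about judgement calls.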

I hope I understand the question correctly: the task is to find the duplicate records across all columns in the table. I will also provide a sample of how to quickly delete all such duplicates. Let's create a heap table and insert some records into it (including some duplicates):

create table #t (a int, b int);
go

insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (2, 5);
insert into #t values (2, 5);
insert into #t values (3, 1);
insert into #t values (4, 6);
insert into #t values (4, 6);
go

Now we have 4 occurrences of (1, 1), 2 occurrences of (2, 5) and 2 occurrences of (4, 6); (3, 1) does not have any duplicates. Here is the script to quickly identify all the duplicates:

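-- partition by every column: rows count as duplicates only when both a and b match
select
        row_number() over (partition by a, b order by a, b) PartitionedNumber, *
        from #t
        order by a, b, PartitionedNumber;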

Here is the result of the query above:

PartitionedNumber    a           b
-------------------- ----------- -----------
1                    1           1
2                    1           1
3                    1           1
4                    1           1
1                    2           5
2                    2           5
1                    3           1
1                    4           6
2                    4           6
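The numbering in the result and the occurrence counts in the text can be reproduced with a short sketch. It assumes SQLite (3.25+) as a stand-in for SQL Server, with an in-memory table `t` in place of the `#t` temp table, and it partitions by both columns so that only fully identical rows are numbered together. The GROUP BY query is an additional way, not from the answer, to get the counts directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for #t: SQLite has no #temp tables, so an in-memory database plays that role.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 1), (2, 5), (2, 5), (3, 1), (4, 6), (4, 6)],
)

# Number the copies of each distinct (a, b) row: 1, 2, ... within each partition.
numbered = conn.execute(
    "SELECT ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) AS PartitionedNumber, a, b "
    "FROM t ORDER BY a, b, PartitionedNumber"
).fetchall()
for row in numbered:
    print(row)

# Occurrence counts per distinct row; rows without duplicates, like (3, 1), drop out.
occurrences = conn.execute(
    "SELECT a, b, COUNT(*) FROM t GROUP BY a, b HAVING COUNT(*) > 1 ORDER BY a, b"
).fetchall()
print(occurrences)  # [(1, 1, 4), (2, 5, 2), (4, 6, 2)]
```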

Suppose we want to get rid of all the dups while keeping one copy of every distinct row. In other words, the end result is expected to have #t with one (1, 1) record, one (2, 5) record, one (3, 1) record, and one (4, 6) record. The statement to do this can look like this:

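-- partition by every column (a, b), so a row with the same a but a different b is not deleted
with records (PartitionedNumber, a, b) as
(
        select
                row_number() over (partition by a, b order by a, b) PartitionedNumber, *
                from #t
)
        delete records where PartitionedNumber > 1;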

The statement above deletes all the dups, leaving exactly one copy of each distinct row.
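The columns listed in `PARTITION BY` decide what counts as a duplicate. In the sample data, `a` happens to determine `b`, so partitioning by `a` alone gives the same result, but that breaks as soon as two rows share `a` and differ in `b`. A small sketch in SQLite (3.25+) adds one hypothetical extra row (1, 2) to the answer's data and runs the same delete both ways. `dedupe` is a hypothetical helper, and `ORDER BY rowid` is used only so the surviving copy is deterministic.

```python
import sqlite3

# The answer's data plus one hypothetical unique row (1, 2).
DATA = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 2), (2, 5), (2, 5), (3, 1), (4, 6), (4, 6)]


def dedupe(partition_cols):
    """Load DATA, delete rows numbered > 1 within each partition, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", DATA)
    conn.execute(
        f"""
        DELETE FROM t WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY {partition_cols} ORDER BY rowid) AS nr
                FROM t
            ) WHERE nr > 1
        )
        """
    )
    return conn.execute("SELECT a, b FROM t ORDER BY a, b").fetchall()


print(dedupe("a"))     # [(1, 1), (2, 5), (3, 1), (4, 6)]; the unique row (1, 2) is lost
print(dedupe("a, b"))  # [(1, 1), (1, 2), (2, 5), (3, 1), (4, 6)]
```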


answered May 19, 2010 at 01:02 PM


Oleg


Seen: 935 times

Last Updated: May 18, 2010 at 08:37 AM

Copyright 2016 Redgate Software.