Finding duplicates

Can anyone help me find duplicates in a table across every column?

asked May 18, 2010 at 06:22 AM in Default

Avi

Can you provide a sample of your data?
May 18, 2010 at 07:00 AM sp_lock
Can you provide the name of the column you want to check for duplicates?
May 19, 2010 at 07:48 AM Leo

3 answers

This is a very vague question that could lead to a really complicated solution. If you simply want to locate duplicate values column by column, you can use something like this:

USE [adventureworks]
GO
CREATE TABLE Myduplicates
    (
      IDCol INT IDENTITY,
      ColA varchar(20),
      ColB VARCHAR(10),
      ColC int
    )
GO
INSERT  INTO [dbo].[Myduplicates] ( [ColA], [ColB], [ColC] )
        SELECT  'Larry', -- ColA - varchar(20)
                'Curly', -- ColB - varchar(10)
                10  -- ColC - int
        UNION
        SELECT  'Larry', -- ColA - varchar(20)
                'Moe', -- ColB - varchar(10)
                20  -- ColC - int
        UNION
        SELECT  'Curly', -- ColA - varchar(20)
                'Larry', -- ColB - varchar(10)
                30  -- ColC - int
        UNION
        SELECT  'Moe', -- ColA - varchar(20)
                'Curly', -- ColB - varchar(10)
                10  -- ColC - int
        UNION
        SELECT  'Zeppo', -- ColA - varchar(20)
                'Harpo', -- ColB - varchar(10)
                10  -- ColC - int
        UNION
        SELECT  'Chico', -- ColA - varchar(20)
                'Zeppo', -- ColB - varchar(10)
                30  -- ColC - int
        UNION
        SELECT  'Groucho', -- ColA - varchar(20)
                'Zeppo', -- ColB - varchar(10)
                20  -- ColC - int
go

SELECT  [m].[ColA],
        COUNT(idcol) AS [duplicate count]
FROM    [dbo].[Myduplicates] AS m
GROUP BY [m].[ColA]
having  COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;

SELECT  [m].[Colb],
        COUNT(idcol) AS [duplicate count]
FROM    [dbo].[Myduplicates] AS m
GROUP BY [m].[Colb]
having  COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;

SELECT  [m].[Colc],
        COUNT(idcol) AS [duplicate count]
FROM    [dbo].[Myduplicates] AS m
GROUP BY [m].[Colc]
having  COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;

SELECT  [m].[IDCol] AS [IDs that need review for ColA duplicates]
FROM    [dbo].[Myduplicates] AS m
        INNER JOIN ( SELECT [m].[ColA],
                            COUNT(idcol) AS [duplicate count]
                     FROM   [dbo].[Myduplicates] AS m
                     GROUP BY [m].[ColA]
                     having COUNT(idcol) > 1
                   ) AS s1 ON [m].[ColA] = [s1].[ColA];

SELECT  [m].[IDCol] AS [IDs that need review for ColB duplicates]
FROM    [dbo].[Myduplicates] AS m
        INNER JOIN ( SELECT [m].[Colb],
                            COUNT(idcol) AS [duplicate count]
                     FROM   [dbo].[Myduplicates] AS m
                     GROUP BY [m].[Colb]
                     having COUNT(idcol) > 1
                   ) AS s1 ON [m].[Colb] = [s1].[Colb];

go
DROP TABLE Myduplicates

Resolving the duplicates will be a whole new piece of work.
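If you would rather see every column's duplicates in one result set instead of one query per column, the per-column checks above can be combined with UNION ALL. This is a minimal sketch, not part of the original answer: the temp table #Stooges is a hypothetical copy of the Myduplicates sample, and each value is cast to VARCHAR(20) so the branches share a type (without the cast, SQL Server would try to convert the names to INT to match ColC and fail).

```sql
-- Hypothetical copy of the Myduplicates sample as a temp table
IF OBJECT_ID('tempdb..#Stooges') IS NOT NULL DROP TABLE #Stooges;
CREATE TABLE #Stooges
    (
      IDCol INT IDENTITY,
      ColA VARCHAR(20),
      ColB VARCHAR(10),
      ColC INT
    );

INSERT INTO #Stooges ( ColA, ColB, ColC )
SELECT 'Larry', 'Curly', 10 UNION ALL
SELECT 'Larry', 'Moe', 20 UNION ALL
SELECT 'Curly', 'Larry', 30 UNION ALL
SELECT 'Moe', 'Curly', 10 UNION ALL
SELECT 'Zeppo', 'Harpo', 10 UNION ALL
SELECT 'Chico', 'Zeppo', 30 UNION ALL
SELECT 'Groucho', 'Zeppo', 20;

-- One row per (column, value) pair that occurs more than once.
-- Values are cast to VARCHAR(20) so all UNION ALL branches share one type.
SELECT  'ColA' AS [column name], ColA AS [value], COUNT(*) AS [duplicate count]
FROM    #Stooges
GROUP BY ColA
HAVING  COUNT(*) > 1
UNION ALL
SELECT  'ColB', CAST(ColB AS VARCHAR(20)), COUNT(*)
FROM    #Stooges
GROUP BY ColB
HAVING  COUNT(*) > 1
UNION ALL
SELECT  'ColC', CAST(ColC AS VARCHAR(20)), COUNT(*)
FROM    #Stooges
GROUP BY ColC
HAVING  COUNT(*) > 1
ORDER BY [column name], [duplicate count] DESC;
```

With this data it should list Larry for ColA, Curly and Zeppo for ColB, and 10, 20 and 30 for ColC. Adding another column means adding one more branch.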

answered May 18, 2010 at 07:43 AM

Fatherjack ♦♦

+1 : Stellar effort, considering...
May 18, 2010 at 08:06 AM Kev Riley ♦♦

There are a number of ways to solve this using TSQL. The best these days seem to revolve around ROW_NUMBER(). The key is to understand the basic concept: you need a way to uniquely identify each row, then a way to number the rows within each group of duplicate values (so every copy after the first gets a number above 1), and finally a mechanism to remove the rows so marked. While this sounds like three steps, you should be able to do all of it in a single query.
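A minimal sketch of that single query, assuming a hypothetical table #Orders whose identity column Id is the unique row identifier and where CustomerName plus OrderDate define what counts as a duplicate. ROW_NUMBER() numbers the rows inside each group of identical values, and deleting through the CTE removes every row numbered above 1, keeping the lowest Id in each group.

```sql
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders
    (
      Id INT IDENTITY PRIMARY KEY,   -- step 1: uniquely identifies each row
      CustomerName VARCHAR(20),
      OrderDate DATETIME
    );

INSERT INTO #Orders ( CustomerName, OrderDate )
SELECT 'Larry', '20100518' UNION ALL
SELECT 'Larry', '20100518' UNION ALL
SELECT 'Moe',   '20100518' UNION ALL
SELECT 'Moe',   '20100519' UNION ALL
SELECT 'Moe',   '20100519';

-- Step 2: number the rows within each group of duplicate values;
--         ordering by Id gives the lowest Id in each group nr = 1.
-- Step 3: delete through the CTE, removing every row numbered above 1.
WITH numbered AS
    ( SELECT Id,
             ROW_NUMBER() OVER ( PARTITION BY CustomerName, OrderDate
                                 ORDER BY Id ) AS nr
      FROM   #Orders
    )
DELETE FROM numbered
WHERE nr > 1;

SELECT Id, CustomerName, OrderDate FROM #Orders ORDER BY Id;
```

Changing the ORDER BY inside OVER() is how you choose which copy survives, for example ORDER BY Id DESC to keep the most recently inserted row.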

answered May 18, 2010 at 08:38 AM

Grant Fritchey ♦♦

Grant, have you got any examples of the ROW_NUMBER() option, please? I started off thinking that way, but then decided I would want to see all the rows in order to choose which one to call the duplicate, i.e. rows where the ROW_NUMBER() values are 1-n, not 2-n... That led me to the nested query solution. J
May 18, 2010 at 09:38 AM Fatherjack ♦♦

This is from Simple-Talk...
WITH numbered AS ( SELECT data,
                          row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
                   FROM   @duplicateTable4
                 )
DELETE  FROM numbered
WHERE nr > 1;
May 18, 2010 at 09:45 AM Grant Fritchey ♦♦
Right, I see. I was thinking that the values in other columns might justify keeping the row where nr = 3, so rows 1, 2 and 4 would get deleted via the application... Thanks.
May 18, 2010 at 10:05 AM Fatherjack ♦♦
If you have to get into making judgement calls, there's really no way to automate it. Generally you have to define a mechanism for identifying what is a duplicate and then eliminate the extras. If that mechanism is "let me look at it", then...
May 18, 2010 at 10:25 AM Grant Fritchey ♦♦
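For the review-first case Fatherjack describes (seeing rows 1-n, not just 2-n), one way to keep every member of a duplicate group visible is to compute the group size with COUNT(*) OVER (PARTITION BY ...) alongside ROW_NUMBER() and filter on the group size instead of on nr. This is a sketch under assumptions: the table #Review and its Data and Notes columns are hypothetical, with Notes standing in for the "values in other columns" a person might use to pick the row to keep.

```sql
IF OBJECT_ID('tempdb..#Review') IS NOT NULL DROP TABLE #Review;
CREATE TABLE #Review
    (
      Id INT IDENTITY PRIMARY KEY,
      Data VARCHAR(20),
      Notes VARCHAR(50)      -- hypothetical column used to judge which copy to keep
    );

INSERT INTO #Review ( Data, Notes )
SELECT 'alpha', 'first load' UNION ALL
SELECT 'alpha', 'corrected'  UNION ALL
SELECT 'alpha', NULL         UNION ALL
SELECT 'beta',  'only copy';

-- Every row belonging to a group of two or more identical Data values,
-- with its position in the group (nr) and the group size (dup_count).
WITH numbered AS
    ( SELECT Id, Data, Notes,
             ROW_NUMBER() OVER ( PARTITION BY Data ORDER BY Id ) AS nr,
             COUNT(*)     OVER ( PARTITION BY Data )             AS dup_count
      FROM   #Review
    )
SELECT Id, Data, Notes, nr, dup_count
FROM   numbered
WHERE  dup_count > 1
ORDER BY Data, nr;
```

Once someone has decided which Id to keep, the extras can be deleted by Id, which matches Grant's point that the "mechanism" is then a person rather than a query.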

I hope that I understand the question correctly. The task is to find the duplicate records across all columns in the table. I will also provide a sample of how to quickly delete all such duplicates. Let's create a heap table and insert some records in it (including some duplicates):

create table #t (a int, b int);
go

insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (2, 5);
insert into #t values (2, 5);
insert into #t values (3, 1);
insert into #t values (4, 6);
insert into #t values (4, 6);
go

Now we have 4 occurrences of (1, 1), 2 occurrences of (2, 5) and 2 occurrences of (4, 6), while (3, 1) does not have any duplicates. Here is the script to quickly identify all the duplicates; any row numbered above 1 within its (a, b) partition is a duplicate:

select
        -- partition by every column, so only rows identical in all columns share a partition
        row_number() over (partition by a, b order by a, b) PartitionedNumber, *
        from #t;

Here is the result of the query above:

PartitionedNumber    a           b
-------------------- ----------- -----------
1                    1           1
2                    1           1
3                    1           1
4                    1           1
1                    2           5
2                    2           5
1                    3           1
1                    4           6
2                    4           6
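If you only need each duplicated combination once, together with how many copies exist, grouping by every column gives that directly. A minimal sketch against the same nine rows, loaded into a hypothetically named #pairs so it can run on its own without touching #t:

```sql
IF OBJECT_ID('tempdb..#pairs') IS NOT NULL DROP TABLE #pairs;
CREATE TABLE #pairs (a INT, b INT);

INSERT INTO #pairs VALUES (1, 1);
INSERT INTO #pairs VALUES (1, 1);
INSERT INTO #pairs VALUES (1, 1);
INSERT INTO #pairs VALUES (1, 1);
INSERT INTO #pairs VALUES (2, 5);
INSERT INTO #pairs VALUES (2, 5);
INSERT INTO #pairs VALUES (3, 1);
INSERT INTO #pairs VALUES (4, 6);
INSERT INTO #pairs VALUES (4, 6);

-- One row per distinct (a, b) combination that occurs more than once,
-- with its number of copies; every column of the table goes in GROUP BY.
SELECT a, b, COUNT(*) AS occurrences
FROM   #pairs
GROUP BY a, b
HAVING COUNT(*) > 1
ORDER BY a, b;
```

This should return (1, 1) with 4 occurrences, (2, 5) with 2 and (4, 6) with 2. It tells you what is duplicated but, unlike the ROW_NUMBER() version, gives you no handle on individual rows to delete.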

Suppose we want to get rid of all dups while preserving all unique rows. In other words, the end result is expected to have #t with one (1, 1) record, one (2, 5) record, one (3, 1) record, and one (4, 6) record. The statement to do this could look like this:

with records (PartitionedNumber, a, b) as
(
        select
                -- partition by every column, so rows differing in b are not treated as dups
                row_number() over (partition by a, b order by a, b) PartitionedNumber, *
                from #t
)
        delete records where PartitionedNumber > 1;

The above deletes all the dups, keeping exactly one copy of each distinct row.
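To confirm a delete like this did what was intended, you can re-run a GROUP BY over every column afterwards and count what is left. The sketch below is self-contained: it loads the same nine rows into a hypothetically named #t_check, applies the delete partitioned on both columns, then checks that no combination has more than one copy and that four rows remain.

```sql
-- Same nine rows as #t, under a hypothetical name so the check runs on its own
IF OBJECT_ID('tempdb..#t_check') IS NOT NULL DROP TABLE #t_check;
CREATE TABLE #t_check (a INT, b INT);

INSERT INTO #t_check (a, b)
SELECT 1, 1 UNION ALL SELECT 1, 1 UNION ALL SELECT 1, 1 UNION ALL SELECT 1, 1 UNION ALL
SELECT 2, 5 UNION ALL SELECT 2, 5 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 6 UNION ALL SELECT 4, 6;

-- The delete from the answer, partitioned on every column
WITH records AS
    ( SELECT a, b,
             ROW_NUMBER() OVER ( PARTITION BY a, b ORDER BY a, b ) AS PartitionedNumber
      FROM   #t_check
    )
DELETE records WHERE PartitionedNumber > 1;

-- Should return no rows: no (a, b) combination has more than one copy left
SELECT a, b, COUNT(*) AS occurrences
FROM   #t_check
GROUP BY a, b
HAVING COUNT(*) > 1;

-- Should return 4: one row per distinct combination
SELECT COUNT(*) AS remaining_rows FROM #t_check;
```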

answered May 19, 2010 at 01:02 PM

Oleg

Seen: 773 times

Last Updated: May 18, 2010 at 08:37 AM