Can any one help me in getting duplicates in a table from every column
Can any one help me in getting duplicates in a table from every column
This is a very vague question that could get a really complicated solution. If you simply want to locate a duplicate row then you will need to use something like:
USE [adventureworks]
GO
CREATE TABLE Myduplicates
(
IDCol INT IDENTITY,
ColA varchar(20),
ColB VARCHAR(10),
ColC int
)
GO
INSERT INTO [dbo].[Myduplicates] ( [ColA], [ColB], [ColC] )
SELECT 'Larry', -- ColA - varchar(20)
'Curly', -- ColB - varchar(10)
10 -- ColC - int
UNION
SELECT 'Larry', -- ColA - varchar(20)
'Moe', -- ColB - varchar(10)
20 -- ColC - int
UNION
SELECT 'Curly', -- ColA - varchar(20)
'Larry', -- ColB - varchar(10)
30 -- ColC - int
UNION
SELECT 'Moe', -- ColA - varchar(20)
'Curly', -- ColB - varchar(10)
10 -- ColC - int
UNION
SELECT 'Zeppo', -- ColA - varchar(20)
'Harpo', -- ColB - varchar(10)
10 -- ColC - int
UNION
SELECT 'Chico', -- ColA - varchar(20)
'Zeppo', -- ColB - varchar(10)
30 -- ColC - int
UNION
SELECT 'Groucho', -- ColA - varchar(20)
'Zeppo', -- ColB - varchar(10)
20
-- ColC - int
go
SELECT [m].[ColA],
COUNT(idcol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[ColA]
having COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;
SELECT [m].[Colb],
COUNT(idcol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[Colb]
having COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;
SELECT [m].[Colc],
COUNT(idcol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[Colc]
having COUNT(idcol) > 1
ORDER BY [duplicate count] DESC ;
SELECT [m].[IDCol] AS [IDs that need review for ColA duplicates]
FROM [dbo].[Myduplicates] AS m
INNER JOIN ( SELECT [m].[ColA],
COUNT(idcol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[ColA]
having COUNT(idcol) > 1
) AS s1 ON [m].[ColA] = [s1].[ColA];
SELECT [m].[IDCol] AS [IDs that need review for ColB duplicates]
FROM [dbo].[Myduplicates] AS m
INNER JOIN ( SELECT [m].[Colb],
COUNT(idcol) AS [duplicate count]
FROM [dbo].[Myduplicates] AS m
GROUP BY [m].[Colb]
having COUNT(idcol) > 1
) AS s1 ON [m].[Colb] = [s1].[Colb];
go
DROP TABLE Myduplicates
resolving the duplicates will be a whole new piece of work
There are a number of ways to solve this using TSQL. The best these days seem to revolve around using ROW_NUMBER(). The key is to simply understand the basic concept that you need a method to uniquely identify the row. Then you need a way to mark duplicate values for that unique identifier and then you need a mechanism to remove those duplicates. While this sounds like three steps, you should be able to do all this in a single query.
I hope that I understand the question correctly. The task is to find the duplicate records across all columns in the table. I will also provide the sample of how to quickly delete all such duplicates. Lets create a heap table and insert some records in it (including some duplicates:
create table #t (a int, b int);
go
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (1, 1);
insert into #t values (2, 5);
insert into #t values (2, 5);
insert into #t values (3, 1);
insert into #t values (4, 6);
insert into #t values (4, 6);
go
Now we have 4 occurences of (1, 1); 2 occurences of (2, 5); (3, 1) does not have any duplicates and we also have 2 occurences of (4, 6). Here is the script to quickly identify all the duplicates:
select
row_number() over (partition by a order by a) PartitionedNumber, *
from #t;
Here is the result of the query above:
PartitionedNumber a b
-------------------- ----------- -----------
1 1 1
2 1 1
3 1 1
4 1 1
1 2 5
2 2 5
1 3 1
1 4 6
2 4 6
Suppose we want to get rid of all dups while preserving all unique rows. In other words, the end result is expected to have #t with one (1, 1) record, one (2, 5) record, one (3, 1) record, and one (4, 6) record,. The statement to do this can be like this:
with records (PartitionedNumber, a, b) as
(
select
row_number() over (partition by a order by a) PartitionedNumber, *
from #t
)
delete records where PartitionedNumber > 1;
The above will delete all dups preserving the unique records only.
No one has followed this question yet.