question

jhowe avatar image
jhowe asked

remove duplicates, reassociate records

Hi all please consider the following code pasted below. What i am trying to do is remove the duplicate records from the customer table keeping only the most recent record for the customer and then reassociate all the service requests with the customer id i have kept. I'm not sure how to do this, any help would be greatly appreciated... so i would end up basically with 3 customer records and 10 service requests associated with each customer... CREATE TABLE [dbo].[customer] ( [customerid] [int] IDENTITY(1, 1) NOT NULL , [membershipnumber] [nvarchar](50) NOT NULL , [firstname] [nvarchar](50) NULL , [lastname] [nvarchar](50) NULL , PRIMARY KEY CLUSTERED ( [customerid] ASC ) ) SET IDENTITY_INSERT dbo.customer ON INSERT INTO [dbo].[customer] ( [customerid] , [membershipnumber] , [firstname] , [lastname] ) SELECT 1 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 2 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 3 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 4 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 5 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 6 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 7 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 8 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 9 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 10 , '111111' , N'brad' , N'pitt' UNION ALL SELECT 11 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 12 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 13 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 14 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 15 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 16 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 17 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 18 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 19 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 20 , '222222' , N'sylvester' , N'stallone' UNION ALL SELECT 21 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 22 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 23 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 24 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 25 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 26 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 27 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 28 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 29 , '333333' , N'bruce' , N'willis' UNION ALL SELECT 30 , '333333' , N'bruce' , N'willis' SET IDENTITY_INSERT dbo.customer OFF CREATE TABLE [dbo].[servicerequest] ( [servicerequestID] [int] IDENTITY(1, 1) NOT NULL , [customerid] [int] NULL , [requesttype] [nvarchar](50) NULL , [requestdescription] [nvarchar](MAX) NULL , PRIMARY KEY CLUSTERED ( [servicerequestID] ASC ) ) SET IDENTITY_INSERT dbo.servicerequest ON INSERT INTO [dbo].[servicerequest] ( [servicerequestID] , [customerid] , [requesttype] , [requestdescription] ) SELECT 1 , 1 , N'Holiday' , N'Eire' UNION ALL SELECT 2 , 2 , N'Holiday' , N'Bermuda' UNION ALL SELECT 3 , 3 , N'Holiday' , N'Kuwait' UNION ALL SELECT 4 , 4 , N'Travel' , N'Samoa' UNION ALL SELECT 5 , 5 , N'Travel' , N'Niger' UNION ALL SELECT 6 , 6 , N'Holiday' , N'Jamaica' UNION ALL SELECT 7 , 7 , N'Holiday' , N'Guatemala' UNION ALL SELECT 8 , 8 , N'Travel' , N'United States' UNION ALL SELECT 9 , 9 , N'Holiday' , N'Belize' UNION ALL SELECT 10 , 10 , N'Travel' , N'New Caledonia' UNION ALL SELECT 11 , 11 , N'Holiday' , N'Eire' UNION ALL SELECT 12 , 12 , N'Holiday' , N'Bermuda' UNION ALL SELECT 13 , 13 , N'Holiday' , N'Kuwait' UNION ALL SELECT 14 , 14 , N'Travel' , N'Samoa' UNION ALL SELECT 15 , 15 , N'Travel' , N'Niger' UNION ALL SELECT 16 , 16 , N'Holiday' , N'Jamaica' UNION ALL SELECT 17 , 17 , N'Holiday' , N'Guatemala' UNION ALL SELECT 18 , 18 , N'Travel' , N'United States' UNION ALL SELECT 19 , 19 , N'Holiday' , N'Belize' UNION ALL SELECT 20 , 20 , N'Travel' , N'New Caledonia' UNION ALL SELECT 21 , 21 , N'Holiday' , N'Eire' UNION ALL SELECT 22 , 22 , N'Holiday' , N'Bermuda' UNION ALL SELECT 23 , 23 , N'Holiday' , N'Kuwait' UNION ALL SELECT 24 , 24 , N'Travel' , N'Samoa' UNION ALL SELECT 25 , 25 , N'Travel' , N'Niger' UNION ALL SELECT 26 , 26 , N'Holiday' , N'Jamaica' UNION ALL SELECT 27 , 27 , N'Holiday' , N'Guatemala' UNION ALL SELECT 28 , 28 , N'Travel' , N'United States' UNION ALL SELECT 29 , 29 , N'Holiday' , N'Belize' UNION ALL SELECT 30 , 30 , N'Travel' , N'New Caledonia' SET IDENTITY_INSERT dbo.servicerequest OFF
sql-server-2008sql-server-2008-r2
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Dave_Green avatar image
Dave_Green answered
Well, you want to first identify the member's most recent (I assume highest) customerID in each case. That can be done with a simple groupby: SELECT membershipnumber,MAX(customerid) FROM customer GROUP BY membershipnumber We then want to transfer the servicerequest entries to the new customers. I use a CTE of the previous query to do this, and join the servicerequests to it via the customer table: ;WITH newnumber( membershipnumber,newnumber) AS (SELECT membershipnumber,MAX(customerid) FROM customer GROUP BY membershipnumber) UPDATE servicerequest SET customerID = newnumber.newnumber FROM servicerequest INNER JOIN Customer ON servicerequest.customerID = customer.customerid INNER JOIN newnumber ON customer.membershipnumber=newnumber.membershipnumber Then we can delete the customers which aren't the latest for that customer (again, using the same CTE): ;WITH newnumber( membershipnumber,newnumber) AS (SELECT membershipnumber,MAX(customerid) FROM customer GROUP BY membershipnumber) DELETE customer FROM customer INNER JOIN newnumber ON customer.membershipnumber=newnumber.membershipnumber AND customer.customerid != newnumber.newnumber /* customer row isn't the new one */ This gives the required results; 3 customers (IDs 10,20,30) and 10 requests linked to each ID. Just a heads up to anyone reading this in the future - You will need to add some SET IDENTITY INSERT lines to allow the code in the question to work properly. There are also a couple of typos (one in the column name in the declaration of customer, one in the second union statement for the service request insert).
10 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Dave_Green avatar image Dave_Green ♦ commented ·
Hi Jhowe; I'll give it a go! To answer your questions in turn - you need the 2nd and 3rd queries to modify your data. For the update, we take the servicerequest table, joining it with it's existing customer id number to the cte newnumber, so that you have effectively a list of Servicerequestid, oldnumber, membershipid, newnumber. We then simply update the servicerequest table so that the oldnumber id replaced by the newnumber. For the delete I'm taking the same CTE (newnumber), and removing any entry in the customers table where there is an entry in the CTE for that membership number, that isn't for the same ID. Does that help?
3 Likes 3 ·
jhowe avatar image jhowe commented ·
hi dave thanks for that i will give this a go. I have corrected the errors, thanks for spotting...
0 Likes 0 ·
jhowe avatar image jhowe commented ·
hi dave can you run this all in one batch? or does the last cte replace the second one?
0 Likes 0 ·
jhowe avatar image jhowe commented ·
hi dave. I can confirm i am getting the expected results and i understand the concept of comparing tables against each other to remove or add rows. i don't really understand HOW it's working though. Could you explain in layman's terms each bit? Thanks very much for your help! I obviously want to get this right as it will have a huge impact on the actual system if it's not correct so i need to understand exactly what it's doing...
0 Likes 0 ·
jhowe avatar image jhowe commented ·
ok i think i get it now. You are taking the highest customerid and assigning that as newnumber. You are then updating the current SR customerID with newnumber and then just deleting the duplicates that are not equal to the new number. Quite simple and elegant once you understand it! Now i just have to think whether there are any drawbacks to using this method in the actual database... if you know of any pitfalls off the top of your head that would help...
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.