question

technette avatar image
technette asked

Remove Duplicates

Hi! I have a cursor that iterates through a list of parts numbers and pulls the bill of material for each part then loads them into a table. **How do I remove duplicate parts from the finished set of data similar to the Remove Duplicates function in excel?** Select Distinct does not accomplish this because of other data.
duplicates
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
This site functions according to votes. For all helpful answers below, show this by clicking on the thumbs up next to those answers. If any one answer lead to a solution, indicate this by clicking on the check mark next to that answer.
1 Like 1 ·
CirqueDeSQLeil avatar image
CirqueDeSQLeil answered
If you are looking to remove the duplicates, that indicates a delete operation and this article will show how to do that. [ http://bit.ly/hdlf0M][1] If you are looking to just select a unique set of data from the table, the select statement will depend on the table structure and the data requirements for output. [1]: http://bit.ly/hdlf0M
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
First of all, I'd like to point out that you can probably (highly likely) rewrite your procedure to get rid of the cursor. It will speed up performance and you won't have to materialise whatever it is that you found before finally loading the data into a table. That is: Duplication handling can be performed before insert. Anyway: If you now have the data in a table, and your data contain duplicates in some way, you can use for example the ROW_NUMBER-funktion, partitioning it over whichever columns it is that should make a row unique with your bussiness rules, and have the outer query of the CTE perform a delete against the CTE. An example of it. If you have a table, dbo.Parts, containing - ID - integer with identity(1,1) - Partnumber - whatever datatype appropriate (not important in this example) - PartnumberDescription - some string datatype You have duplicates on partnumbers, and you want to keep the last inserted partnumber. That can be done with the following code: WITH CTE AS( SELECT *, ROW_NUMBER() OVER(PARTITION BY PartNumber ORDER BY ID DESC) as row_num FROM dbo.Parts ) DELETE FROM CTE WHERE row_num > 1; If you're not familiar with ROW_NUMBER and other window functions, the syntax might look a little awkward because of the OVER-clause. In short, it contains an OVER-clause, which defines how the data should be _partitioned_ and in which _order_ the rows should be numbered within each partition. In my example, the dataset returned by "SELECT ... FROM dbo.Parts" is split into partitions, one for each Partnumber. Within each of these partitions, the data is sorted by ID in descending order and a row-number is applied. So having three rows with the same Partnumber will set row-numbers 1, 2 and 3, where row-number 1 will have the highest value for ID, number 2 will have the second highest value for ID and number 3 will have the lowest value for ID, within the partition defined by just that Partnumber. Remember that any statement occuring before a CTE declaration must end with a semicolon. One convention widely used is to START the CTE declaration, just to be sure. Like this: ;WITH CTE AS .... Personally, I think that's ugly code. A semicolon should end a statement.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Patrick avatar image
Patrick answered
DELETE t FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY PartNumber ORDER BY ID DESC) as row_num FROM dbo.Parts ) t WHERE row_num > 1;
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.