Hi! I have a cursor that iterates through a list of parts numbers and pulls the bill of material for each part then loads them into a table. **How do I remove duplicate parts from the finished set of data similar to the Remove Duplicates function in excel?** Select Distinct does not accomplish this because of other data.
If you are looking to remove the duplicates, that indicates a delete operation and this article will show how to do that. [
http://bit.ly/hdlf0M] If you are looking to just select a unique set of data from the table, the select statement will depend on the table structure and the data requirements for output. :
First of all, I'd like to point out that you can probably (highly likely) rewrite your procedure to get rid of the cursor. It will speed up performance and you won't have to materialise whatever it is that you found before finally loading the data into a table. That is: Duplication handling can be performed before insert. Anyway: If you now have the data in a table, and your data contain duplicates in some way, you can use for example the ROW_NUMBER-funktion, partitioning it over whichever columns it is that should make a row unique with your bussiness rules, and have the outer query of the CTE perform a delete against the CTE. An example of it. If you have a table, dbo.Parts, containing - ID - integer with identity(1,1) - Partnumber - whatever datatype appropriate (not important in this example) - PartnumberDescription - some string datatype You have duplicates on partnumbers, and you want to keep the last inserted partnumber. That can be done with the following code: WITH CTE AS( SELECT *, ROW_NUMBER() OVER(PARTITION BY PartNumber ORDER BY ID DESC) as row_num FROM dbo.Parts ) DELETE FROM CTE WHERE row_num > 1; If you're not familiar with ROW_NUMBER and other window functions, the syntax might look a little awkward because of the OVER-clause. In short, it contains an OVER-clause, which defines how the data should be _partitioned_ and in which _order_ the rows should be numbered within each partition. In my example, the dataset returned by "SELECT ... FROM dbo.Parts" is split into partitions, one for each Partnumber. Within each of these partitions, the data is sorted by ID in descending order and a row-number is applied. So having three rows with the same Partnumber will set row-numbers 1, 2 and 3, where row-number 1 will have the highest value for ID, number 2 will have the second highest value for ID and number 3 will have the lowest value for ID, within the partition defined by just that Partnumber. Remember that any statement occuring before a CTE declaration must end with a semicolon. One convention widely used is to START the CTE declaration, just to be sure. Like this: ;WITH CTE AS .... Personally, I think that's ugly code. A semicolon should end a statement.