Hi I had read somewhere that if we create a clustered index and store duplicate values in it SQL Server adds to every duplicate instance of a clustering key value a 4-byte integer value called a uniqueifier. This uniqueifier is added everywhere the clustering key is stored . consider below example Customer_id(Cluster Key) Unique identifier 1 null 1 1 1 2 2 null The uniqueifier is NULL for the first instance of each customer_id, and is then populated, in ascending order, for each subsequent row with the same customer_id value My question here is how it gonna help me with record identification using key because if I say select * from customer where customer_id=1 it will give me three records then what is the use of adding this uniqueidentifier with each clustered key
Well the logical concept is that you say that you want ALL records where Customerid = 1, not a unique record. If you need unique records in the table, you have to specify a unique identifier and ask for the unique value. If you need to remove duplicates, I'll give you a nice query: ;with cte AS ( SELECT rn = ROW_COUNT() OVER(ORDER BY someColumn), AnotherColumn FROM someTable ) DELETE FROM cte OUTPUT deleted.* WHERE rn >1 The uniquefier column is create to make the record unique, because the non-clustered index has to reference a unique value. > "If the clustered index was not > initially created as unique, SQL > Server internally appends a unique > four byte value to the end of each > index key. The unique four byte value > is required so that each nonclustered > index row contains a unique clustered > index key. For a non-unique clustered > index, this four byte value at the end > of the index key may change during the > index rebuild, and thus all of the > nonclustered index keys must also be > rebuilt. If the user did not specify > that the clustered index was unique, > the expected behavior during a reindex > is for all of the indices to be > rebuilt."
Because a clustered index stores the data for the table, in order to separate one row from another, there has to be a mechanism that uniquely identifies each row of data. If you have a non-unique clustered index, the additional column is added in the background (and becomes part of the internal key used by non-clustered indexes, but that's a secondary purpose to this) in order to make that key unique so that you can find each of the rows being stored. Otherwise, how could SQL Server know that any one row was completely separate from any other in order to maintain the data in a consistent state?