question

Fatherjack avatar image
Fatherjack asked

Clustered Unique Index vs PK

Hi,

What's the difference - in performance terms only - between a Primary Key and a Clustered Unique Index?

I have a table that has an ID column that the only column in a Clustered Unique Index. In other tables from the application supplier the ID column is a Primary Key but in one it is simply an index.

Does anyone know if there would be any benefit in changing this to a PK please?

There are no performance issues with the system (I know, if it isnt broke ... but I am interested in finding out the answer) and my missing indexes query doesnt complain that anything needs adding. The table has 8M rows.

cheers

Jonathan

sql-server-2005indexing
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

The real difference between a unique index and primary key comes into the disallowment of NULLs, when related to by an FK in another table.

Say you have tbl1, ID, clustered unique index, in one row ID is null.

Then you have tbl2, tbl1_ID, column that relates to tbl1.ID.

In tbl2, if tbl1_ID has a value of NULL - what would that mean? Would it mean that the row in tbl2 did not have a relation to tbl1? Or would it mean that the row in tbl2 was related to the row in tbl1 where ID was NULL?

For me, that's a key benefit of primary keys - the disambiguation of the above scenario.

With the clustered bit, that's a bit of a different story. An index, unique constraint or primary key constraint can all either be clustered or not, and only one of them can be clustered on a table at a time. So really, that's in effect a separate question.

4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Jack Corbett avatar image Jack Corbett commented ·
Of course if the ID column is defined as NOT NULL then you have, in essence, a Primary Key when defining a UNIQUE INDEX on the column (doesn't have to be CLUSTERED). To specifically answer the question about performance, if the id column is defined as NOT NULL then you would not have any performance issues and there really is no reason to change it other than, in my opinion, defining it as a primary key can make it a bit clearer.
1 Like 1 ·
Jay Bonk avatar image Jay Bonk commented ·
Matt, Good point about the relationship between the NULL in the clustered index and the columns in the secondary table
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
+1 and not just because it's a worthwhile ramble!
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Thoroughly agree... I just wanted to outline another reason why it *might* be a good idea. I can't see performance differing a jot, if the clustered element of the question is kept consistent...
0 Likes 0 ·
Blackhawk-17 avatar image
Blackhawk-17 answered

That lone table may be a staging table for the application. The reason it may not be a PK is that duplicates may be possible. Changing it to a PK could disallow duplicates and may break the system.

That being said... PKs are used for entity and referential integrity - generally good things :) That is why most tables should have a PK.

A PK is, in essence, a unique index that does not allow nulls. It is optionally clustered - sometimes good, sometimes not so much (i.e. GUID column), sometimes it is non-clustered. So it may not aide in performance either way.

Is there an existing clustered index on the table? Is it the ID column? Maybe a few more details about the table in question could help us help you.


I guess the coffee hadn't kicked in yet ;)

Re-reading everything... you basically have a PK that isn't identified in the system as such. There would be no performance improvements per se but a PK would aide in referential integrity if the column is a foreign key for another table.

3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
His Unique clustered index will also disallow duplicates.
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
apologies for any confusion. The index is clustered and unique and solely based on the ID column. The table is certainly not a staging table, it is core to the whole application. (As an additional bit of info, the index is called PK_ID. I am wondering if its a scripting error along the way somewhere and I want to know whether there is any need/benefit/requirement for me to change it).
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
Based on everything, it walks like a duck...
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered

PK will not allow nulls, where as the clustered unique will allow one - may or may not be an issue.

2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
Hi Kev, there isnt really an issue I just want to understand whether there is any logic in it being set this way. Would there be a performance difference if its set to a PK? Would locking,blocking,deadlocks be handled differently? Should it be set to a PK? !!
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
I +1'd this one, but also answered because my rambling wouldn't fit in a comment :)
0 Likes 0 ·
Bob Hovious avatar image
Bob Hovious answered

Just to add to the confusion...

A primary key constraint need not be supported by a clustered index. It is possible to have a multiple column clustered index, and a single column primary key constraint that is enforced by a nonclustered index. With respect to performance, a unique clustered index is a a unique clustered index, whether or not it is supporting a primary key constraint. Functionally, either will prevent duplicates. The primary key constraint will disallow the use of nulls, as others have stated.

Some people create tables then build indexes using the CREATE INDEX syntax, which does not produce a primary key constraint. Others use the CONSTRAINT syntax, which results in the production of an index.

Nothing is sacred about the letters 'PK' at the beginning of a column or index name. It is a naming convention only. For clarity, when I am deliberately creating a clustered index which is not the primary key, my convention is to use 'CLIX_' instead of 'PK_'.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Rob Farley avatar image
Rob Farley answered

I'm going to jump in... sorry everyone.

Let's consider the definitions.

A Primary Key is a unique index that doesn't allow NULL values. It's the one that's indicated using a Gold Key in Management Studio, and is admired by Data Source Views in BI Development Studio. But essentially, just think of it as a unique index.

A Clustered Index is the table itself, sorted into a b-tree by the key(s) in the Index definition. At the leaf level of the index you find all the other columns from the table. It must be unique, and if its key columns aren't known to be unique, it uses an internal uniquifier column to make this happen. The Clustered Index keys (plus its uniquifier if required) form the address of each row, and is thus referenced in every Non-Clustered Index on the table. If a table doesn't have (isn't?) a Clustered Index defined on it, the it's a Heap, and the address of each row comes down to its position in the file on disk.

The confusion between the two comes from the fact that by default, SQL uses the PK for the table's CIX, as it's often a good idea. A CIX on a PK can be declared as unique, and therefore avoiding the need for a uniquifier column.

People often consider that a PK is useful for Foreign Keys. But actually, a Foreign Key can reference any Unique Index, so that's not really a requirement. Unfortunately, you can't change a Unique Index to a Primary Key without recreating it (although I've requested this be changed in a Connect entry - if someone wants to provide the link in a comment, that'd be handy).

If you put a Clustered Index on a table that doesn't have one, or you change the Clustered Index, or anything like that, the whole table must be rebuilt, including any other indexes on it. This is because the Clustered Index is really the table itself (a table is either a CIX or a heap), and it's like asking all the houses in a street to be ordered a different way - you're changing the physical addresses of the rows.

So... I wouldn't change it. Make sure you understand the execution plans of queries that use the table. Check out the recording of my talk from SQLBits when it becomes available (I know you weren't there Father Jack, you were in Brent's session), so that you can look for some ideas on how to appreciate what's going on with your indexes, how they can be simplified out, and so on.

Ideally, any SELECT query on a table shouldn't have to touch the Clustered Index (or heap) at all - everything should be contained within covering non-clustered indexes. If you can avoid lookups, then that's great. But that's for another post.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.