The default in Management Studio is to make the primary key clustered, but is this a good idea? What are the pros and cons of making the primary key clustered v nonclustered?
The classic DBA answer - it depends.
Strictly speaking every table should have both a PK and a clustered index, but not always necessarily using the same field.
If you are using GUIDs (shudder) as your PK, then most definitely no! The fragementation can become horrendous.
If using identity, then combining the PK and clustered won't do you much harm.
If using some other natural/surrogate key then you will have to make an informed decision depending on the values in those keys, if they are as random as GUIDs, then you'll have the same issues as GUIDs.
answered Oct 11, 2009 at 06:41 AM
Kev Riley ♦♦
The two are really different concepts - the primary key is the data that uniquely identifies a record, whereas a clustered index should really match the keys with which the record is most frequently accessed. A typical example might be transactions and items. Each table might have a primary key which is an IDENTITY field, but the transaction items would most likely be looked up by the Transaction ID (the foreign key from the items table to the transactions table).
In that case, it would be most efficient to have a clustered index on the items table on the Transaction ID field, as that is how the items would be retrieved most of the time.
answered Oct 09, 2009 at 12:15 PM
Matt Whitfield ♦♦
Yes. A Primary key is merely a unique key that is intended to be the default one used to select rows and perform joins. It is the default candidate key for a table. They should be used as there is no other way to specify to other processes which key should be used by default.
A Primary key is no more than an indexed key whose value, taken together across all component columns, is unique and known. SQL Server creates a clustered unique index when you specify a primary key, but this is not always the best solution, since a clustered index works best when it is not unique, not null and when the values are reasonably evenly distributed The clustered index can be explained by my poem...
An index is used as a short-cut to data
a table will warrant one sooner or later
Because only one can be clustered, beware
and ponder the index you cluster with care
the issues are clearer than you might suppose
this index determines the order of rows
so searching the index requires less I/O.
Selecting the column on which it should go
depends on the way that the rows are selected,
which should become clear if the Schema's inspected.
One problem, however, I think you should know,
retrieving a range can be horribly slow.
A non-clustered index is almost as good
once ordering keys can be well understood
make sure that the columns you use are selective
for if too few values, it's most ineffective
if data is changing or updating too
with frequent insertions, keep indexes few.
from 2000 on you can index a view
(but then there's restrictions on what you can do)
and even on computed columns as well
but only if deterministic as hell
For reasons which often are misunderstood
a non-clustered covering index is good
when composite columns are used with some care
they outperform anything else that's out there
answered Oct 09, 2009 at 02:01 PM
I'd mildly disagree with Matt's answer. He's not necessarily wrong, but with the items table putting the clustered index on TransactionID might cause fragmentation because you can't be guaranteed that the items will be inserted in order of TransactionID meaning there will be more page splits.
Also a non-clustered index on TransactionID in the items table would suffice for joining to the Transactions table.
answered Oct 09, 2009 at 06:15 PM
Most of the time, yes. Since the cluster is the primary avenue of data access inside SQL Server and in most cases, when you're pulling from a particular table you're going first to the primary key, it makes the most sense to have that as the cluster as well. However, other circumstances can lead to different conclusions. If, for example, the most frequently access path, the column(s) used most often to retrieve data from the table was a foreign key, to eliminate the key lookups necessary to retrieve the data from the table, it might make more sense to make the foreign key the cluster.
In my opinion, data access paths define where the cluster should be.
answered Oct 10, 2009 at 08:22 AM
Grant Fritchey ♦♦