|
The default in Management Studio is to make the primary key clustered, but is this a good idea? What are the pros and cons of making the primary key clustered v nonclustered?
(comments are locked)
|
|
The classic DBA answer - it depends. Strictly speaking every table should have both a PK and a clustered index, but not always necessarily using the same field. If you are using GUIDs (shudder) as your PK, then most definitely no! The fragementation can become horrendous. If using identity, then combining the PK and clustered won't do you much harm. If using some other natural/surrogate key then you will have to make an informed decision depending on the values in those keys, if they are as random as GUIDs, then you'll have the same issues as GUIDs.
(comments are locked)
|
|
The two are really different concepts - the primary key is the data that uniquely identifies a record, whereas a clustered index should really match the keys with which the record is most frequently accessed. A typical example might be transactions and items. Each table might have a primary key which is an IDENTITY field, but the transaction items would most likely be looked up by the Transaction ID (the foreign key from the items table to the transactions table). In that case, it would be most efficient to have a clustered index on the items table on the Transaction ID field, as that is how the items would be retrieved most of the time.
(comments are locked)
|
|
Yes. A Primary key is merely a unique key that is intended to be the default one used to select rows and perform joins. It is the default candidate key for a table. They should be used as there is no other way to specify to other processes which key should be used by default. A Primary key is no more than an indexed key whose value, taken together across all component columns, is unique and known. SQL Server creates a clustered unique index when you specify a primary key, but this is not always the best solution, since a clustered index works best when it is not unique, not null and when the values are reasonably evenly distributed The clustered index can be explained by my poem... An index is used as a short-cut to data a table will warrant one sooner or later Because only one can be clustered, beware and ponder the index you cluster with care the issues are clearer than you might suppose this index determines the order of rows so searching the index requires less I/O. Selecting the column on which it should go depends on the way that the rows are selected, which should become clear if the Schema's inspected. One problem, however, I think you should know, retrieving a range can be horribly slow. A non-clustered index is almost as good once ordering keys can be well understood make sure that the columns you use are selective for if too few values, it's most ineffective if data is changing or updating too with frequent insertions, keep indexes few. from 2000 on you can index a view (but then there's restrictions on what you can do) and even on computed columns as well but only if deterministic as hell For reasons which often are misunderstood a non-clustered covering index is good when composite columns are used with some care they outperform anything else that's out there LASHIWAB - love the rhyme Phil
Oct 09 '09 at 02:15 PM
Matt Whitfield ♦♦
(comments are locked)
|
|
I'd mildly disagree with Matt's answer. He's not necessarily wrong, but with the items table putting the clustered index on TransactionID might cause fragmentation because you can't be guaranteed that the items will be inserted in order of TransactionID meaning there will be more page splits. Also a non-clustered index on TransactionID in the items table would suffice for joining to the Transactions table. I should have probably mentioned using a fill factor to overcome that - very true!
Oct 09 '09 at 06:59 PM
Matt Whitfield ♦♦
(comments are locked)
|
|
Most of the time, yes. Since the cluster is the primary avenue of data access inside SQL Server and in most cases, when you're pulling from a particular table you're going first to the primary key, it makes the most sense to have that as the cluster as well. However, other circumstances can lead to different conclusions. If, for example, the most frequently access path, the column(s) used most often to retrieve data from the table was a foreign key, to eliminate the key lookups necessary to retrieve the data from the table, it might make more sense to make the foreign key the cluster. In my opinion, data access paths define where the cluster should be.
(comments are locked)
|
1 2 next page »

