x

Use of candidate key in place of composite foreign key

I have a master table with formulaID as primary key and a detail table with formulaid and RowNum as primary key.Can I replace formulaId and RowNum in the detail table with a candidate key of integer type to make join operations faster or I should go with the composite Key in detail table which is alphanumeric type?
more ▼

asked Jun 28, 2010 at 12:02 AM in Default

gvrathore gravatar image

gvrathore
13 2 2 2

(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

EDIT

I have done some extremely basic timings on natural key vs surrogate key.

What I did was to create a Master/Detail relationship on a natural key column and one Master/Detail relationship on a surrogate key relationship. For the surrogate key relationship, I added a UNIQUE-constraint for the Master-table.

DDL for my tables below. Natural keys first:

CREATE TABLE [dbo].[NaturalMaster](
    [OrderID] [nvarchar](100) NOT NULL DEFAULT (CONVERT([nvarchar](100),newid(),0)),
    [dt] [datetime] NULL,
PRIMARY KEY CLUSTERED ([OrderID] ASC))

GO

CREATE TABLE [dbo].[NaturalDetail](
    [OrderID] [nvarchar](100) NOT NULL,
    [row_num] [int] NOT NULL,
    [dt] [datetime] NULL,
 CONSTRAINT [PK_NaturalDetail] PRIMARY KEY CLUSTERED 
([OrderID] ASC,[row_num] ASC))
GO
ALTER TABLE [dbo].[NaturalDetail]  WITH CHECK ADD  CONSTRAINT [FK_NaturalDetail_NaturalMaster] FOREIGN KEY([OrderID])
REFERENCES [dbo].[NaturalMaster] ([OrderID])
GO
ALTER TABLE [dbo].[NaturalDetail] CHECK CONSTRAINT [FK_NaturalDetail_NaturalMaster]

And then the Surrogate keys tables:

CREATE TABLE [dbo].[SurrogateMaster](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [OrderID] [nvarchar](100) NULL,
    [dt] [datetime] NULL,
PRIMARY KEY CLUSTERED 
(
    [ID] ASC
),
 CONSTRAINT [UQ_SurrogateMaster] UNIQUE NONCLUSTERED 
(
    [OrderID] ASC
))
GO
CREATE TABLE [dbo].[SurrogateDetail](
    [OrderID] [int] NOT NULL,
    [row_num] [int] NOT NULL,
    [dt] [datetime] NULL,
 CONSTRAINT [PK_SurrogateDetail] PRIMARY KEY CLUSTERED 
(
    [OrderID] ASC,
    [row_num] ASC
))
GO
ALTER TABLE [dbo].[SurrogateDetail]  WITH CHECK ADD  CONSTRAINT [FK_SurrogateDetail_SurrogateMaster] FOREIGN KEY([OrderID])
REFERENCES [dbo].[SurrogateMaster] ([ID])
GO
ALTER TABLE [dbo].[SurrogateDetail] CHECK CONSTRAINT [FK_SurrogateDetail_SurrogateMaster]

Then I add 5000 rows to the master table, and 10 detail rows per master row (with CURRENT_TIMESTAMP as value for dt-column). Finally I did a select on each of the relationsships:

select top 10000 m.orderid,d.row_num,d.dt
from naturalmaster m
inner join naturaldetail d on m.orderid=d.orderid
order by m.dt, d.row_num 

And

select top 10000 m.orderid,d.row_num,d.dt
from surrogatemaster m
inner join surrogatedetail d on m.id=d.orderid
order by m.dt, d.row_num 

Both these queries run in less than a second, and the time taken to run the queries is mainly transportation of data from the server to Management studio.

So at least in this extremely simple/simplified scenario, I can't find a performance difference. In you scenario, you seem to have two columns as primary key in the master table, and that might of course be a slight performance hit, as there are two columns to compare. But unless you have huge amounts of data, I doubt that's a performance hit you'll ever feel.

END EDIT

It sounds like you have a good database design. Changing it to using surrogate keys will make it less good. So unless you know that you'll gain a lot in performance I wouldn't recommend changing the databsae design. I don't know what the performance gain is in newer versions of SQL Server - haven't seen any test data recently, so I can't really tell how much you'll gain in terms of join-performance by using int instead of natural keys.

more ▼

answered Jun 28, 2010 at 05:52 AM

Magnus Ahlkvist gravatar image

Magnus Ahlkvist
16.6k 17 20 33

+1 Excellent answer Magnus! I just want to comment on the primary key in the surrogate sample. If you add a surrogate key and add use it as a clustered primary key you will waste the clustered index to something you probably never will search in ranges.
Jun 28, 2010 at 02:44 PM Håkan Winther

Hi Håkan,

I totally agree. I've been wondering since I started using SQL Server why it defaults the Primary Key to a clustered index (that's what happened here, I was too lazy to specify NON CLUSTERED and when I scripted out the table from SSMS it looks like I explicitly declared it CLUSTERED).

I wrote something about clustered indexes and range searching with a the classic phone book example a couple of years ago - in swedish, but from your name I take it you'll get it :) - http://www.underlandet.com/SqlServer/2008/04/18/VaddaringKlusterindex.aspx
Jun 28, 2010 at 02:55 PM Magnus Ahlkvist
(comments are locked)
10|1200 characters needed characters left

Hi,

I'm not sure I understand your issue correctly, so please correct me if I'm wrong:

  • You have a table with PK formulaID, which is of string datatype (varchar, char, nvarchar, nchar or similar).
  • You have a details table, with FK formulaID (also string datatype of course) and the PK of the details table is a compound key of formulaID,RowNumber

Correct?

And your question is: Would it improve performance if you replace formulaID with an integer-column?

If that's the question, I'd say no, stick to the setup you have. You might gain a little on the joins, but on the other hand, if I understand correctly, the formulaID actually means something and it's a unique value in the master table. That means you still needs to keep that column in the master table, and you need to enforce the unique values using a unique constraint. To me, that's poor database design. If there's already a unique value in a table, it should be the primary key of the table.

Also: If you use a natural key, you don't have to query via the master-table if you want to look at a specific row for a specific formula, which will save you some performance.

But again - I'm not sure I understand your question 100%. It would be easier to understand if you posted your DDL for the tables.
more ▼

answered Jun 28, 2010 at 12:20 AM

Magnus Ahlkvist gravatar image

Magnus Ahlkvist
16.6k 17 20 33

(comments are locked)
10|1200 characters needed characters left

Hi Magnus,

It is suggested that joins on alphanumeric composite keys is much slower than joins on single numeric field. If I can put a numeric Document Number field on both Master and Detail tables and rather than working on composite key on both the tables I can use single candidate key for joins. For example I have Master table with FormulaId and RevNo, both alphanumeric fields in Master table as Composite Key and I have Detail table with FormulaId, RevNo and RowNum as CompositeKey. Now my question is can I add DocNum field which is integer type in Master Table and replace both formulaId and RevNo in Detail table with DocNum which is again of integer type. Now I can map DocNum in Master and Detail table in joins instead of mapping formulaId and RevNo in both Master and Detail tables. Also I will make DocNum in Master table as single Candidate Key instead of FormulaId and RevNo and DocNum and RowNum in Detail table as Composite Key. Is this will be a good database design because it will make joins faster but again I will have to compromise with finding formulaId in Detail table as it will require another join to Master table of DocNum to find the corresponding formulaId?

more ▼

answered Jun 28, 2010 at 05:38 AM

gvrathore gravatar image

gvrathore
13 2 2 2

(comments are locked)
10|1200 characters needed characters left

Hi Magnus,

Thanks for your suggestion. But still I am not able make up that which design will be more accurate either to have current design with Composite key or new design with Candidate Key because I have seen databases with my new design approach because sorting, indexing and other dml and ddl operations are much faster with Candidate numeric Key than with Composite alphanumeric Key. Could you please put some light on this issue?

more ▼

answered Jun 28, 2010 at 06:39 AM

gvrathore gravatar image

gvrathore
13 2 2 2

I've edited my answer, with a very simple example. And to add to that answer: Inserting 5000 master rows and 10 detail rows per master row was done in more or less the exact same time for the two approaches (five minutes each, looping with an inner and an outer counter for details/master).
Jun 28, 2010 at 06:42 AM Magnus Ahlkvist
(comments are locked)
10|1200 characters needed characters left

Hi Magnus,

Thanks a lot for taking so much pain in solving the issue. I think there should not be any issue with this solution, although let me check the performance at my end with.
more ▼

answered Jun 28, 2010 at 08:20 AM

gvrathore gravatar image

gvrathore
13 2 2 2

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x346

asked: Jun 28, 2010 at 12:02 AM

Seen: 3524 times

Last Updated: Jun 28, 2010 at 12:02 AM