overly normalized data and optimization

hi all,

i am currently developing in a database of people. currently the records throughout the db are keyed as nchar types. the question i have is would my queries perform better (with all the joins) if the key fields were all int fields?


more ▼

asked Aug 22, 2010 at 04:31 AM in Default

avatar image

79 5 5 10

i should also mention i was using indexed views at one stage to improve performance, but i had to remove them due to the way data was pushed into the db in mass. the merge process was hanging and we believed this was a result of the server having to work to hard to rebuild the indexes on insert (or update not sure which).

Aug 22, 2010 at 05:09 AM DavidD
(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

First, yes ints do compare faster than large datatypes, however the difference is normally small, especially when the larger datatype and the dataset in question is not excessively large. In other words, there is a performance benefit to switching to the smallest datatype possible, but it is normally outweighed by the fact you are carrying extra data to permit you to do that.

Next, have you considered using indexed views? They can increase the performance of complex joins dramatically as well as hide the underlying complexity of the tables from developers. There is some information on this at http://www.sqlservercentral.com/articles/Indexed+Views/63963/ .

Finally, this is a matter of some debate, but I think it is hard to normalize too much. It is however possible to do it incorrectly and that will cause problems, but I generally recommend going to 3NF and for certain situations 4NF or 5NF all make perfect sense.

more ▼

answered Aug 22, 2010 at 12:05 PM

avatar image

15.6k 22 55 38

thanks timothy, i did originally use indexed views however had to remove them. check my comments above. perhaps you can suggest to me why sql went through the roof when pumping new data in? i assumed it was due to sql having to write the physical index for the view while our import procedure was pulling in data... thoughts on how i can reimplement the indexed view without having it fall over when data is being imported?

Aug 22, 2010 at 05:00 PM DavidD

@Aaron - Have you thought about droppong the view before loading and recreating it after. This would remove the performance hit during bulk load, but still give you the performance boost of the indexed view.

Aug 23, 2010 at 12:27 AM WilliamD

@Aaron, for truly large inserts WilliamD makes a very good point. It often makes sense to drop indexes and even (in some specific cases) constraints and readd them afterwards. This generally only applies to very large inserts though.

I have used indexed views extensively, and like all indexes they do cause a performance drop on write operations. Normally this is very small and acceptable though. If it is not you may want to look first and make sure that you are not indexing more than you need to (indexes with an extreme number of columns or extremely large columns will cause more of a performance problem then those on just small keys). Also, check your insert statements themselves to make sure they are written in an effecient fashion. Finally, while I always recommend hardware upgrades only as a last resort for performance problems, do make sure your hardware is up to the task of handling the demands you are making of it.

Aug 23, 2010 at 09:27 AM TimothyAWiseman

This article talks about improper data splitting. I found it after writing my original response: http://www.simple-talk.com/content/article.aspx?article=1117 Perhapse something like that is what you mean by overnormalization?

Aug 23, 2010 at 11:42 AM TimothyAWiseman

how large an insert is large? originally we the view had i think 12 joins, and wee were updating/inserting 800 records at a time into each using sql 2008's new merge function. at the time i thought it was such a small insert that it couldnt affect performance but removing it made such a difference that i conceded it must have been the cause. i will investigate dropping the index and recreating it and look at optimizing the merge proc.

thanks again all, awesome feedback

Aug 23, 2010 at 05:44 PM DavidD
(comments are locked)
10|1200 characters needed characters left

Unless you're looking at an instance where the address field has been normalized (one table for numbers, another for streets, that sort of thing), it's pretty unlikely that you're looking at an "overly" normalized system. Normalization is a process of defining the necessary information to meet the business requirements. For example, email addresses. It's very easy to say that each person gets one email address, but then, what happens when a person has two. So, you split email off & put it in a seperate table. Then you find that multiple people share an email address. Now things get hard. You can support duplicate values in the email table, or, you need to look at building an interim table to support associating a given email with more than one person and keep the email addresses unique (as they probably should be). We've just gone from one table to three. Now toss in phone numbers and addresses and you're suddenly looking at 10 or more tables to bring back basic information on an individual. But, I wouldn't call that overly normalized. I'd say it's normalized enough to support the data.

As to the question on integers being better choices, yes, they are, for the reasons @Timothy said. Queries will perform better on smaller data types. The general rule is move only the data you need to move when you need to move it and store it pretty much the same way. Only make the values big enough to store what's needed. It basically comes down to the more data you can get onto a page, and that's either an index page or a data page, the better your performance will be because SQL Server will need to access fewer pages to return the data needed.

more ▼

answered Aug 23, 2010 at 04:53 AM

avatar image

Grant Fritchey ♦♦
137k 20 46 81

thanks grant. yes i agree, and im not suggeesting normalization is bad but in this case there are tables that for example has a person and a text item associated with the person (amongst several other association tables) and looking at the business logic in the splitting of tables i cant make any justification for the seperation. ie one person would only ever have one text item assigned to them, therefore i dont see the need to split them. nevertheless i take your point and will keep that in mind.

Aug 23, 2010 at 05:41 PM DavidD

Good point. It's not unheard of for people building databases to make mistakes.

Aug 24, 2010 at 04:44 AM Grant Fritchey ♦♦
(comments are locked)
10|1200 characters needed characters left

Are your tables Clustered? (i.e. do they have a clustered index?) If no, you could try this.

Are you maintaining your indexes regularly? If no, you could try this.

Are you shrinking and growing your database more than once a month? If yes, don't.

more ▼

answered Aug 24, 2010 at 04:52 AM

avatar image

Henrik Staun Poulsen
589 14 17 20

yes, yes and definately not.

the db as a whole performs quite well, its just the some of the major queries that need some work.


Aug 24, 2010 at 04:05 PM DavidD
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Aug 22, 2010 at 04:31 AM

Seen: 1756 times

Last Updated: Sep 29, 2010 at 09:45 PM

Copyright 2018 Redgate Software. Privacy Policy