Is the new date better than the old date?

We have a CalendarTimeUTC Dimension table in our Data Warehouse that looks like this:

alt text

Sample data:

alt text

The PK in the table is the CalendarTimeUTCId (clustered). This used to be a Int field. All fact tables have CalendarId in them (some of these are multi-billion row partitioned tables).

We want to go from putting a DATE value to DATE+HOUR value in this field.

Now with the advent of SQL 2008, and the shiny new implementation of DATE, is there a reason to switch the dimension from INT to DATETIME?

How will it affect the Index Size in the fact tables? More importantly, how will it affect performance?

more ▼

asked May 31, 2011 at 04:58 AM in Default

avatar image

Raj More
1.8k 83 89 90

Can you give some examples of the INT values you store?

May 31, 2011 at 05:27 AM Blackhawk-17

@Blackhawk-17: Data structure & Sample provided

May 31, 2011 at 05:42 AM Raj More
(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

An int is 4 bytes and a datetime is 8, so switching int to datetime will be a 4-byte increase for each CalendarId reference you change. This will increase the size of any index which includes that datetime key, as well as the table storing the datetime foreign key itself.

On the other hand, if you take a standard int and append the hours to the end, you might not be pleased with the results; 9999-12-31 at 11 PM translates to 9,999,123,123, which would overflow an int field. That would be bigint territory, which is itself 8 bytes.

The Kimball folks say that you should use a surrogate key even for a date dimension because you might not know the date, so you can put in a reasonable "I do not know" key like -1. If you use a datetime, you lose that luxury and either put in a fake datetime or have null values. That's why I would personally recommend that, if you need to do this, go with a bigint before putting the actual datetime on your fact table.

Another alternative, and something I would really personally recommend, could be to keep hour separate, as that means your date dimension won't balloon up. If you have 100 years of dates, you'll have roughly 36,500 days. But if you include hours, that jumps to roughly 876,000 records. Instead, if you have an Hours dimension, you could have a 4-byte Date dimension int surrogate key and a 1-byte Hour dimension tinyint surrogate key. Even if you need to go down to the second, have a Time dimension instead with 3600 total records and a smallint surrogate key. Then, Date+Time = 6 bytes, as opposed to the 8-byte bigint for Date+Hour, and you have growth of N + 24 (or N + 3600) as opposed to N 24 (or, egads, N 3600) records, where N is the number of days in your Date dimension.

more ▼

answered May 31, 2011 at 09:56 AM

avatar image

Kevin Feasel
6.2k 4 8 15

When you're getting that granular you could look at a date data type which is only 3 bytes and save 25% per record/index entry.

As an aside... datetime2 can be 6 bytes of storage...

@Kevin Feasel - a "-1" and a "fake" date are pretty well the same end result: a known unknown.

May 31, 2011 at 10:26 AM Blackhawk-17

True on all three counts. I'd say the defense for #3 is that -1 is obviously fake, and can also be your "unknown" surrogate key for all dimensions so you have a consistent "unknown" record. In contrast, maybe something did happen on 1/1/1900...or was it in 1753? Or 9999? That's why, even with the date type being 3 bytes, I'd still go with an int and a -1 "unknown" record. It's one case where the extra byte (and its few billion friends on the fact table) could still spare some grief and make it easier on report writers.

May 31, 2011 at 10:33 AM Kevin Feasel

I totally agree with @Kevin to go by using separate dimensions (Date and Time). You will save storage and have a greater possibilities. And as Kevin mentioned Kimball, they are smart enough and have experiences with such implementations and they know what they propose.

May 31, 2011 at 11:14 AM Pavel Pawlowski
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: May 31, 2011 at 04:58 AM

Seen: 1617 times

Last Updated: May 31, 2011 at 05:40 AM

Copyright 2018 Redgate Software. Privacy Policy