We have a CalendarTimeUTC Dimension table in our Data Warehouse that looks like this:
The PK in the table is the CalendarTimeUTCId (clustered). This used to be a Int field. All fact tables have CalendarId in them (some of these are multi-billion row partitioned tables).
We want to go from putting a DATE value to DATE+HOUR value in this field.
Now with the advent of SQL 2008, and the shiny new implementation of DATE, is there a reason to switch the dimension from INT to DATETIME?
How will it affect the Index Size in the fact tables? More importantly, how will it affect performance?
An int is 4 bytes and a datetime is 8, so switching int to datetime will be a 4-byte increase for each CalendarId reference you change. This will increase the size of any index which includes that datetime key, as well as the table storing the datetime foreign key itself.
On the other hand, if you take a standard int and append the hours to the end, you might not be pleased with the results; 9999-12-31 at 11 PM translates to 9,999,123,123, which would overflow an int field. That would be bigint territory, which is itself 8 bytes.
The Kimball folks say that you should use a surrogate key even for a date dimension because you might not know the date, so you can put in a reasonable "I do not know" key like -1. If you use a datetime, you lose that luxury and either put in a fake datetime or have null values. That's why I would personally recommend that, if you need to do this, go with a bigint before putting the actual datetime on your fact table.
Another alternative, and something I would really personally recommend, could be to keep hour separate, as that means your date dimension won't balloon up. If you have 100 years of dates, you'll have roughly 36,500 days. But if you include hours, that jumps to roughly 876,000 records. Instead, if you have an Hours dimension, you could have a 4-byte Date dimension int surrogate key and a 1-byte Hour dimension tinyint surrogate key. Even if you need to go down to the second, have a Time dimension instead with 3600 total records and a smallint surrogate key. Then, Date+Time = 6 bytes, as opposed to the 8-byte bigint for Date+Hour, and you have growth of N + 24 (or N + 3600) as opposed to N * 24 (or, egads, N * 3600) records, where N is the number of days in your Date dimension.