We are generating data in the warehouse. The input table is going to have about 50 million rows per daily partition, and a total of 17 partitions will be saved (rolling partition delete).
During one of the discussions around scalability (partitioning and clustered indexes), my architect said that he prefers to put an INT column instead of a datetime column that has only Date (no time). So the datetime column would be translated from
The AsOfDateId is what we have partitioned on and the DateColumnId is what most of the queries will run on.
What are the advantages of using a INT representation of a DateTime in the clustered index?
I would answer, It depends.
Using INT as a date has been a common practice for datawarehouse solutions for many years, but that might change now when we finally got the DATE column that is even smaller then INT and you can apply DATE functions on it.
On the other hand, it is hard to change a common practice and you might run into other problems with developers trying to filter on functions instead of using the date dimension.
And in SQL server 2012, you might consider using an even more denormalized datamodel with the new COLUMN STORE INDEX, but in any case, you should try different approaches to find whats best suited for your specific requirements.
Yesterday I held a presentation about column store index and how to use it and how you could get even faster queries for your datawarehouse. There are always a downside in every solution and you need to be aware of them and choose the right solution for you.
But if you go against the common practice, you will always be questioned :
If you do, make sure you are prepared by having the answers
answered Nov 10, 2011 at 11:44 PM
I am not in the favor of storing any datatype with different datatype. Since you do no require time part of the date, then why not use DATE datatype which storage structure is 3-byte integer (whereas INT has size of 4 bytes). This will save you 1 byte for every record in RAM as well as on physical storage. Since it is a warehouse, I believe there would be millions of records and you can save your precious memory and storage.
One thing with storing date as INT is that the Date dependency on LANGUAGE is not to be handled. But then you have the overhead of maintenance. There could be some advantages for storing it as INT, but I will definitely go with DATE datatype.
Anyways, you should discuss it with your architect, and do let us know if he comes up with better thoughts.
answered Nov 10, 2011 at 10:07 PM