question

Raj More avatar image
Raj More asked

Warehouse Generation Table Clustered Index

We are generating data in the warehouse. The input table is going to have about 50 million rows per daily partition, and a total of 17 partitions will be saved (rolling partition delete). During one of the discussions around scalability (partitioning and clustered indexes), my architect said that he prefers to put an INT column instead of a datetime column that has only Date (no time). So the datetime column would be translated from `1/1/2011` to `20110101`. `Create Clustered Index IdxPDVX24Partition (AsOfDateId, DateColumnId)` The AsOfDateId is what we have partitioned on and the DateColumnId is what most of the queries will run on. What are the advantages of using a INT representation of a DateTime in the clustered index?
sql-server-2008-r2data-warehouseclustered-indexvldb
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered
I would answer, It depends. Using INT as a date has been a common practice for datawarehouse solutions for many years, but that might change now when we finally got the DATE column that is even smaller then INT and you can apply DATE functions on it. On the other hand, it is hard to change a common practice and you might run into other problems with developers trying to filter on functions instead of using the date dimension. And in SQL server 2012, you might consider using an even more denormalized datamodel with the new COLUMN STORE INDEX, but in any case, you should try different approaches to find whats best suited for your specific requirements. Yesterday I held a presentation about column store index and how to use it and how you could get even faster queries for your datawarehouse. There are always a downside in every solution and you need to be aware of them and choose the right solution for you. But if you go against the common practice, you will always be questioned : - "why did you do it like that?" - "what are you thinking?" - etc. If you do, make sure you are prepared by having the answers
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Usman Butt avatar image Usman Butt commented ·
@Håkan Nice one. Wanted to know if your presentation about column store index is shared anywhere OR can be shared by yourself? Thanks.
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
It will be shared in the next couple of days. You will be able to find it at http://www.sqlservice.se/category/sprak/english/ and www.addskills.se (somewhere)
0 Likes 0 ·
Usman Butt avatar image
Usman Butt answered
I am not in the favor of storing any datatype with different datatype. Since you do no require time part of the date, then why not use DATE datatype which storage structure is 3-byte integer (whereas INT has size of 4 bytes). This will save you 1 byte for every record in RAM as well as on physical storage. Since it is a warehouse, I believe there would be millions of records and you can save your precious memory and storage. One thing with storing date as INT is that the Date dependency on LANGUAGE is not to be handled. But then you have the overhead of maintenance. There could be some advantages for storing it as INT, but I will definitely go with DATE datatype. Anyways, you should discuss it with your architect, and do let us know if he comes up with better thoughts.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.