Warehouse Generation Table Clustered Index

We are generating data in the warehouse. The input table is going to have about 50 million rows per daily partition, and a total of 17 partitions will be saved (rolling partition delete).

During one of the discussions around scalability (partitioning and clustered indexes), my architect said that he prefers to put an INT column instead of a datetime column that has only Date (no time). So the datetime column would be translated from 1/1/2011 to 20110101.

Create Clustered Index IdxPDVX24Partition (AsOfDateId, DateColumnId)

The AsOfDateId is what we have partitioned on and the DateColumnId is what most of the queries will run on.

What are the advantages of using a INT representation of a DateTime in the clustered index?

more ▼

asked Nov 10, 2011 at 03:39 PM in Default

avatar image

Raj More
1.8k 83 89 90

(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

I would answer, It depends.

Using INT as a date has been a common practice for datawarehouse solutions for many years, but that might change now when we finally got the DATE column that is even smaller then INT and you can apply DATE functions on it.

On the other hand, it is hard to change a common practice and you might run into other problems with developers trying to filter on functions instead of using the date dimension.

And in SQL server 2012, you might consider using an even more denormalized datamodel with the new COLUMN STORE INDEX, but in any case, you should try different approaches to find whats best suited for your specific requirements.

Yesterday I held a presentation about column store index and how to use it and how you could get even faster queries for your datawarehouse. There are always a downside in every solution and you need to be aware of them and choose the right solution for you.

But if you go against the common practice, you will always be questioned :

  • "why did you do it like that?"

  • "what are you thinking?"

  • etc.

If you do, make sure you are prepared by having the answers

more ▼

answered Nov 10, 2011 at 11:44 PM

avatar image

Håkan Winther
16.6k 38 46 58

@Håkan Nice one. Wanted to know if your presentation about column store index is shared anywhere OR can be shared by yourself? Thanks.

Nov 10, 2011 at 11:55 PM Usman Butt

It will be shared in the next couple of days. You will be able to find it at http://www.sqlservice.se/category/sprak/english/ and www.addskills.se (somewhere)

Nov 11, 2011 at 12:00 AM Håkan Winther
(comments are locked)
10|1200 characters needed characters left

I am not in the favor of storing any datatype with different datatype. Since you do no require time part of the date, then why not use DATE datatype which storage structure is 3-byte integer (whereas INT has size of 4 bytes). This will save you 1 byte for every record in RAM as well as on physical storage. Since it is a warehouse, I believe there would be millions of records and you can save your precious memory and storage.

One thing with storing date as INT is that the Date dependency on LANGUAGE is not to be handled. But then you have the overhead of maintenance. There could be some advantages for storing it as INT, but I will definitely go with DATE datatype.

Anyways, you should discuss it with your architect, and do let us know if he comes up with better thoughts.

more ▼

answered Nov 10, 2011 at 10:07 PM

avatar image

Usman Butt
14.9k 6 13 21

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Nov 10, 2011 at 03:39 PM

Seen: 1468 times

Last Updated: Nov 10, 2011 at 03:40 PM

Copyright 2018 Redgate Software. Privacy Policy