Wim asked

Data vaults: new data warehouse approach

What do you guys think of the "new" data vault approach to designing a data warehouse? Do you agree with what is stated?

Reading some documents about the topic, I disagree with certain statements about the other techniques. They state that with 3NF you have a problem with the datestamp column in the primary key, but we can easily use a surrogate key and reference that from the other tables. In my company we are designing for a time-phased approach to data, and the ways they put the data into a data vault are the same ones we use. I don't see anything new in their idea except that they thought of a way to put it in a normalized database. That is nothing very spectacular, because we used the same techniques without ever having heard of data vaults.
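A minimal sketch of the surrogate-key idea described above, using Python's built-in `sqlite3` module. Every table, column, and function name here (`customer_version`, `sales_order`, `version_at`) is a hypothetical illustration, not something taken from the data vault documents or from an actual warehouse. The date stamp stays in a unique constraint next to the business key, while referencing tables point only at the single-column surrogate key.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a time-phased table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer_version (
            customer_version_id INTEGER PRIMARY KEY,  -- surrogate key
            customer_no TEXT NOT NULL,                -- business key
            valid_from  TEXT NOT NULL,                -- date stamp (ISO 8601)
            city        TEXT NOT NULL,
            UNIQUE (customer_no, valid_from)          -- date stamp kept out of the PK
        );
        CREATE TABLE sales_order (
            order_id INTEGER PRIMARY KEY,
            customer_version_id INTEGER NOT NULL
                REFERENCES customer_version (customer_version_id),
            amount REAL NOT NULL
        );
        """
    )
    # Two versions of the same customer: the city changes over time.
    conn.executemany(
        "INSERT INTO customer_version (customer_no, valid_from, city) VALUES (?, ?, ?)",
        [("C1", "2008-01-01", "Ghent"), ("C1", "2009-06-01", "Antwerp")],
    )
    conn.commit()
    return conn


def version_at(conn, customer_no, as_of):
    """Return the surrogate key of the version in effect on `as_of`, or None."""
    row = conn.execute(
        """
        SELECT customer_version_id
        FROM customer_version
        WHERE customer_no = ? AND valid_from <= ?
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (customer_no, as_of),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo()
    # An order dated in 2008 references the first version via its surrogate key.
    key = version_at(conn, "C1", "2008-12-31")
    conn.execute(
        "INSERT INTO sales_order (customer_version_id, amount) VALUES (?, ?)",
        (key, 100.0),
    )
    print(key, version_at(conn, "C1", "2009-07-01"))
```

Note that this only illustrates the keying choice. It does not address what happens when relationships or attribute sets themselves change over time.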

Regards, Wim.

data-warehouse data-vault

Cade Roux answered

I assume you are referring to this article.

We use a traditional Kimball dimensional design and I don't believe that the problems he refers to are significantly solved by his approach.

I am also worried by the fact that he claims a patent is pending on his modeling methodology.

One thing I have learned about DW so far - there are lots of different approaches and they all have drawbacks, and even for any particular enterprise, finding a perfect approach is not really possible - it is actually quite likely that having multiple modeling approaches for different parts of the warehouse or even for the same data (available in more than one model) might be necessary (and, in fact, advantageous).


David 1 answered

I didn't read it all but it looks very like another data modelling straitjacket of the same ilk as Ralph Kimball's, i.e.: lots of jargon and an invitation to lock yourself into a set of inflexible design patterns for no good reason. I suspect that millions of us will continue to get on just fine without it.


Nige answered

I think the "Patent Pending" bit gives it away. It's an attempt to foist a new methodology on the world, but one that can be protected, licensed and monetised. There are a lot of words there but little substance. Yes, data volumes are growing, and yes, data complexity is growing, which to my mind is a strong argument for continuing with current practices and simplifying the structures within data warehouses, not making them more complex. Yes, this has a cost of some redundancy, but since storage is cheap I don't see it as an issue as long as data consistency is maintained. I may not agree with everything Kimball says, but I think he got it right when he said that the key driver of a data warehouse should be the information needs of the business, not the technology. Keep it simple, make sure it's clean - you know it makes sense.


Daniel Linstedt answered

Patent pending was dropped in 2003, so please check your facts. Those articles were published in 2001/2002. If you have questions, please feel free to post them directly to me; I'd be happy to answer them. Regarding many words and little substance: if this were truly the case, then we wouldn't have many large organizations actually using the Data Vault successfully, including but not limited to: DOD, US Navy, US Air Force, Central Bank of Indonesia, Daimler Motors, SNS Bank, Central Bureau of Statistics, Netherlands Tax Authority, Dutch Police, Edmonton Police, JP Morgan Chase, Cendant Timeshare Resource Group, Lockheed Martin, and so on.

I'm sorry that you think there is little substance. Please feel free to read more information at http://www.DataVaultInstitute.com, http://www.b-eye-network.com/blogs/linstedt, Wikipedia, and other places such as the book "Data Warehousing for Dummies".

I disagree with the statement that "simply using a surrogate" suffices to track data over time. What happens if the relationships change in 3NF? What happens when you have History of History? What happens when attributes change? How fast does your IT team respond to changes: does it take 30, 60, 90 days or more for your team to incorporate the necessary changes into the data warehouse? Does your model scale to a petabyte-size warehouse? How about just 100 terabytes?

Lots and lots of questions for you, and I'm curious how you would answer them. I have begun answering these questions, and more, on the forums, blogs, and other areas of the web.

Thank you kindly, Dan Linstedt

As always - feel free to sign up for www.DataVaultInstitute.com - it's free to do so, and post your questions.
