question

David 2 1 avatar image
David 2 1 asked

Data Warehouse Dimension Questions

Hi there, I am looking for some advice as to the best method to create a fact and related dimension tables for a data warehouse that will store real estate survey data. The main table that the data will come from consists of the following: CREATE TABLE RESurvey( SurveyID INT IDENTITY(1,1), SurveyDate DATETIME, Surveyor VARCHAR(100), SurveyorCompany VARCHAR(100), TenantName VARCHAR(100), PropertyAddr1 VARCHAR(100), PropertyAddr2 VARCHAR(100), PropertyAddr3 VARCHAR(100), PropertyCity VARCHAR(100), PropertyZip VARCHAR(10), SmokeDetectorKitchen VARCHAR(10), SmokeDetectorLivingRoom VARCHAR(10), SmokeDetectorHallway VARCHAR(10), SmokeDetectorBedroom VARCHAR(10), SmokeDetectorOperational VARCHAR(10)) I am going to create a fact table to store all the survey data and have separate dimensions for Date, SurveyCompany, and Property, however what would be the best way to create a SmokeDetector dimension? The data in the SmokeDetector columns are currently 'True' and 'False', and NULL values. I would like to create measures on all smoke detector locations in a property (can be in more than 1 room) and whether or not they are operational. What do you think? TIA
sql-server-2012ssasdata-warehousedimension
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@David 2 1 SmokeDetector does not appear to be a good candidate for dimension. It looks like you have the smoke detector related fields in your fact table, and these fields are going to be used as measures and calculated measures. The column values you have are not actually true or false, they are just the string representations, which means that will have to change them to, perhaps, numeric type converting true to 1 and false to 0 so that the default aggregation of sum will work out of the box. There is a small problem with the source data though. 4 out of 5 smoke detector columns represent a detector at specific location because the could be multiple detectors per property. In this case, what does the smoke detector operational represent? I mean which of them detectors is marked as operational or not when there are many? This column does not make much sense, there has to be either an operational column per each detector or a collection of detectors related to property, so each detector record may have 3 fields, i.e. property ID, detector location (kitchen, bedroom etc.) and operational flag.
1 Like 1 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, Thanks for the insight. Comments posted below due to character limitations in this comment box.
0 Likes 0 ·
David 2 1 avatar image
David 2 1 answered
@Oleg, Thanks for the insight. So far I have created something similar to the below fact table and populate it using a CASE statement to convert the 'True' and 'False' values into numerics that reference a DimDetector table using it as a role-playing dimension for all rooms. I've not done anything with the operational flag as yet, of which your observation is indeed correct however it is only flagged as operational if all detectors are working. I'm just wondering if this usage of role-playing dimensions is the best method? Or maybe I should look at a junk dimension? CREATE TABLE FactRESurvey( SurveyKey INT, SurveyDateKey INT, SurveyCompanyKey INT, PropertyKey INT, SDKitchenKey INT, SDLivingRoomKey INT, SDHallwayKey INT, SDBedroomKey INT, CONSTRAINT PK_SurveyKey PRIMARY KEY(SurveyKey), CONSTRAINT FK_SurveyDateKey FOREIGN KEY(SurveyDateKey) REFERENCES DimDate(DateKey), CONSTRAINT FK_SurveyCompanyKey FOREIGN KEY(SurveyCompanyKey) REFERENCES DimSurveyCompany(SurveyCompanyKey), CONSTRAINT FK_PropertyKey FOREIGN KEY(PropertyKey) REFERENCES DimProperty(PropertyKey), CONSTRAINT FK_SDKitchenKey FOREIGN KEY(SDKitchenKey) REFERENCES DimDetector(DetectorKey), CONSTRAINT FK_SDLivingRoomKey FOREIGN KEY(SDLivingRoomKey) REFERENCES DimDetector(DetectorKey), CONSTRAINT FK_SDHallwayKey FOREIGN KEY(SDHallwayKey) REFERENCES DimDetector(DetectorKey), CONSTRAINT FK_SDBedroomKey FOREIGN KEY(SDBedroomKey) REFERENCES DimDetector(DetectorKey),
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@David 2 1 It all depends on the reporting requirements. Usually, dimensions exists so that the factual data can be sliced and/or filtered. The time, for example, is a must because of the inevitable requirement to show, say, the survey company info on rows and time on columns (years, quarters, months), displaying the count of surveyed properties, count of them a year ago, 2 years ago, percent change, maybe percent of mix if the survey company has some hierarchy, such as geography, etc. Now, what is a detector? Does it have any usable attributes or hierarchies to justify its presence in any report so the data may be sliced by them? If the answer is yes then its existence is justified. If not then it is not. I mean if the detector has no attributes except the key then might not be a candidate unless it is needed so it may be used as a filter, i.e. gimmie the same report described above but only for those properties which have a detector of some sort installed in the living room. Junk dimensions are needed for highly detailed report requirements if any. To sum up, it is all about the requirements, they dictate which dimensions are needed and why (needed on rows, columns, filters).
1 Like 1 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, a point to bear in mind that there are numerous detectors (not just smoke detectors) that are surveyed in the rooms. It would be nice to have all the detectors in one dimension.
0 Likes 0 ·
David 2 1 avatar image
David 2 1 answered
@Oleg, Thanks again. The "gimmie the same report described above but only for those properties which have a detector of some sort installed in the living room" that is operational is the type of count I am after, also including the one similar to gimmie the properties where a detector of some sort has not been installed in the kitchen. So far I have a basic count of companies with surveys completed and am trying to get a hierarchy count to work but no luck - it just displays repeated of the number of test surveys (see attached). DimDetector dimension has the type, location, and installed values in it. ![alt text][1] ![alt text][2] [1]: /storage/temp/4335-hierarchy-count-doesnt-work.jpg [2]: /storage/temp/4336-company-count-works.jpg

5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@David 2 1 What does your **Fact Survey Count** count? It looks like the dimension usage between the measure group and dimensions is not setup yet, this is why the company count works but the detector does not. It appears that the measure just does not respond to the dimension, showing the total count of all rows in the fact table regardless of the dimension slices, which suggests the lack of the proper usage setting. Please check the dimension usage settings, you will have to have it setup for each of the detector related dimensions. If the count does not work for you then you can add a dummy column to your fact table with value of 1 for each row and then use sum to aggregate.
1 Like 1 ·
Oleg avatar image Oleg commented ·
@David 2 1 I was asking about the setup of the dimension usage (Dimension Usage tab of the cube view in VS). Is it possible that it is not setup yet between the measure group and dimension? From your data view it appears that the **Fact Survey Count** does not respond to the detector dimension, showing total count regardless of the slice. This could be caused by the lack of relationship under Dimension Usage tab. Also, it is not clear why you have the blank "Installed" values. If the detector is either installed or not then what is the purpose of the dimension member with blank value for this attribute? I am not sure about how does the survey fact (with only one row per property survey) handles detectors info. The detector is a catalog. So, smoke detector in bedroom is represented by 2 rows per bedroom in the underlying dimension table, one row with installed = true and another - with installed = false. I assume that every survey then has only one of 8 possible values in the SDBedroomKey column. But how the multiple bedroom detectors are handled if there is but one row in the fact table per property survey?
1 Like 1 ·
Oleg avatar image Oleg commented ·
@David 2 1 It is not trivial to figure out what this data means. It looks like the sum of counts now foots the total (the sum from top to bottom is 1032 which matches the number of records in your fact table), but the meaning of the numbers is not obvious. I guess it just represents the combinations, for example, there are 99 properties which do not have any detectors installed in kitchen and/or living room, there are 39 properties which have the smoke detector installed in the living room but not in the kitchen, etc. This means that you now have 5 combinations for each company/2 locations, i.e. 2 power 2 + 1 for unknown. If you add another location then you will have 9 combinations, i.e. 2 power 3 + 1 for unknown, and 17 combinations on your rows (Company name and detector type repeated 17 times) if you include 4 locations. The question about handling the situation when the detectors are installed in more than 1 room of the same location (i.e. 3 detectors in 4 bedrooms) still remains. Unknowns must go too because otherwise you might have a hard time explaining to your business users what is the difference between **not installed** and **impossible to tell** (unknown).
1 Like 1 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, the Fact count is just a count of the Survey Key. It is correct but like I say I'm having trouble getting the Detector dimension for work. I had gone into the dimension and set up the hierarchy, as attached, which looks correct in the dimension browser. Looks like I'll have to work backwards and see what I'm missing to get these counts. Thanks.![alt text][1] ![alt text][2] [1]: /storage/temp/4337-hierarchy.jpg [2]: /storage/temp/4338-hierarchy-browser.jpg
0 Likes 0 ·
hierarchy.jpg (55.9 KiB)
David 2 1 avatar image David 2 1 commented ·
@Oleg, thanks. I've spent the day going back over my dimension and rebuilding the relationship and the test cube. And now I am starting to get the individual counts (attached). As you say it must have been a lack of a relationship. I'll add in another type of detector and see how that goes. Thanks for your insight and have a great weekend. ![alt text][1] [1]: /storage/temp/4340-counts.jpg
0 Likes 0 ·
counts.jpg (120.0 KiB)
David 2 1 avatar image
David 2 1 answered
@Oleg, thank you Oleg. I can see you point regarding the increasing confusion of the many counts. Although I will be able to hide the 'unknowns' in SSAS, how else can go about designing the fact table and detector dimension to eliminate or reduce this count confusion? Below is my current schema. You can see the many role-playing dimension links into the DimDetector table, and that's currently only for the Smoke Detector - I will need to add other detectors. Do you think creating a data warehouse from this type of survey data is impossible, or have I looked at the design incorrectly? Thanks again. ![alt text][1] [1]: /storage/temp/4347-dw.jpg

dw.jpg (103.4 KiB)
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@David 2 1 Creating a multidimensional solution is certainly possible. The cubes have been around, and I am sure that it would be difficult, if not impossible to encounter a problem which has not been resolved by many others in the past. With data you have, it is possible that you might want to consider redesigning your fact table to accommodate the detectors. I don't know exact details of the requirements, but from the first glance I suspect that there is no need to pivot the factual raw data. I mean there should be no columns for each detector type, the factual data is meant to be stored by rows. The tools, such as Excel, Power BI, and even SSRS to some degree can easily pivot the data for reporting purposes, but raw data needs to be in rows. Talking to the business owners could help. Describe a complex situation with some sample data, such as surveys with say, multiple smoke and motion detectors installed (or not) in many different rooms, including multiple detectors in the rooms of the same type (i.e. bedroom 1, 2, etc.). Ask them what do they want to see in the report for this specific case. This will help with guiding the design. Let me know and I will have a look.
1 Like 1 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, thank you Oleg. I have to put this on hold for now. I'll get back to this in the near future. THanks for your help and suggestions so far.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.