Storing non-alphanumeric data as varchar (date, number, decimal, etc)
I have a schema that has been reviewed and is well liked by our Lead Engineer. However, the one point of contention is this, although we do know which data-type we are expecting, i.e. 'Plow Width:' we expect a number or 'Insurance Expiration Date:' we expect a date. So, I didn't want to clutter the table with a field for every possible data-type and then use NULL handling to figure out which field was NOT NULL. Rather I store every answer as a varchar and unless we have a NEED to do a calculation on it, it is what it is, the user's answer. We do know from our form element (question) Now I am being told to review the way we store this particular data and the customer want numbers to be stored in a number data-type; decimals as decimal/float, dates as datetime, etc. Aside from the normal reporting problems of pairing the answer with the question's PKID, why it is such a sore point with this engineer to store all data as a varchar? All this came from our reporting DBA - whom I replaced as architect - as he never liked this schema, and is using this one field's data-type (flaw) as his only leg to stand on when reports come into question. If I wanted to get away from the Key Value Pairing he insists is going on here, can you give me an alternative?
The key value approach is often selected because of its seeming simplicity and unlimited flexibility. You can define any question and any answer and just store them as text. The application always knows what it needs and how to use it. I won't go there for this, but the logical extreme is, of course, the single-table two-column varchar(max) table that just holds "things" and their keys using WHERE clauses to sort it all out (could be questions and answers, customers and addresses, etc). If you have a large number of question/answer pairs or need to redefine question/answer pairs on the fly, it might make sense to take the flexible key/value approach. However, if the question/answer pairs are somewhat limited, or are defined once per application revision then stay static, putting each answer in its own column might be the way to go. Because... There are a few disadvantages to the key/value approach, and some center on reporting and ad-hoc querying. If the reporting occurring straight out of the database, it doesn't really help the report software (or writer) to "know" what the data type is or what can be done with it. Reporting is often on date ranges (between 2 Feb and 9 Mar), and storing the data as varchar not only makes the between do an alphabetical between. The report writer must cast/convert the column - but not all the rows in the column, just the ones that "should be" datetimes. Can the answers be summed; averaged or ordered alphabetically, numerically or temporally? Knowing the data type tells the reporter what can be done with the answers. Get the where clause a little wrong so data types mix, or a bad data type validation in the application, and the report breaks. Another mistake in the where clause, and the reporter has now averaged 'Plow Width' (maybe in centimeters) with 'Fuel Capacity' (maybe in litres). They aggregate nicely, and the report doesn't actually "break" in cases like this; it is just incorrect in fuzzy, and often difficult to detect, ways.
There are some major advantages to using the proper data type. Briefly: - The proper data type will almost always use less space than storing it as varchar. - Not needing to convert in order to do certain types of comparisons or calculations will generally make it more computationally effecient. - The user is constrained to enter the proper data type and that alone can help stop many transcription and data entry errors, as well as prevent problems if you try to do an invalid cast operation to do those calculations. In short, if you know that a field will always be an integer, then making it an integer will have major advantages. One reason you may not want to do that though is if your goal is to transcribe handwritten data as closely as possible to what was actually put on the form. But in that case your database is not meant to directly answer "What was the date for this", but rather answers "What did the person writing this survey put in the data field?" and that may be a long commentary rather than an actual date. But if you intend for your database to actually answer questions like "What was the date?" then you are better off using an appropriate data type.