Our team has been discussing some design principles for collecting historical data. Specifically, we are trying to capture patterns of drinking.
For older people, it will be hard to capture exact dates or months, because that information is simply too hard to recall. In this case it makes sense to capture age, because we probably won't know a date, or at least it would always be estimated (birthdate + 21 years, etc.).
For younger people, it may be easier to recollect more precise dates/months, so we would have more precision.
We would like to keep both groups in the same design model. What would your suggestions be: age, date, or something else?
asked Aug 25, 2011 at 06:58 AM in Default
In general, I support going with actual dates rather than ages, but this is one of the exceptions. In our data processing, we go by age rather than date. Federal TEDS (Treatment Episode Data Set) requirements use age of first use for substance abuse tracking, as it can be very difficult to pin down exact dates. There might be some subset of your population which does know exact dates, but that percentage is probably low enough that "date of first use" is too noisy a signal to do anything reasonable with. Even years may be hazy for some people, but it's more reasonable to expect that somebody can remember how old they were when they started drinking than the date on which they started.
The other reason why an age would be OK from a data modeling perspective is that it doesn't change. Unlike a "3 years ago" type of field, "age of first use" is static: a person who began drinking at 18 will always have begun drinking at 18. If you were to have a field which has "how many years ago did you begin?" that would be a problem. It doesn't sound like you're doing that, though, so that's safe.
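To illustrate the point about static versus relative fields, here is a minimal sketch in Python. The class and field names (DrinkingHistory, birth_year, age_of_first_use) are made up for this example and are not from any real system. It stores the age once and works out "years ago" only when somebody asks, so nothing stored ever goes stale.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DrinkingHistory:
    # Hypothetical record; the field names are illustrative assumptions.
    birth_year: int
    age_of_first_use: int  # static: never needs updating

    def years_since_first_use(self, as_of: date) -> int:
        # Derived at query time rather than stored, so it cannot go stale.
        # Approximate, because only the birth year is known.
        return as_of.year - (self.birth_year + self.age_of_first_use)


record = DrinkingHistory(birth_year=1950, age_of_first_use=18)
print(record.years_since_first_use(date(2011, 8, 25)))  # 43
print(record.years_since_first_use(date(2021, 8, 25)))  # 53
```

The stored row is identical in 2011 and in 2021; only the derived number changes. A stored "years ago" column would have needed an update every year.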
answered Aug 25, 2011 at 12:33 PM
Are you building a system that is set up to handle personal data? A date of birth needs to be protected, whereas an age is less likely to need protection. If you have no other reason to collect the DOB, then why not use age for everyone?
answered Aug 25, 2011 at 07:03 AM
I would go for age, unless the time of year will be used in the analysis (for example, do kids drink more in the summer?). With age stored directly, the analysis can be done without calculating it from dates. But it really does depend on how you will analyse your data; it's hard to say without knowing your data and your requirements.
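If time of year does turn out to matter, one possible way to keep everyone in the same model is to always capture the age and make the month optional. The sketch below is only an assumption about how that could look; FirstUse, first_use_by_month and the sample responses are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstUse:
    # Hypothetical model: age is always captured, month only when recalled.
    age: int
    month: Optional[int] = None  # 1-12, or None when the respondent can't say

    def __post_init__(self):
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError("month must be between 1 and 12")


def first_use_by_month(responses):
    # Seasonal counts use only the responses that carry a month.
    return Counter(r.month for r in responses if r.month is not None)


# Made-up sample responses, for illustration only.
responses = [
    FirstUse(age=18),
    FirstUse(age=16, month=7),
    FirstUse(age=17, month=7),
    FirstUse(age=21, month=12),
]
by_month = first_use_by_month(responses)
print(by_month[7])  # 2
```

Age-based analysis can use every row, while any seasonal analysis is limited to the rows where a month was given.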
answered Aug 25, 2011 at 07:06 AM