Hi - does anyone have any tips for cleansing data with T-SQL? I've got a lot of columns of data to clean up and it seems the only way of cleaning it all is to handcraft T-SQL code to:
1) first profile my data, then, 2) Clean it.
For example, I have a column of datetimes with a lot of noise in it - nulls, future dates (there shouldn't be!) and dates that are far too old (i.e. they were incorrectly entered). I need to end up with some script that strips out (or possibly even cleans) the invalid rows. Trouble is, I need to do this for a lot of columns...
Does anyone know of a quicker way of doing this - is there something like this in SSIS or elsewhere that will make this job easier/quicker? A profiling tool of some sort?
asked Oct 12, 2009 at 12:44 PM in Default
I think you will get the best results by using hand crafted T-sql. However, SSIS does have some tools built in, as described in this article by http://msdn.microsoft.com/en-us/magazine/cc163731.aspx .
answered Oct 14, 2009 at 12:12 PM
There's nothing built into SSIS or TSQL to facilitate this. You'd need to track down a third party tool.
answered Oct 13, 2009 at 10:31 AM
Grant Fritchey ♦♦
I know it's a necessary evil in organisations where there's a buck-passing attitude that "app is near enough, the BI team can sort the junk out later" but cleanse at your own peril and only as part of a comprehensive feedback loop.
answered Dec 30, 2012 at 01:47 AM