x

Cleaning data with T-SQL

Hi - does anyone have any tips for cleansing data with T-SQL? I've got a lot of columns of data to clean up and it seems the only way of cleaning it all is to handcraft T-SQL code to:

1) first profile my data, then, 2) Clean it.

For example, I have a column of datetimes with a lot of noise in it - nulls, future dates (there shouldn't be!) and dates that are far too old (i.e. they were incorrectly entered). I need to end up with some script that strips out (or possibly even cleans) the invalid rows. Trouble is, I need to do this for a lot of columns...

Does anyone know of a quicker way of doing this - is there something like this in SSIS or elsewhere that will make this job easier/quicker? A profiling tool of some sort?

Thank!

more ▼

asked Oct 12, 2009 at 12:44 PM in Default

Ben Rees gravatar image

Ben Rees
61 2 2 8

(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

I think you will get the best results by using hand crafted T-sql. However, SSIS does have some tools built in, as described in this article by http://msdn.microsoft.com/en-us/magazine/cc163731.aspx .

more ▼

answered Oct 14, 2009 at 12:12 PM

TimothyAWiseman gravatar image

TimothyAWiseman
15.6k 21 23 32

(comments are locked)
10|1200 characters needed characters left

There's nothing built into SSIS or TSQL to facilitate this. You'd need to track down a third party tool.

more ▼

answered Oct 13, 2009 at 10:31 AM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
103k 19 21 74

(comments are locked)
10|1200 characters needed characters left

I know it's a necessary evil in organisations where there's a buck-passing attitude that "app is near enough, the BI team can sort the junk out later" but cleanse at your own peril and only as part of a comprehensive feedback loop.

  1. If there is stuff that just should not be getting into the data, the fault needs to be fed back and fixed in the source application and database ASAP.
  2. If the business managers are too lazy or flippant to care about getting the source application working properly, a business intelligence solution is probably beyond them anyway. It's a tool. It's not a magic wand.
  3. Whatever "cleansing" you do, make sure your decisions are fed back to the owners of the data and that they AGREE and SIGN OFF on them.
  4. As Jeff Moden says "If the data is worth having, it's worth validating." I'd just add to that "...as close to the point of entry as possible."
more ▼

answered Dec 30, 2012 at 01:47 AM

GPO gravatar image

GPO
2.9k 35 38 42

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x17
x4

asked: Oct 12, 2009 at 12:44 PM

Seen: 3076 times

Last Updated: Dec 31, 2012 at 01:34 PM