
SQL Server 2000: how to automate importing data from Excel

Say the source data comes in Excel format; below is how I currently import it.

  1. Convert to CSV format via MS Excel
  2. Roughly find bad rows/columns by inspecting the file
  3. Back up the table that needs to be updated, in SQL Query Analyzer
  4. Truncate the table (may need to drop foreign key constraints as well)
  5. Import data from the revised CSV file in SQL Server Enterprise Manager
  6. If there's an error such as duplicate key values, check the original CSV and remove them

I was wondering how to make this procedure more efficient at every step. I have some ideas, but they are not complete.

For steps 2 and 6, I could use a script that checks automatically and prints out all the bad row/column data, so it's easier to remove every error at once. For steps 3 and 5, is there any way to update the table automatically without going through the import wizard by hand?

Could the community advise, please? Thanks.

asked May 13, 2010 at 08:13 PM in Default

Stan

1 answer

In summary, to import a dirty Excel file, I would write a script to clean it up if possible (it almost always is, depending on how it is dirty) and then use DTS, and that should be the entire process. No additional steps should normally be needed to get the data into SQL.

Here are my comments on each step:

  1. I do not see why you need to do this. You can probably inspect just as easily, if not more easily, in Excel, and Excel can be imported directly into SQL in a number of ways. Since you are using 2000 you will not have SSIS, but you should have DTS, which works quite nicely. You could also consider using OPENDATASOURCE from inside of SQL, or else look at some of the third-party programs that do specifically this.

  2. This depends on how you are defining bad. If it is something you can specify in enough detail to turn it into code...then you can turn it into code (see the sketch after this list). Personally I would use Python to do this, which has robust libraries for working with Excel. You could use just about any major language though, including the VBA capabilities built into Excel itself.

  3. While I hate to ever suggest not backing up a database, it is probably not necessary to make a special backup before importing data. This is especially true if you are in FULL recovery mode, since you could back up the log after the fact and do a point-in-time restore.

  4. I do not understand why you are doing this, but then I do not know your workflow or what you do with the data after the import. But truncating is extremely fast, at least.

  5. As mentioned, you could use DTS and do this without ever converting to CSV.

  6. If you fully automate the data cleansing step, then this should not be a concern.
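
To give a concrete idea of the cleansing script mentioned in points 2 and 6, here is a minimal sketch that flags duplicate key values and rows with empty cells before anything touches SQL Server. It assumes the data has already been exported to CSV (your current step 1) and that the key is the first column; the file name and key position are placeholders for whatever your real data uses.

    import csv

    # Report duplicate key values and rows with empty cells.
    # 'data.csv' and the key in column 0 are placeholders.
    seen = {}
    with open('data.csv') as f:
        reader = csv.reader(f)
        header = next(reader)
        for line_no, row in enumerate(reader, start=2):
            key = row[0]
            if key in seen:
                print('Line %d: duplicate key %r (first seen on line %d)'
                      % (line_no, key, seen[key]))
            else:
                seen[key] = line_no
            if any(field.strip() == '' for field in row):
                print('Line %d: has empty columns: %r' % (line_no, row))

Once the script reports nothing, the file is ready for DTS; the same checks can also be run directly against the .xls, as shown below.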

[Edited to address the comments]

Regarding Python libraries that work with excel, I would start with http://www.python-excel.org/. It has the downloads for xlrd and xlwt which together let you handle excel from within Python quite nicely. pyExcelerator is another option I have used in the past, though I think that xlrd and xlwt are superior for most use cases.
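
As a starting point, a read-through with xlrd looks roughly like the following; this is only a sketch, and the file name, sheet index, and header layout are assumptions about your workbook:

    import xlrd

    # Read the sheet straight from the .xls, no intermediate CSV needed.
    book = xlrd.open_workbook('source.xls')
    sheet = book.sheet_by_index(0)          # or book.sheet_by_name('Sheet1')

    header = [sheet.cell_value(0, c) for c in range(sheet.ncols)]
    print('Columns: %r' % header)

    for r in range(1, sheet.nrows):         # skip the header row
        row = [sheet.cell_value(r, c) for c in range(sheet.ncols)]
        # put whatever rules define a "bad" row here
        if any(value in ('', None) for value in row):
            print('Row %d has empty cells: %r' % (r + 1, row))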

As to the date format, that should be relatively easy to handle, though precisely how you do it depends on the exact situation. If the values are stored as dates in Excel (or you can easily convert them to dates), then you should have no problem importing them into SQL as dates. If they are stored as text in one of the standard date formats, then SQL is pretty good (but not perfect, so do some testing!) about doing implicit conversions with very little work on your part. And if they are truly messed up, then the Python datetime library is very good at parsing any date format that is remotely close to one of the standard formats, so you can use that as one of many options to do the cleanup in an automated fashion.
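
For example, a sketch of that kind of date cleanup with xlrd plus the standard datetime module might look like this; the column index and the candidate formats are guesses based on the two examples you gave ('May 13 2010 5:00pm' and 'May 13 2010'), so adjust them to your real data:

    from datetime import datetime
    import xlrd

    # Formats guessed from the examples in the comments.
    CANDIDATE_FORMATS = ['%b %d %Y %I:%M%p', '%b %d %Y']

    def normalize_cell(cell, datemode):
        """Return a datetime whether the cell holds a real Excel date or text."""
        if cell.ctype == xlrd.XL_CELL_DATE:
            # True Excel date serials convert directly.
            return datetime(*xlrd.xldate_as_tuple(cell.value, datemode))
        text = str(cell.value).strip()
        for fmt in CANDIDATE_FORMATS:
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue
        return None    # could not parse; flag for manual review

    book = xlrd.open_workbook('source.xls')       # placeholder file name
    sheet = book.sheet_by_index(0)
    for r in range(1, sheet.nrows):
        parsed = normalize_cell(sheet.cell(r, 3), book.datemode)   # assumes dates in column D
        if parsed is None:
            print('Row %d: unparseable date %r' % (r + 1, sheet.cell_value(r, 3)))

Anything the script can parse can be written out in one consistent format (or loaded straight into SQL), and anything it cannot parse gets reported for manual review instead of failing the import.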

answered May 13, 2010 at 09:19 PM

TimothyAWiseman

Thanks for the reply.

  1. Could you please recommend a Python library that can process Excel files, and any keywords I could search online to figure out how to do that?
4. Since the data provider uses Excel files to manage the data, every time they send us the files we don't know what has been changed, so we just truncate the tables and re-import the updated data.
May 13, 2010 at 10:08 PM Stan
Also, to explain how dirty the source data is: 1. it might have duplicates that violate the key constraint, 2. the datetime format in Excel cells might be inconsistent, e.g. some may be 'May 13 2010 5:00pm', others 'May 13 2010', 3. some rows have several columns empty.
May 13, 2010 at 10:15 PM Stan