x

Importing data to SQL

How would you tackle having to import various flat files into SQL, the file format COULD change but is unlikely and more files may need to be added to the process. My thinking is to use SSIS but the person who started this project previously was writing it using command lines. All ideas greatfully received.
more ▼

asked Apr 20, 2011 at 06:33 AM in Default

Mrs_Fatherjack gravatar image

Mrs_Fatherjack
4.7k 60 62 66

LogParser ? ;)
Apr 20, 2011 at 07:01 AM Fatherjack ♦♦

@Fatherjack - totally forgot that, even though I saw your great lightning talk!

But then again, that is command line.....
Apr 20, 2011 at 07:04 AM WilliamD
yeah its command line but has a COM object so can be executed from SSIS or SQL Agent ... you must have missed the last bit of the talk ... ;)
Apr 20, 2011 at 07:07 AM Fatherjack ♦♦
@Fatherjack Guess I'll need to be having a Log Parser lesson then, please.
Apr 20, 2011 at 07:14 AM Mrs_Fatherjack
seriously, SSIS might be the best. Would need to know more about source files to make choice. Happy to give lesson though. Come to SQLBits9 and attend my session ;)
Apr 20, 2011 at 07:33 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

SSIS is the way to go IMO.

Here at work we have some sort of dynamic SSIS implementation, where the control data is stored in sql server, allowing new file formats to be easily added without having to re-deploy the SSIS package for each additional file format.

I'm not too sure how it was done (before my time, and SSIS is pretty much an unknown for me), but it is very flexible and fast to the bargain.
more ▼

answered Apr 20, 2011 at 06:42 AM

WilliamD gravatar image

WilliamD
25.9k 17 19 41

That's the kind of thing I'm looking for. Many thanks.
Apr 20, 2011 at 06:45 AM Mrs_Fatherjack

Dynamic implementation of SSIS is not an easy task, as the metadata are checked upon the package start. The only dynamic possibility of dynamic source is to use a Script Component in the data flow task as source with fixed output columns and implement the dynamic behavior inside the script.

Normally you have a packages which load data into staging tables (here is the implementation of the source) and if the source changes, you only change those staging packages and redeploy them. And then set of packages processing the staging tables to their final destination, which are not affected by potential source flat file change.
Apr 20, 2011 at 07:09 AM Pavel Pawlowski
So write an SSIS package to put the data into the staging table, and run the process post staging table separately?
Apr 20, 2011 at 07:15 AM Mrs_Fatherjack

@Pavel - a good point, I believe that is what was done for our implementation, but cannot be sure.

As such, it doesn't matter if you did it in SSIS or some other scripting language all together, your metadata is stored in tables and the script interprets it.
Apr 20, 2011 at 07:23 AM WilliamD
Definitelly if it's possible, that the source could change in the future and know, that the destination should stay the same, then I fist load the data into a staging tables (no matter whether using SSIS, BCP, Log Parser, open rowset or what ever method), and then process the data from the staging tables to it right destination as necessary. In case yor source is changed, you only alter the first step - loading source to staging table and do not need to touch the whole process.
Apr 20, 2011 at 11:40 AM Pavel Pawlowski
(comments are locked)
10|1200 characters needed characters left
If you know, that the source file format may change in the future, but the destination tables will remain the same, I go by SSIS, because in that case, when source file changes, you only update the SSIS by updating the flat file source and add appropriate transformation to meet the destination tables.
more ▼

answered Apr 20, 2011 at 06:45 AM

Pavel Pawlowski gravatar image

Pavel Pawlowski
22.3k 9 11 21

(comments are locked)
10|1200 characters needed characters left

Deep down I know the answer is SSIS...

If it's likely to change, you probably should take that into acount and write the import process in a language that can deal with it appropriately. I'd use PowerShell.
more ▼

answered Apr 20, 2011 at 10:54 AM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
98.9k 19 21 74

(comments are locked)
10|1200 characters needed characters left

I'm going to join with everyone else and say that if the file format is any one of the standard types (CSV, TSV, fixed width) that SSIS is the way to go. And this remains true even if you think the format might change, then you just update the SSIS packet.

I would however look outside of SSIS if the file format is genuinely nonstandard or for some reason significant preprocessing before being imported. I have for instance dealt with situations where I just needed to extract the e-mail addresses from otherwise unorganized text, or where I needed to strip out unusual characters such as an ASCII null character or bell character before importing, or where numerous lines needed to be split into header and then value to be turned into a single row in the database.

In all those cases, I turned to a more flexible programming language (normally Python, but just about any of the major ones would work).
more ▼

answered Apr 20, 2011 at 03:16 PM

TimothyAWiseman gravatar image

TimothyAWiseman
15.5k 20 23 32

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x929
x18
x8

asked: Apr 20, 2011 at 06:33 AM

Seen: 1063 times

Last Updated: Apr 20, 2011 at 06:33 AM