
Mrs_Fatherjack asked:

Importing data to SQL

How would you tackle having to import various flat files into SQL Server? The file format could change, but is unlikely to, and more files may need to be added to the process. My thinking is to use SSIS, but the person who started this project previously was writing it using command lines. All ideas gratefully received.
Tags: ssis, import, command-line

Fatherjack ♦♦ commented:
Seriously, SSIS might be best. I'd need to know more about the source files to make a choice. Happy to give a lesson though. Come to SQLBits 9 and attend my session ;)
Fatherjack ♦♦ commented:
LogParser? ;)
WilliamD commented:
@Fatherjack - totally forgot that, even though I saw your great lightning talk! But then again, that is command line...
Fatherjack ♦♦ commented:
Yeah, it's command line, but it has a COM object, so it can be executed from SSIS or SQL Agent ... you must have missed the last bit of the talk ... ;)
Mrs_Fatherjack commented:
@Fatherjack Guess I'll need to be having a Log Parser lesson then, please.
WilliamD answered:
SSIS is the way to go, IMO. Here at work we have a sort of dynamic SSIS implementation, where the control data is stored in SQL Server, allowing new file formats to be added easily without having to redeploy the SSIS package for each additional format. I'm not too sure how it was done (before my time, and SSIS is pretty much an unknown to me), but it is very flexible and fast into the bargain.

Pavel Pawlowski commented:
Definitely. If it's possible that the source could change in the future, and you know the destination should stay the same, then I first load the data into staging tables (no matter whether using SSIS, BCP, Log Parser, OPENROWSET or whatever method), and then process the data from the staging tables to their right destination as necessary. If your source changes, you only alter the first step - loading the source into the staging table - and do not need to touch the whole process.
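Pavel's staging-table pattern can be sketched roughly as below. This is a minimal illustration using Python's standard-library `sqlite3` standing in for SQL Server, and the table and column names (`staging_orders`, `orders`) are made up for the example:

```python
import csv
import io
import sqlite3

def load_to_staging(conn, flat_file):
    """Step 1: load the raw flat file into a permissive, text-typed staging table.
    If the source format changes, only this function needs to change."""
    conn.execute("DROP TABLE IF EXISTS staging_orders")
    conn.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?)",
                     csv.reader(flat_file))

def staging_to_destination(conn):
    """Step 2: validate/convert and move rows into the real destination table.
    This step never sees the source file, so format changes can't reach it."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("""
        INSERT INTO orders
        SELECT CAST(order_id AS INTEGER), CAST(amount AS REAL)
        FROM staging_orders
    """)

conn = sqlite3.connect(":memory:")
load_to_staging(conn, io.StringIO("1,19.99\n2,5.00\n"))
staging_to_destination(conn)
print(conn.execute("SELECT * FROM orders").fetchall())  # [(1, 19.99), (2, 5.0)]
```

The same split applies whatever the loading tool is: only step 1 has to track the flat-file layout.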
Mrs_Fatherjack commented:
That's the kind of thing I'm looking for. Many thanks.
Pavel Pawlowski commented:
A dynamic implementation of SSIS is not an easy task, as the metadata are checked when the package starts. The only possibility for a dynamic source is to use a Script Component as the source in the data flow task, with fixed output columns, and implement the dynamic behaviour inside the script. Normally you have packages which load data into staging tables (this is where the source is implemented), and if the source changes, you only change those staging packages and redeploy them. Then a separate set of packages processes the staging tables to their final destination, and those are not affected by a potential change to the source flat file.
Mrs_Fatherjack commented:
So write an SSIS package to put the data into the staging table, and run the post-staging process separately?
WilliamD commented:
@Pavel - a good point. I believe that is what was done for our implementation, but I cannot be sure. As such, it doesn't matter whether you did it in SSIS or some other scripting language altogether: your metadata is stored in tables and the script interprets it.
Pavel Pawlowski answered:
If you know that the source file format may change in the future but the destination tables will remain the same, I would go with SSIS, because in that case, when the source file changes, you only update the flat file source in the package and add the appropriate transformations to meet the destination tables.

Grant Fritchey answered:
Deep down I know the answer is SSIS... but if it's likely to change, you should probably take that into account and write the import process in a language that can deal with it appropriately. I'd use PowerShell.
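One way to make a scripted import tolerant of format changes is to keep the column layout in a small config rather than in the code, so a layout change means editing the config only. A rough sketch of that idea (shown in Python; Grant suggests PowerShell, but the principle is the same, and `LAYOUT` is an illustrative mapping, not anything from the original thread):

```python
import csv
import io

# Illustrative config: destination column name -> position in the flat file.
# If the file format changes, only this mapping changes, not parse().
LAYOUT = {"order_id": 0, "amount": 1}

def parse(flat_file, layout):
    """Yield one dict per line, mapping destination names to raw values."""
    for raw in csv.reader(flat_file):
        yield {name: raw[pos] for name, pos in layout.items()}

rows = list(parse(io.StringIO("1,19.99\n2,5.00\n"), LAYOUT))
print(rows)  # [{'order_id': '1', 'amount': '19.99'}, {'order_id': '2', 'amount': '5.00'}]
```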

TimothyAWiseman answered:
I'm going to join everyone else and say that if the file format is one of the standard types (CSV, TSV, fixed-width), SSIS is the way to go. That remains true even if you think the format might change - you just update the SSIS package. I would, however, look outside SSIS if the file format is genuinely non-standard or for some reason needs significant preprocessing before being imported. I have, for instance, dealt with situations where I just needed to extract the e-mail addresses from otherwise unorganised text, where I needed to strip out unusual characters such as an ASCII null or bell character before importing, or where numerous lines needed to be split into a header and values to be turned into a single row in the database. In all those cases, I turned to a more flexible programming language (normally Python, but just about any of the major ones would work).
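The kind of preprocessing Timothy describes might look something like this in Python. The input string and the e-mail regex are illustrative (the pattern is deliberately simple, not RFC-complete):

```python
import re

# Illustrative messy input: e-mail addresses buried in free text,
# plus ASCII NUL (\x00) and bell (\x07) characters that would upset
# a bulk importer.
messy = "contact: alice@example.com\x00 and bob@example.org\x07 (billing)"

# Strip the control characters before handing the text on.
clean = messy.replace("\x00", "").replace("\x07", "")

# Pull out anything that looks like an e-mail address.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", clean)
print(emails)  # ['alice@example.com', 'bob@example.org']
```

Once the text is cleaned and structured, the result can be written out as a standard CSV and fed to SSIS, bcp, or BULK INSERT as usual.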
