question

paws27284 asked

Multiple data flow tasks vs Single data flow task

Which is better: a single data flow task that, when opened, contains multiple source/destination pairs, each extracting a different table, or multiple data flow tasks, each containing a single source and destination extracting one table?
Tags: sql-server-2008, ssis

Pavel Pawlowski answered
As @busynovadad mentioned, many factors affect which scenario to choose. But in general:

- For small tables with relatively few rows, you can put multiple source/destination pairs in a single data flow.
- If the ETL for a source/destination pair is complex, it is better to put it in a separate data flow task for clarity.
- If you need to define the sequence of execution, you have to use multiple data flow tasks, as you cannot control the order of execution among multiple sources/destinations in a single data flow task.
- Whenever you need a different transaction isolation level or transactional behavior, you have to put the loads into separate data flows.
- Whenever you are unsure of the ETL's impact on the source system, put the loads in separate data flows, as this will make it much easier to reorder the execution in the future.
- If you have large tables, put them in separate data flow tasks, as this allows you to tune buffer sizes per table and optimize the ETL process for each of them.

So, from the above: if you have relatively small tables and straight source-to-destination mappings, there is no problem having multiple sources/destinations in a single data flow. In other cases it is better, or necessary, to put them in separate data flows, as that allows you to optimize the ETL process from all three points of view:

1. Load impact on the source systems
2. Load impact on the destination systems
3. Utilization of the machine running the ETL process (CPU consumption, memory consumption, and overall throughput)
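The ordering point above is the one that most often forces the decision. SSIS is configured in the designer rather than in code, but the scheduling difference can be sketched outside SSIS; in this hypothetical Python sketch, the table names and the `load_table` stub are placeholders for per-table source-to-destination moves:

```python
from concurrent.futures import ThreadPoolExecutor

def load_table(name):
    # Placeholder for one extract/load; in SSIS this corresponds to a
    # single source -> destination pair.
    return f"loaded {name}"

tables = ["DimCustomer", "DimProduct", "FactSales"]

# One data flow task with several source/destination pairs: the engine
# starts them together, and you cannot impose an order between them.
with ThreadPoolExecutor() as pool:
    unordered = list(pool.map(load_table, tables))

# Separate data flow tasks chained by precedence constraints: each load
# starts only after the previous one has finished.
ordered = [load_table(t) for t in tables]
```

The second pattern is what multiple data flow tasks buy you: explicit sequencing, at the cost of giving up the free parallelism of the first.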
1 comment

Thank you both for the quick answers. Pavel, your general recommendations are well received!
busynovadad answered
The short answer here, as with so many of these things, is "it depends". Are there any other transformations in the flows? What about lookups, or other disk-based costs inside the same flows? What is your disk situation (SAN, local, SSD)? What about contention on the LUNs? How are your data sources partitioned? What is the physical layout of your disk(s)? What is the shape of your data? What is/are the sources? What is/are the targets? Who's the end consumer?

Is it a small, 100 GB, locally hosted, non-partitioned, traditional OLTP database on a recent version of Windows running against SQL 2008 R2? Or a 50 TB analytics engine, hosted in a blue-sky private cloud, building cubes for dimensional analysis for private use by a CXO of a Fortune 100 company? It really all depends on your particular environment.

For example, in one of my environments, where one small VM runs batch jobs once a day, it makes more sense to single-thread the many data flows, so that one flow doesn't start until the previous one has finished. Experience has shown that this still gives us reasonable system performance, even while the ETL is running.

In another environment, though, I have multiple data flows running without constraints inside sequence containers, each flow ETLing only one table, and the sequence containers themselves are constrained together to control the flow of execution.
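That last pattern — unconstrained flows inside a container, containers chained together — is a middle ground between full serialization and a free-for-all. As a hedged sketch (container contents and table names here are hypothetical, and SSIS itself is wired up in the designer, not in code):

```python
from concurrent.futures import ThreadPoolExecutor

def etl_one_table(table):
    # Stand-in for a data flow task that ETLs a single table.
    return table

# Each inner list plays the role of a sequence container holding
# unconstrained data flows; the outer list is the chain of containers
# joined by precedence constraints.
containers = [
    ["Staging_A", "Staging_B"],   # these run concurrently
    ["Dim_A", "Dim_B"],           # start only after all staging finishes
    ["Fact_Sales"],
]

results = []
for container in containers:
    # Flows inside one container run in parallel...
    with ThreadPoolExecutor() as pool:
        results.extend(pool.map(etl_one_table, container))
    # ...but the next container waits for this one to drain completely.
```

You get parallelism where loads are independent, and a hard ordering barrier wherever a later stage depends on an earlier one.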