Which is better: a single data flow task that, when opened, contains multiple sources and destinations, each extracting a different table, or multiple data flow tasks, each containing a single source and destination extracting one table?
As @busynovadad mentioned, there are a lot of factors that affect which scenario to choose. But in general:

- For small tables with relatively few rows, you can put multiple sources/destinations in a single data flow.
- If you have complex ETL for a source/destination, then it is better to put them into separate data flow tasks for clarity.
- If you need to define the sequence of execution, you have to use multiple data flow tasks, as you cannot control the order of execution of multiple sources/destinations within a single data flow task.
- Whenever you need a different transaction isolation level or behavior, you have to put them into separate data flows.
- Whenever you are unsure of the impact of the ETL on the source system, put them in separate data flows, as this will make it easier to optimize the execution order in the future.
- If you have large tables, put them into separate data flow tasks, as this will let you tune buffer sizes per table and optimize the ETL process as needed.

So, from the above: if you have relatively small tables and straight source-to-destination mappings, there is no problem with having multiple sources/destinations in a single data flow. In the other cases it is better, or necessary, to put them into separate data flows, as this lets you optimize the ETL process from all three points of view:

1. Load impact on the source systems
2. Load impact on the destination systems
3. Utilization of the machine on which the ETL process runs (CPU consumption, memory consumption, and overall throughput)
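To make the ordering point concrete: here is a minimal sketch, outside SSIS (data flows are configured in the designer, not in code), that mimics the scheduling difference described above. Sources inside one data flow all start together, so their relative order is uncontrolled, whereas separate data flow tasks chained by precedence constraints run in a fixed sequence. The table names and helper functions are made up for illustration.

```python
# Illustration only: this is NOT the SSIS API, just an analogy for how the
# engine schedules work in the two package designs discussed above.
from concurrent.futures import ThreadPoolExecutor

def extract(table, log):
    # Stand-in for a source -> destination copy of one table.
    log.append(table)

def one_data_flow(tables):
    # Multiple sources/destinations in a SINGLE data flow: the engine may
    # start them concurrently, so you cannot enforce a sequence.
    log = []
    with ThreadPoolExecutor() as pool:
        pool.map(lambda t: extract(t, log), tables)
    return log  # completion order is not guaranteed

def separate_data_flows(tables):
    # One table per data flow task, chained with precedence constraints:
    # each extract starts only after the previous one has finished.
    log = []
    for t in tables:
        extract(t, log)
    return log  # order is exactly the order of the precedence chain

tables = ["Customers", "Orders", "OrderLines"]
assert separate_data_flows(tables) == tables  # deterministic sequence
```

The same reasoning is why the transaction-isolation and buffer-size points require separate tasks: those are per-data-flow-task settings, so two tables sharing one data flow must share them.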
The short answer here, as with so many of these things, is "it depends".

Are there any other transformations in the flows? What about lookups, or other disk-based costs inside the same flows? What is your disk situation (SAN, local, SSD)? What about contention on the LUNs? How are your data sources partitioned? What is the physical layout of your disk(s)? What is the shape of your data? What is/are the sources? What is/are the targets? Who's the end consumer? Is it a small, 100 GB, locally-hosted, non-partitioned, traditional OLTP database on a recent version of Windows running SQL Server 2008 R2? Or a 50 TB analytics engine, hosted in a blue-sky private cloud, building cubes for dimensional analysis for private use by a CXO of a Fortune 100 company?

It really all depends on your particular environment. For example, in one of mine, where a single small VM runs batch jobs once a day, it makes more sense to single-thread the many data flows, so that one flow doesn't start until the previous one has finished. Experience has shown that this still gives us reasonable system performance, even while the ETL is running. In another environment, though, I have multiple data flows running without constraints inside sequence containers, each of which ETLs only one table, and the sequence containers themselves are constrained together to control the flow of execution.