I read a Microsoft white paper that explains loading 1 TB of data in 30 minutes with SSIS:
"We Loaded 1TB in 30 Minutes with SSIS, and So Can You"
In that paper, the source is a set of text files. In most of our cases, though, we need to load data from a single table, and selecting from that single table with filters (to split it into parallel loads) could be very expensive.
My question: what methods are there for doing a parallel load when the source is a single table?
You're not going to get parallelism like what the paper outlines, because you'll be limited by the bus connecting you to the location of that single table. You either have to change the source so that you can actually perform parallel operations, or you need to examine the process to see whether you can do filtering and what the costs of that might be.
answered Dec 21, 2009 at 05:04 PM
Grant Fritchey ♦♦
Why would the filter be very expensive? What if you create parallel paths using an indexed column (or columns) as the filter key? If the table is very large, it would be even better if it were partitioned on the same key. Another potential option would be to create separate indexed views, one for each path.
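For example, generating one WHERE predicate per parallel path from an indexed key range might look like this (a minimal Python sketch, not from the paper; the column name and key bounds are hypothetical):

```python
def build_range_filters(column: str, min_key: int, max_key: int, paths: int):
    """Split a contiguous integer key range into `paths` WHERE predicates,
    one per parallel data-flow path. Assumes `column` is indexed."""
    span = max_key - min_key + 1
    size = span // paths
    filters = []
    for i in range(paths):
        lo = min_key + i * size
        # the last path absorbs any remainder of the division
        hi = max_key if i == paths - 1 else lo + size - 1
        filters.append(f"{column} BETWEEN {lo} AND {hi}")
    return filters

# four parallel paths over keys 1..1000
print(build_range_filters("OrderID", 1, 1000, 4))
```

Each predicate then becomes the WHERE clause of one source query, so every path does an index range seek over its own slice instead of scanning the whole table.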
Also, it is important to note that the maximum number of concurrent executables (i.e. tasks) in a package is limited by the MaxConcurrentExecutables setting. The default of -1 means the number of logical CPUs plus 2.
See this article for more information on SSIS parallelism and other aspects of performance: http://msdn.microsoft.com/en-us/library/cc966529.aspx#EDAA
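To make that default concrete, here is a tiny Python illustration of the rule (not an SSIS API, just the arithmetic the -1 default implies):

```python
import os

def effective_max_concurrent_executables(setting: int = -1) -> int:
    """MaxConcurrentExecutables semantics: -1 (the default) means
    'number of logical CPUs + 2'; any positive value is taken literally."""
    if setting == -1:
        return (os.cpu_count() or 1) + 2
    return setting

# e.g. on an 8-core box, the default allows 10 concurrent executables
print(effective_max_concurrent_executables())
```

So if your package has more parallel paths than logical CPUs + 2, raise the setting explicitly or some tasks will queue.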
What if the source system doesn't allow any partitioning or indexing?
Let me describe a concrete case:
I have a table in my source system with around 950 million records. We want to populate a data warehouse with a complete ETL load. We built the package with the SQL bulk load option, but it takes around 10 hours, and it is not possible to add a partition or an index on the source table.
I am looking for an effective method to minimise the ETL window.
answered Dec 22, 2009 at 02:03 PM