question

kardile avatar image
kardile asked

partition & parallel ETL load

I read a microsoft white paper that explain 1TB data loading in 30 minutes with SSIS.

link http://technet.microsoft.com/en-gb/library/dd537533.aspx

"We Loaded 1TB in 30 Minutes with SSIS, and So Can You"

In this paper, the source is multiple text files. Most of the cases we need to load data from a single table and selecting data (filter the data for parallel load) from a single table would be very expensive.

My question here is, what are the methods to do a parallel load if the source is single table?

Regards

ssisperformance
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered

You're not going to be able to get a parallel process like what was outlined because you'll be limited by the bus connecting the location of that single table. You either have to change the source so that you can actually perform parallel operations or you need to examine the process to see if you can do filtering and what the costs might be there.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Tom Staab avatar image
Tom Staab answered

Why would the filter be very expensive? What if you create parallel paths using an indexed column or columns as the filter key? If the table is very large, it would be even better if it was partitioned on the same key. Another potential option would be to create separate indexed views for each path.

Also, it is important to note that the maximum number of concurrent executables (i.e. tasks) is limited by the MaxConcurrentExecutables setting. By default, it is set to -1 which means the number of logical CPUs + 2.

See this article for more information on SSIS parallelism and other aspects of performance: http://msdn.microsoft.com/en-us/library/cc966529.aspx#EDAA

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

kardile avatar image
kardile answered

What If source system doesn't allow to do any partitioning or indexing?

Let me put a case here:

I have a table in my source system which has around 950m records. We want to populate the data into data warehouse with a complete ETL load. We have developed a build with SQL bulk load option but it takes around 10hrs. And it is not possible to add partition or index on the source table.

I am looking for an effective method to minimise the ETL window.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.