x

partition & parallel ETL load

I read a microsoft white paper that explain 1TB data loading in 30 minutes with SSIS.

link http://technet.microsoft.com/en-gb/library/dd537533.aspx

"We Loaded 1TB in 30 Minutes with SSIS, and So Can You"

In this paper, the source is multiple text files. Most of the cases we need to load data from a single table and selecting data (filter the data for parallel load) from a single table would be very expensive.

My question here is, what are the methods to do a parallel load if the source is single table?

Regards

more ▼

asked Dec 21, 2009 at 02:13 PM in Default

avatar image

kardile
301 14 14 16

(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

You're not going to be able to get a parallel process like what was outlined because you'll be limited by the bus connecting the location of that single table. You either have to change the source so that you can actually perform parallel operations or you need to examine the process to see if you can do filtering and what the costs might be there.

more ▼

answered Dec 21, 2009 at 05:04 PM

avatar image

Grant Fritchey ♦♦
137k 20 43 81

(comments are locked)
10|1200 characters needed characters left

Why would the filter be very expensive? What if you create parallel paths using an indexed column or columns as the filter key? If the table is very large, it would be even better if it was partitioned on the same key. Another potential option would be to create separate indexed views for each path.

Also, it is important to note that the maximum number of concurrent executables (i.e. tasks) is limited by the MaxConcurrentExecutables setting. By default, it is set to -1 which means the number of logical CPUs + 2.

See this article for more information on SSIS parallelism and other aspects of performance: http://msdn.microsoft.com/en-us/library/cc966529.aspx#EDAA

more ▼

answered Dec 21, 2009 at 04:19 PM

avatar image

Tom Staab ♦
14.5k 7 14 21

(comments are locked)
10|1200 characters needed characters left

What If source system doesn't allow to do any partitioning or indexing?

Let me put a case here:

I have a table in my source system which has around 950m records. We want to populate the data into data warehouse with a complete ETL load. We have developed a build with SQL bulk load option but it takes around 10hrs. And it is not possible to add partition or index on the source table.

I am looking for an effective method to minimise the ETL window.

more ▼

answered Dec 22, 2009 at 02:03 PM

avatar image

kardile
301 14 14 16

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1198
x304

asked: Dec 21, 2009 at 02:13 PM

Seen: 2880 times

Last Updated: Dec 21, 2009 at 02:21 PM

Copyright 2017 Redgate Software. Privacy Policy