question

kardile avatar image
kardile asked

Staging load using Select....Into

Hi,

We have a data warehouse which reads millions of data from source system. Currently the staging population process takes lot of time even though the method is bulk load.

I have been told that “SELECT….INTO” works very fast.

I am planning to use (redesign the staging load process) “SELECT….INTO” this method to load the data into staging area.

The plan is…

  1. Drop all staging tables
  2. Run “SELECT….INTO” queries using stored procs. - this would create table with staging data
  3. Run the further process to load the staged data into dimensions & facts

Does this make any sense?

Datawarehouse experts - Can you please comment on this?

performance
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
is this the same process as your other Question - 800M rows using Cognos ETL? here: http://ask.sqlservercentral.com/questions/3716/performance-with-commit-interval/3717#3717 ?
0 Likes 0 ·
kardile avatar image kardile commented ·
No, that is loading data from staging to fact.
0 Likes 0 ·
Grant Fritchey avatar image
Grant Fritchey answered

You're also going to find, as time goes on, dropping all tables and recreating all data is going to lead to an ever expanding time frame for data load. Eventually you will have to bite the bullet and develop a methodology to incrementally load the data. The best thing to do is start that process on day one, even though it means a bit more work up front.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

Before doing a lot of work on it, I would suggest testing a single fixed size data set, because I would be surprised if SELECT...INTO was any faster than a well designed bulk load.

1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
+1 - test,test,test
0 Likes 0 ·
Scot Hauder avatar image
Scot Hauder answered

Do not use SELECT..INTO for this, it will lock system tables for the duration of the select since the new table schema is not known until the rows are returned. As an alternative, create the staging tables first, then use INSERT..SELECT. I agree it should be an incremental load and as Fatherjack points out, test thoroughly.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Ian Roke avatar image
Ian Roke answered

I wish Grant worked for my organisation! :-)

Like he says the absolute best thing you can do is set up a simple SSIS package that loads the previous days data across and schedule it to run in the middle of the night on a SQL Server scheduler.

By far the best method. I don't envy you at all having to play with 800 M rows of data though!

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.