kctechnopro asked

What is the best-performing UPSERT method in SQL Server 2005?

We are going to be bulk loading a hundred million records at a time and we need to be VERY performance-conscious. I have seen several methods but need the best-performing one... ideas?

sql-server-2005 ssis upsert

Susan Van Eyck answered

I'm a big fan of bcp. For large tables, I'll test the loads with keys/indexes in place and without them. If there's a significant advantage to the latter, I'll script dropping the keys/indexes, doing the bcp in, and then recreating the keys/indexes (the clustered one first). And if you need to clear out a table before piping in the new data, use truncate table if possible.
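
A rough sketch of that pattern - every object and file name here is made up, so adjust to your own schema:

    -- Clear the table first if you're replacing its contents (hypothetical names).
    TRUNCATE TABLE dbo.BigTable;

    -- Drop the nonclustered index(es), then the clustered one.
    DROP INDEX IX_BigTable_NaturalKey ON dbo.BigTable;
    DROP INDEX CIX_BigTable_Id ON dbo.BigTable;

    -- Load from the command line (not T-SQL), e.g.:
    --   bcp MyDb.dbo.BigTable in C:\loads\data.dat -S MyServer -T -n -b 100000 -h "TABLOCK"

    -- Recreate the clustered index first, then the nonclustered ones.
    CREATE CLUSTERED INDEX CIX_BigTable_Id ON dbo.BigTable (Id);
    CREATE NONCLUSTERED INDEX IX_BigTable_NaturalKey ON dbo.BigTable (NaturalKey);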


BenchmarkIT answered

I agree with Susan... BCP is great for stuff like this.

If SSIS is an avenue you're looking to go down, be VERY careful, especially if you do a "Lookup Match". Doing this will basically process your data row by row, much like a cursor would (not a good performance choice!).


visakh16 answered

There's a Fast Parse option available in SSIS which gives faster performance.

http://weblogs.sqlteam.com/mladenp/archive/2006/05/26/9992.aspx


nigelrivett answered

Usually I get the data into a table and then do the upsert via a stored procedure. The load would be bulk insert or SSIS. Try them both for performance - I prefer bulk insert for ease of use. It stopped supporting UTF-8 though (not sure if that was 2005 or 2008), so if you're getting data in that format, SSIS is probably best.
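
For the bulk insert route, something like this - the file path is made up and dbo.ImportStaging is the staging table sketched below:

    -- A big batch size plus TABLOCK helps get minimal logging on the load.
    BULK INSERT dbo.ImportStaging
    FROM 'C:\loads\data.dat'
    WITH (DATAFILETYPE = 'native', BATCHSIZE = 100000, TABLOCK);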

Once in the table you can then insert unfound rows and update found rows. For this you will need a natural key - have a non-clustered index on it in the destination table to do the check. For this number of rows you might be better off with an artificial key too (an identity) and make that the clustered index. You can then efficiently find rows that match and check the values for update. Also, depending on how much memory you have, it might be best to run the process in batches from the import (staging) table.
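
Roughly this sort of setup - all names and column types here are invented for illustration:

    -- Staging table: identity clustered so batching by ranges is cheap.
    CREATE TABLE dbo.ImportStaging
    (
        StagingId  int IDENTITY(1,1) NOT NULL,
        NaturalKey varchar(50)       NOT NULL,
        SomeValue  varchar(100)      NULL
    );
    CREATE CLUSTERED INDEX CIX_ImportStaging ON dbo.ImportStaging (StagingId);

    -- Destination: artificial identity key clustered, natural key nonclustered
    -- so the found/unfound check is an index seek.
    CREATE TABLE dbo.Destination
    (
        DestId     int IDENTITY(1,1) NOT NULL,
        NaturalKey varchar(50)       NOT NULL,
        SomeValue  varchar(100)      NULL
    );
    CREATE CLUSTERED INDEX CIX_Destination ON dbo.Destination (DestId);
    CREATE NONCLUSTERED INDEX IX_Destination_NaturalKey ON dbo.Destination (NaturalKey);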

There is a temptation to do all this in SSIS, but you often end up either doing a database read for each row to insert or loading all the natural keys and a checksum from the destination table. Whatever you do, don't be tempted to try the Slowly Changing Dimension task!

You will need two statements - update existing, then insert new. It might be worth adding an indexed identity to the staging table and using it to flag inserts and updates in another table, then joining to that for the actual process.
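
Against the hypothetical tables above, the two statements would look something like:

    -- Update rows whose natural key already exists, but only where a value changed.
    UPDATE d
    SET    d.SomeValue = s.SomeValue
    FROM   dbo.Destination d
    JOIN   dbo.ImportStaging s ON s.NaturalKey = d.NaturalKey
    WHERE  ISNULL(d.SomeValue, '') <> ISNULL(s.SomeValue, '');

    -- Insert rows whose natural key isn't found in the destination.
    INSERT dbo.Destination (NaturalKey, SomeValue)
    SELECT s.NaturalKey, s.SomeValue
    FROM   dbo.ImportStaging s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Destination d
                       WHERE  d.NaturalKey = s.NaturalKey);

For very large loads you can wrap each statement in a loop over StagingId ranges to keep the transactions (and the log) small.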

I'm currently doing this in 2008 using a MERGE statement (to populate an audit trail at the same time). I had my doubts about the usefulness of MERGE but it's quite handy.
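
On 2008 it collapses into one statement - again, the table and audit names are invented, and this syntax is 2008+ only:

    -- Hypothetical audit table to catch the OUTPUT rows.
    CREATE TABLE dbo.UpsertAudit
    (
        UpsertAction nvarchar(10) NOT NULL,
        NaturalKey   varchar(50)  NOT NULL,
        OldValue     varchar(100) NULL,
        NewValue     varchar(100) NULL
    );

    MERGE dbo.Destination AS d
    USING dbo.ImportStaging AS s
        ON d.NaturalKey = s.NaturalKey
    WHEN MATCHED AND ISNULL(d.SomeValue, '') <> ISNULL(s.SomeValue, '')
        THEN UPDATE SET d.SomeValue = s.SomeValue
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (NaturalKey, SomeValue) VALUES (s.NaturalKey, s.SomeValue)
    OUTPUT $action, inserted.NaturalKey, deleted.SomeValue, inserted.SomeValue
    INTO   dbo.UpsertAudit (UpsertAction, NaturalKey, OldValue, NewValue);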

