update 40million records

Hi All,

I have a requirement to update 40 million records. Is there anything you would like to suggest me to improve the performance. except for removing indexes.

Thanks, Kannan
more ▼

asked Sep 27, 2012 at 02:20 PM in Default

aRookieBIdev gravatar image

2.3k 53 57 62

Well, have you done any testing that you can show an execution plan? Also, if you can post your code that would help as well. Have you given any thought to batching the updates?
Sep 27, 2012 at 02:28 PM JohnM
(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

Grant covered the big points, but I would add a couple.

  1. If possible, consider your timing. It will be done faster if there is less contention for the server, and interfere with other people less if done when there is less contention for the server. In short, if possible do it at a time when the Server is seeing little use, perhaps in the middle of the night.
  2. Make sure whatever source you are pulling the update from is effecient and using effecient code. If you need to do considerable preprocessing to the data before the update, it might be worth doing that all and caching it in a staging area before you do the update. If you are worried about tying up your destination server, make sure it isn't spending a lot of time just waiting for the source to provide the next piece of data. If the source data is in another SQL Server (or similar) database, make sure that is properly indexed.

And at the risk of stating the obvious, if you remove the indexes that won't be needed for the search during the update, it may very well make the update faster. But it might not affect the speed of the update if the update is of a kind that will neither cause the records to be rearranged nor change the data in the index.

For instance, say I have a table with three integer columns. For simplicity col1, col2, col3. Say col1 is my primary key and clustered index and I have an index on col2. Now, if I update col1 that may rearrange the records (or at least alter the pointers to the records) and the index on col2 will be updated, so removing the index will speed up an update on col1. Changing col2 will of course change the data in the index on col2, so removing that index will probably speed up that update (depending on whether the index would have helped find which records to update in the first place...). But changing col3 will not normally reorder the records nor change the data in the index and in most cases removing the index on col2 will not speed it up.

(It is of course a bit more complex than that with the way writes are cached and where data is physically written, but that lays it out in a simplified manner and will apply in most cases. It is also worth noting that I made all of the columns integers, a datatype with a fixed width. Its trickier with variable width datatypes especially when increasing the amount stored in them and when it could change how the data has to be stored.)

Also, if you do decide to remove indexes, you may need to put them back. You need to factor in the time to recreate them to your total operation time when you are planning this.
more ▼

answered Sep 27, 2012 at 07:08 PM

TimothyAWiseman gravatar image

15.6k 21 23 32

(comments are locked)
10|1200 characters needed characters left

I'm not even sure I'd remove the indexes, or maybe not all the indexes. Presumably you'll need at least one to help with the search routines on the update process. The primary problem is going to be the size of the transactions you're dealing with. So, instead of updating 40 million records, I'd suggest updating a million forty times. Maybe even break it down smaller than that. Since this will be a logged operation, make sure you have your transaction log file big enough to support the size of the transaction before you start running it. If you have to wait for it to grow, you'll be very sorry. Also, depending on how the data affects the one index you'll have in place, you may see sorting occurring, so tempdb will need to be big enough to support the processing. Again, set it up ahead of time.

Other than that, I'd strongly suggest testing the process somewhere other than production in order to identify any other issues before you start the process on live data.
more ▼

answered Sep 27, 2012 at 02:28 PM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
103k 19 21 74

+1 on keeping in mind the size of the transaction log.
Sep 27, 2012 at 02:37 PM JohnM
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Sep 27, 2012 at 02:20 PM

Seen: 1029 times

Last Updated: Sep 27, 2012 at 07:08 PM