I have a requirement to update 40 million records. Is there anything you would like to suggest me to improve the performance. except for removing indexes.
Grant covered the big points, but I would add a couple.
And at the risk of stating the obvious, if you remove the indexes that won't be needed for the search during the update, it may very well make the update faster. But it might not affect the speed of the update if the update is of a kind that will neither cause the records to be rearranged nor change the data in the index.
For instance, say I have a table with three integer columns. For simplicity col1, col2, col3. Say col1 is my primary key and clustered index and I have an index on col2. Now, if I update col1 that may rearrange the records (or at least alter the pointers to the records) and the index on col2 will be updated, so removing the index will speed up an update on col1. Changing col2 will of course change the data in the index on col2, so removing that index will probably speed up that update (depending on whether the index would have helped find which records to update in the first place...). But changing col3 will not normally reorder the records nor change the data in the index and in most cases removing the index on col2 will not speed it up.
(It is of course a bit more complex than that with the way writes are cached and where data is physically written, but that lays it out in a simplified manner and will apply in most cases. It is also worth noting that I made all of the columns integers, a datatype with a fixed width. Its trickier with variable width datatypes especially when increasing the amount stored in them and when it could change how the data has to be stored.)
Also, if you do decide to remove indexes, you may need to put them back. You need to factor in the time to recreate them to your total operation time when you are planning this.
answered Sep 27 '12 at 07:08 PM
I'm not even sure I'd remove the indexes, or maybe not all the indexes. Presumably you'll need at least one to help with the search routines on the update process. The primary problem is going to be the size of the transactions you're dealing with. So, instead of updating 40 million records, I'd suggest updating a million forty times. Maybe even break it down smaller than that. Since this will be a logged operation, make sure you have your transaction log file big enough to support the size of the transaction before you start running it. If you have to wait for it to grow, you'll be very sorry. Also, depending on how the data affects the one index you'll have in place, you may see sorting occurring, so tempdb will need to be big enough to support the processing. Again, set it up ahead of time.
Other than that, I'd strongly suggest testing the process somewhere other than production in order to identify any other issues before you start the process on live data.
answered Sep 27 '12 at 02:28 PM
Grant Fritchey ♦♦