Is it better to Insert Sorted data into a Clustered Index Table?
My data-warehouse table usually has about 18M rows in it. It is used in a bunch of calculations referring to a range of Dates so we have a clustered index on DateColumnId. Truncate Table RNFData Insert Into RNFData (ColumnList) Select ColumnList From Table1 Inner Join Table 2 On Table1.Column = Table2.Column ... Joins another 10-15 tables Since RNFData has a Clustered Index on the DateColumnId, I'm wondering if it is better to insert sorted or unsorted into this table. Is the query below better than the one above? Truncate Table RNFData Insert Into RNFData (ColumnList) Select ColumnList From Table1 Inner Join Table 2 On Table1.Column = Table2.Column ... Joins another 10-15 tables Order by DateColumnId
If you have a clustered index and have the possibility to sort the data you are about to insert on that clustered index, you should do that. Otherwise you might get both fragmented data and poor performance on your insert, because SQL Server will have to relocate data to fit a row in between other rows. If you cannot sort the data according to the clustered index, which is often the case, sometimes because the data is stored in textfiles with another sort-order, sometimes because your source tables don't have the indexes to efficiently support your ORDER BY clause, you should consider disabling the clustered index before the insert, then enable it after the insert. That's in most cases more efficient, because the sorting is done only once instead of for each row.
As @Wilfred and @Magnus posted. It's better to insert the data in the order of the clustered key. When bulk Inserting in the order of clustered key with Trace Flag 610 enabled, you can even achieve a Bulk Insert into the Clustered table instead of fully logged inserts. If you are not able to effectively sort the data prior inserting them to final destination table, you can even think about a staging table with which is clustered in the order of the final table. You can insert data into that staging table in batches which will allow in-memory sorting and do not need to utilize temp db for sorting - this will give you maximum performance during the inserts. Finally yo can do a INSERT SELECT into the final clustered table with MAXDOP=1. This will ensure a perfect non fragmented index for further maximum read performance. You can also take a look on: [Best Practices for Data Warehousing with SQL Server 2008 R2] [Fast Track Data Warehouse] [The Data Loading Performance Guide] :
I was taught on Oracle to load data unordered if possible, in order to balance the tree better, but I suspect these days are long gone, and databases index trees adapt to either sorted or unsorted data.