question

Raj More avatar image
Raj More asked

Is it better to Insert Sorted data into a Clustered Index Table?

My data-warehouse table usually has about 18M rows in it. It is used in a bunch of calculations referring to a range of Dates so we have a clustered index on DateColumnId. Truncate Table RNFData Insert Into RNFData (ColumnList) Select ColumnList From Table1 Inner Join Table 2 On Table1.Column = Table2.Column ... Joins another 10-15 tables Since RNFData has a Clustered Index on the DateColumnId, I'm wondering if it is better to insert sorted or unsorted into this table. Is the query below better than the one above? Truncate Table RNFData Insert Into RNFData (ColumnList) Select ColumnList From Table1 Inner Join Table 2 On Table1.Column = Table2.Column ... Joins another 10-15 tables Order by DateColumnId
sql-server-2008-r2clustered-index
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Wilfred van Dijk avatar image
Wilfred van Dijk answered
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
If you have a clustered index and have the possibility to sort the data you are about to insert on that clustered index, you should do that. Otherwise you might get both fragmented data and poor performance on your insert, because SQL Server will have to relocate data to fit a row in between other rows. If you cannot sort the data according to the clustered index, which is often the case, sometimes because the data is stored in textfiles with another sort-order, sometimes because your source tables don't have the indexes to efficiently support your ORDER BY clause, you should consider disabling the clustered index before the insert, then enable it after the insert. That's in most cases more efficient, because the sorting is done only once instead of for each row.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Raj More avatar image Raj More commented ·
@Magnus When you disable a clustered index, you effectively deny all read/write access to that table (at least on R2). To achieve that result, you would have to Drop-Insert-Create which is actually worse.
0 Likes 0 ·
Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
You're right. I meant drop the clustered index, not disable it. In what way and under which curcumstances is it worse to Drop-Insert-Create? If the data you insert is not sorted, it's often better to have SQL Server sort the pages once, after the insert, than have SQL Server try to sort each row into the right place during the insert. But there are of course other considerations to make. If you have tons of non clustered indexes, they will be impacted as well. On the other hand - with a big data load, you're better off disabling the non clustered indexes too. My experience with an empty table, being loaded with unordered data, is that it's faster to disable/drop indexes, load the data and then enable/create indexes again.
0 Likes 0 ·
Pavel Pawlowski avatar image
Pavel Pawlowski answered
As @Wilfred and @Magnus posted. It's better to insert the data in the order of the clustered key. When bulk Inserting in the order of clustered key with Trace Flag 610 enabled, you can even achieve a Bulk Insert into the Clustered table instead of fully logged inserts. If you are not able to effectively sort the data prior inserting them to final destination table, you can even think about a staging table with which is clustered in the order of the final table. You can insert data into that staging table in batches which will allow in-memory sorting and do not need to utilize temp db for sorting - this will give you maximum performance during the inserts. Finally yo can do a INSERT SELECT into the final clustered table with MAXDOP=1. This will ensure a perfect non fragmented index for further maximum read performance. You can also take a look on: [Best Practices for Data Warehousing with SQL Server 2008 R2][1] [Fast Track Data Warehouse][2] [The Data Loading Performance Guide][3] [1]: http://technet.microsoft.com/en-us/library/gg567302.aspx [2]: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/fast-track.aspx [3]: http://msdn.microsoft.com/en-us/library/dd425070(v=SQL.100).aspx
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Alendar avatar image
Alendar answered
I was taught on Oracle to load data unordered if possible, in order to balance the tree better, but I suspect these days are long gone, and databases index trees adapt to either sorted or unsorted data.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.