I have one clustered and several non-clustered indexes on a data warehouse table with about 18 M rows 170 columns - about 18GB in size. The table comes from a Legacy system, and I cannot change it's structure.
This table is truncated and loaded on a daily basis.
For optimum performance, which is there an order to loading data and creating indexes?
Also, what do I do to this massive table in order to prep it for the indexes?
asked Dec 12, 2011 at 01:02 PM in Default
It depends on if the load system needs the indexes as part of it's process. But assuming that it runs faster without them (a fairly common batch load scenario) I would suggest that you build the clustered indexes first, then the nonclustered indexes. Otherwise, you're effectively building the nonclustered indexes twice. They'll get built once for a heap table, then when the table gets transformed to a cluster, the nonclustered indexes have to be completely rebuilt with pointers to the cluster key.
And what @Fatherjack says.
The fastest solution would be truncating the table, disable the indexes, load the data and rebuilld the indexes to enable them.
Do you really need to reload every record? How many records are actually changed? Can you use bulk load and extract only the new/changed records (MERGE maybe)?
Test is the key to be sure!
answered Dec 13, 2011 at 03:54 AM
Is the data that you want to load ordered in any way? If it's loaded according to the clustered index, you could load without disabling the clustered index, and then rebuild the non clustered indexes.
answered Dec 13, 2011 at 04:15 AM