question

Raj More avatar image
Raj More asked

Creating Indexes on multiple tables in parallel

I am running an ETL that moves data and generates some output tables. Sizes range from 20GB to 8GB to 4GB all the way down to 1MB (about 25 tables total). We are on VMWare virtual machines so I do not have the options of putting filegroups on individual spindles. Processing power is decent - 4 CPU cores and 32 GB RAM. I need indexes on these tables since they are going to be used in other processes. I only have one clustered index - all other indexes are non clustered. I am running SSIS, I have one SQL task to run all the index generation. However, I figure I can save time (and put the hardware I have to hard work) by running these index generations in parallel. How do I determine which ones to run in parallel and which ones to not?
ssissql-server-2008-r2indexingetldata-warehouse
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Usman Butt avatar image
Usman Butt answered
I believe it purely depends upon the environment. If you can live with high CPU utilization, slow response, you can try multiple index generation in parallel on different tables. Since you are in VM environment, you may never know when you would be striving for resources. I agree parallel execution could save you time if your resources are under utilized. But at the same time some combinations could increase your execution time, if they would be striving for the resources. As far as the combination is concerned, I would try different combinations (like one big 8 GB and some minor tables, one huge 20 GB index only etc) according to the environment, workload and then implement the most feasible combination. I would also try different combinations with different max degree of parallelism (MAXDOP). Cheers.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Raj More avatar image Raj More commented ·
My MAXDOP is currently set to 0 (zero). I'd love to understand a scientific way to tweak these settings rather than just playing around and not really understanding why.
0 Likes 0 ·
WilliamD avatar image WilliamD commented ·
@raj Moore - the whole thing about SQL Server and the settings you are talking about is that they are usage specific. You have to change the values to suit your system and data. Set, observe, repeat that is scientific too.
0 Likes 0 ·
TimothyAWiseman avatar image
TimothyAWiseman answered
If all of the tables you are talking about exist on a single physical harddrive, you are unlikely to see any benefit from running them in parallel. Indexing is an IO intensive task, but it is generally not processor or ram intensive (though there may be some limited exceptions, such as indexing a view which includes a requirement to derive complex results or indexing a computed column). However, as Usman said the best way to know is to test it, and you are likely to see some benefit if the tables in question did exist on multiple hard drives.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.