question

Jim V avatar image
Jim V asked

Best indexing to use here?

If I have two fields that I will use as a "key" and these fields have not been concatenated into an actual field, what is the best indexing to use to get the following query to return results quickly? select [Field 2], count(distinct [Field 1]), sum([Field 3]) from Table_1 where [Field 1]+[Field 2] not in (select distinct [Field 1]+[Field 2] from Table_2 where [Field 1] is not NULL and [Field 2] is not NULL) group by [Field 2] order by 1
indexingconcatenation
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Oleg avatar image
Oleg answered
You could probably benefit from: - nonclustered index on [Field 1] asc, [Field 2] asc on Table\_1 - nonclustered index on [Field 2] asc on Table\_1 (this is OK, because the previous index is useless for [Field 2] as the [Field 2] is not its prefix - nonclustered index on [Field 1] asc, [Field 2] asc on Table\_2 Also, I would probably consider restating the query such that the **NOT IN** thingy is avoided, i.e. select a.[Field 2], count(distinct a.[Field 1]), sum(a.[Field 3]) from from Table_1 a left join ( select [Field 1] + [Field 2] as Both from Table_2 where [Field 1] is not null -- no need to worry about [Field 2] group by [Field 1], [Field 2] ) b on a.[Field 1] + a.[Field 2] = b.Both where b.Both is null -- only consider the non-matching records group by a.[Field 2] order by 1; Oleg
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

As always, it depends :) The more columns are included in the index, the bigger the total size of the index pages, but if there are many queries and procs which have [SARGable predicates][1] including more than one column then the benefit of having a multi-column index might be worth more than the size increase. In your example, the select from Table_2 definitely benefits from the index on both columns because if the index is only on [Field 1] then such index cannot be really used by the query which considers majority of the table's records because the non-clustered index scan plus a bookmark lookup to the cluster to figure the value of [Field 2] will be more expensive than a clustered index scan which is essentially a table scan. That is unless the [Field 1] is the only column of a **clustered** index which I don't believe is the case. With the index on both it is pretty easy for the engine to scan through the index pages only once and include a record to select every time the index record value changes (group by). [1]: http://www.sql-server-performance.com/2007/t-sql-where/
2 Likes 2 ·
Thanks much! The alternative to the dreaded NOT IN makes sense. I forgot about that trick. One thing I have never understood is when to make single field indexes versus multiple field indexes. Any insight on that would be appreciated!
0 Likes 0 ·
Thanks for the feedback - much appreciated. Have a good weekend.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.