question

Blackhawk-17 avatar image
Blackhawk-17 asked

Is This Multi-column Index Sufficient?

Seeder Question

You have a table, TableOne, with many columns.

After analysis you conclude that, based on WHERE and JOIN clauses:

  • 40% of the queries access Col1, Col2 and Col3
  • 25% of the queries access only Col1
  • 25% of the queries access only Col3
  • 10% of queries access variations of the above three (Col1 & Col3, Col2 & Col3, Col2, Col1 & Col2) and other columns not listed but existing in the table

You decide to create a single index on Col1, Col2, Col3.

Is this the only index you need on the table to satisfy the bulk of queries based on the above analysis or would others be beneficial? Why or why not?

indexingdatabase-designseeder-question
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered

Presented with the case as you have it, 65% of the queries would be satisfied by that single index. If you added one more index, only on the Col3 column you could satisfy 25% and be at 90%. I'd do that. With two indexes covering the majority of the queries, I'd assume, without further information, that I had adequately set up the system.

Now, the fact is, that remaining 10% might be the most frequently accessed queries because while you told us number of queries, you didn't suggest the frequency of their access. It's entirely possible that the remaining 10% consist of 90% of the calls. In that case, you'd need to completely rethink the design. But if you assume that the initial 90% also represents the number of calls, then the initial two indexes would be the right starting point.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

The single index would be usable for queries that access all three, the first two, or the first one. The trouble being that if you want to access the second column, then the index doesn't help a lot because it's not ordered by the second column - so it would still result in an index scan.

Imagine that you sorted your books on the shelf by colour, then by author. If you wanted to look for an author, then you would need to look in the right place in each of the coloured sections. In that example the colour would be Col1 and author would be Col2.

However, the single index would cover the majority of those (65% + some of the 10%). However, it wouldn't cover all of them.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Jay Bonk avatar image
Jay Bonk answered

You would need to add additional indexes, based on the description of the proposed index and queries; it would be beneficial in the first, second, and some of the fourth cases, but not usable in the third case.

In order, for the index to be selected, my understanding is that the first node of the index needs to be used in the join or filtering criteria. Since you the queries in case three only use Col3, and that is not the first node of the index, it would not be usable for those queries (25%)

For the fourth example, the queries in which Col1 is included in the join or the filter, then the index could be used, for those where Col1 is not referenced, the index isn't viable

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered

It is also important to know how frequent the data changes too.

If the data is never to be updated or inserted again, then there's no issue (besides the initial creation cost) in creating multiple indexes covering every single permutation. However if the data changes regularly, the cost of maintaining these multiple indexes may be prohibitive.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bob Hovious avatar image
Bob Hovious answered

The single index would be beneficial for all the queries used above IF the query plans all called for scanning. The system could scan the index for those three columns (or for any one of the three) much faster than it could scan the table of many columns. (Fewer columns = smaller rows = more rows per page = fewer pages to scan.) A "covering index" is almost always of some benefit to queries, especially if they are returning long result sets (reports) rather than being used for quick lookups of a small number of rows.

But the critical element missing from your analysis is the ORDER BY sequence of the various queries. Remember that the primary way indexes speed up retrieval of data is by establishing a sequence different than that of the clustered index. Oftentimes, it makes sense to have multiple queries which contain the same columns, but have the columns ordered differently with respect to priority.

All of the following index schemes would serve different purposes, and should be created if there is enough demand. (Or even if there was little demand, but response time was critical when the demands were made.)

(OrderID, CustomerNo, OrderDate) -- get information for a particular order quickly

(OrderDate, CustomerNo, OrderID) -- get all orders for a particular date range

(CustomerNo, OrderDate, OrderId) -- get all orders for a particular customer

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered

One thing to keep in mind that "It all depends". Even if you create index for every single select, you may end up with table or index scan. Why? It all depends! It depends on the selectivity of the data. If col1, col2 and col3 are of bit datatype, there are only two possible values and the selectivity may be as much as 50%, and the index or table will be scanned anyway.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.