You have a table, TableOne, with many columns.
After analysis you conclude that, based on WHERE and JOIN clauses:
You decide to create a single index on Col1, Col2, Col3.
Is this the only index you need on the table to satisfy the bulk of queries based on the above analysis or would others be beneficial? Why or why not?
Presented with the case as you have it, 65% of the queries would be satisfied by that single index. If you added one more index, only on the Col3 column you could satisfy 25% and be at 90%. I'd do that. With two indexes covering the majority of the queries, I'd assume, without further information, that I had adequately set up the system.
Now, the fact is, that remaining 10% might be the most frequently accessed queries because while you told us number of queries, you didn't suggest the frequency of their access. It's entirely possible that the remaining 10% consist of 90% of the calls. In that case, you'd need to completely rethink the design. But if you assume that the initial 90% also represents the number of calls, then the initial two indexes would be the right starting point.
answered Oct 26, 2009 at 01:51 PM
Grant Fritchey ♦♦
The single index would be usable for queries that access all three, the first two, or the first one. The trouble being that if you want to access the second column, then the index doesn't help a lot because it's not ordered by the second column - so it would still result in an index scan.
Imagine that you sorted your books on the shelf by colour, then by author. If you wanted to look for an author, then you would need to look in the right place in each of the coloured sections. In that example the colour would be Col1 and author would be Col2.
However, the single index would cover the majority of those (65% + some of the 10%). However, it wouldn't cover all of them.
answered Oct 26, 2009 at 12:44 PM
Matt Whitfield ♦♦
You would need to add additional indexes, based on the description of the proposed index and queries; it would be beneficial in the first, second, and some of the fourth cases, but not usable in the third case.
In order, for the index to be selected, my understanding is that the first node of the index needs to be used in the join or filtering criteria. Since you the queries in case three only use Col3, and that is not the first node of the index, it would not be usable for those queries (25%)
For the fourth example, the queries in which Col1 is included in the join or the filter, then the index could be used, for those where Col1 is not referenced, the index isn't viable
answered Oct 26, 2009 at 12:48 PM
It is also important to know how frequent the data changes too.
If the data is never to be updated or inserted again, then there's no issue (besides the initial creation cost) in creating multiple indexes covering every single permutation. However if the data changes regularly, the cost of maintaining these multiple indexes may be prohibitive.
answered Oct 26, 2009 at 12:50 PM
Kev Riley ♦♦
The single index would be beneficial for all the queries used above IF the query plans all called for scanning. The system could scan the index for those three columns (or for any one of the three) much faster than it could scan the table of many columns. (Fewer columns = smaller rows = more rows per page = fewer pages to scan.) A "covering index" is almost always of some benefit to queries, especially if they are returning long result sets (reports) rather than being used for quick lookups of a small number of rows.
But the critical element missing from your analysis is the ORDER BY sequence of the various queries. Remember that the primary way indexes speed up retrieval of data is by establishing a sequence different than that of the clustered index. Oftentimes, it makes sense to have multiple queries which contain the same columns, but have the columns ordered differently with respect to priority.
All of the following index schemes would serve different purposes, and should be created if there is enough demand. (Or even if there was little demand, but response time was critical when the demands were made.)
(OrderID, CustomerNo, OrderDate) -- get information for a particular order quickly
(OrderDate, CustomerNo, OrderID) -- get all orders for a particular date range
(CustomerNo, OrderDate, OrderId) -- get all orders for a particular customer