question

NeerajTripathi avatar image
NeerajTripathi asked

Composite Index B tree

How does search with composite index work? How B Tree for composite index work internally? Suppose, we created index as below. Create index Idx_Customer_id_Name on customer (id, name, email) If we run below statements then get seek. Select id,name,email from customer where id = '10' and name = 'Neeraj' Select id,name,email from customer where name = 'Neeraj' and id = '10'-- As mentioned below. Select id,name,email from customer where name = 'Neeraj' -- with additional cost in execution plan and sometime execution plan suggests missing index even after index seek. Please clarify if we create the suggested index even if query seeking the index and also explain how B tree works for composite index.
index
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
If any answer proves helpful, mark it by clicking on the thumbs up. If any answer provides a valid solution, mark that by clicking on the checkbox next to it. You can mark multiple answers as helpful and one as a solution.
0 Likes 0 ·
Usman Butt avatar image
Usman Butt answered
I guess you already know how B-Tree works. I am not getting into it as it is a huge subject and better would be to search google, wikipedia etc. As far as the search for the composite index is concerned, it is the same as the regular index search. The only difference is that composite index would have multi-value keys to search instead of a single key to search in an order. The sort order of the composite index would be in the column order specified at the creation of the index from left to right. So it is quite important that the order of the columns specified must be according to your data access pattern. In your case the index would be sorted firstly by column id, then by name and then by email. So for instance, if you have following values stored in the customer table id name email 2 Neeraj bla 10 Neeraj yzx 1 abc xyz 3 def zyx Then the order of the index could be id name email 1 abc xyz 2 Neeraj bla 3 def zyx 10 Neeraj yzx From the above, you can see that searching id = 10 and name = 'Neeraj' would be quite easy. Hence in that case, an index seek is performed. In case of name = 'Neeraj', it is evident that in order to find 'Neeraj', the traversing is not simple anymore. Hence, more IO cost would be required to find the required rows. This is why a missing index suggestion would be propmted as an index on the name column would be more beneficiary. As far as the suggested index creation is concerned, it really depends upon your data access patterns and environment. But noticing that id seems to be the primary key and you may already have the clustered index be defined on it. I would rather ignore the suggested index and alter the existing composite index. So the initial ordering would be on the name column and then on the id. Also, having email column in the composite index may not be the best choice. I would rather have it as an included column in the composite index. So you can try something like Create index Idx_Customer_id_Name on customer (name,id) INCLUDE (email) It is only a suggestion against the limited information provided.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

NeerajTripathi avatar image NeerajTripathi commented ·
Thanks Usman Sir, We are searching with both name and id and when we apply in clause then query optimizer won't suggest any index. Also, i know how B tree works but have confusion regarding composite index as it is organized by first column then second and so on. Hence, i am in confusion that is index kaap B tree for second column or use some other logic. Please explain or share any link that elaborate B tree for composite index.
0 Likes 0 ·
Grant Fritchey avatar image
Grant Fritchey answered
The index works of the keys you provide and the order of those keys determines the order of the storage of the index. So your index has id, name, email, in that order. That then determines the layout of the balanced tree. Everything will be ordered first by id, then by name, then by email. When writing t-sql, as you've seen, you don't always have to refer to all three of the columns to get use out of the index. But, you do have to refer to them in order, meaning, you must reference ID so that you can, if needed, reference name and then email (referring to where clauses and join clauses here). If you refer to name and/or email in a query, but no ID, then that index won't get used. This isn't because of the balanced tree though. It's because the statistics and specifically the histogram within the statistics. The histogram, which is one of the ways the optimizer determines if an index is useful, shows the data distribution of the first column of the index, in your example id. So, when you don't refer to id, but instead refer to name, your index isn't used because the optimizer realizes that it wouldn't be useful. All this makes sense when you think about the fact that the keys determine storage order. So if you search for name or email without referencing ID, then the index has to be scanned in order to find values. If you need to search exclusively for those other columns, you need one or more other indexes.
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Not seeing all your structures it's hard to know. But, the suggested indexes are just that, suggestions. It doesn't mean they're right. There are other things that affect this all. For example, if you have an identity column that is inserting rows, but your statistics haven't been updated, then the row most recently inserted won't appear on the statistics and the optmizer might think it needs a different index. Or, you could have functions on a column in the WHERE clause which will prevent proper index use entirely, so the optimizer won't suggest anything. Or, an IN clause can prevent index use, again, the optimizer won't suggest things. It really depends on the circumstances. You can use the execution plans to understand the choices the optimizer made.
1 Like 1 ·
NeerajTripathi avatar image NeerajTripathi commented ·
> Thanks for answering. > Grant Sir, I have quoted this question because when we apply filter like id = 10 and name = 'Neeraj' or vice versa and query optimizer suggesting index on (Name, id) as index key columns and strange thing was that when we search using in key word for name like Name in('neeraj','') then query optimizer doesn't suggest index. is is because of the data in the column in the stats or something else. Please explain.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Oh, and remember @Maverick, any answers that are helpful, click on the thumbs up next to them. Any answers that solve your problem, click on the check box next to it.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.