Now I don't claim to be an expert (mainly because I'm not one) but I've just moved into a reporting role in a new organization and have noticed some peculiarities in the code. We report off a replication of the live system. Almost every table has a column called is_deleted. It is char(1) and just a soft delete flag. Valid values are 'y' and 'n'. Unfortunately sometimes NULL slips in there too, but I'm not permitted to do anything to change that. Less than 1% of rows are flagged as 'y'. All this means that every bit of SQL that has ever been written for reporting has this sort of thing in the WHERE clause: and isnull(my_tbl_1.is_deleted, 'n') ='n' and isnull(my_tbl_2.is_deleted, 'n') ='n' ...and so on. I was under the impression that isnull() was non sargable, so I wrote two identical queries, except that in the second on I wrote and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N') and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N') to my surprise, they generated exactly the same execution plan. Neither had a performance advantage. Incidentally, in all the indexes I've looked at here, none of them mention the is_deleted column. Is it possible that when the vast bulk of the values in a columns are the same (in this case 'n') that indexing the column is a bit pointless? If I had my way I'd report off an OLAP solution instead of the replicated OLTP db, and that way I could filter out all the pesky soft deleted rows before I even consider writing reports, but that's not an option just yet. Anyway I'd be interested to hear what y'all would do here.
You are correct in that indexing the column is pointless as the values there doe not provide enough cardinality, which is a very important factor when considering adding a column to existing index. Because of this, it does not matter that isnull is not SARGable, and you can change your my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N' -- to isnull(my_tbl_1.is_deleted, 'N') ='N' Oleg
Wait on... now I'm not so sure. That dreadful cop-out expression "It just depends..." is ringing in my ears. I decided to do a bit of reasonably real world testing. I created a test table that was identical to a table of events that I commonly use. I then created an index on each table with only one difference, you guessed it, the is_deleted flag was in one but not the other. Here are the indices: CREATE UNIQUE NONCLUSTERED INDEX ix_evnt_test ON dbo.evnt_test ( org_id, evt_start_dt, evt_end_dt, evt_id ) GO CREATE UNIQUE NONCLUSTERED INDEX ix_evnt ON dbo.evnt ( is_deleted, org_id, evt_start_dt, evt_end_dt, evt_id ) GO Now I ran the same query over both. It's the sort of query that crops up with reasonable regularity. select org.org_nm, ev.start_dt, ev.end_dt from dbo.evnt ev join dbo.organization org on org.org_id = ev.org_id where ev.start_dt = '20060701' and (ev.is_deleted = 'N' or ev.is_deleted is null); The query analyser tells me that the query on the table with `is_deleted` in the index (and listed first) will only cosume 13% of the resources when run side by side with the same query on the other table (which theoretically chews up the other 87%). Now I've been told to consider the query optimiser results a "useful indication" rather than gospel, so I timed them and ran them repeatedly together. The query with `is_deleted` in the index was consistently around 15% faster. In practical terms in this instance it amounted to around a mere 50ms faster, which I'm always getting told not to care about. But as a general principal it would seem like there are situations where an index that includes a poorly differentiated column will improve your outcome, if nothing else it got me into an index seek and a merge join as opposed to an index scan and a hash match. The index seek had an estimated operator cost of 1.7... er... thingies, while the index scan had an estimated operator cost of 13.6... thingies. Anyway I don't think I'll solve this without more testing, but any feedback is very much appreciated. GPO.
`org.org_nm` is being returned from the dbo.organization table, not the evnt table. `Org_id` is the pk of the dbo.organization table and an fk in the dbo.evnt table. Org_id is in the join so a covering index does appear to work. Hope this explains things. Thanks for your continued interest it is very helpful.
"...without `is_deleted` in the index, the index is not covering, right?..." Yes, I'm pretty sure that's the case. Without `is_deleted` it's bombing out to a heap scan. "...The real question is where to stick it?..." Man there is a lot to this business of testing! The philosophy in my new place of employment is "Who cares if the query takes an extra few seconds to run, just churn out the reports as fast as you can, there's no time for tweaking." Which explains why I'm looking into this on a Sunday - I dare not "waste" time on it at work. I'll fiddle around with the order of the items in the index and see what that reveals. This is a great forum for us learners, I have to say.