question

GPO avatar image
GPO asked

Using isnull() in the WHERE clause

Now I don't claim to be an expert (mainly because I'm not one) but I've just moved into a reporting role in a new organization and have noticed some peculiarities in the code. We report off a replication of the live system. Almost every table has a column called is_deleted. It is char(1) and just a soft delete flag. Valid values are 'y' and 'n'. Unfortunately sometimes NULL slips in there too, but I'm not permitted to do anything to change that. Less than 1% of rows are flagged as 'y'. All this means that every bit of SQL that has ever been written for reporting has this sort of thing in the WHERE clause: and isnull(my_tbl_1.is_deleted, 'n') ='n' and isnull(my_tbl_2.is_deleted, 'n') ='n' ...and so on. I was under the impression that isnull() was non sargable, so I wrote two identical queries, except that in the second on I wrote and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N') and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N') to my surprise, they generated exactly the same execution plan. Neither had a performance advantage. Incidentally, in all the indexes I've looked at here, none of them mention the is_deleted column. Is it possible that when the vast bulk of the values in a columns are the same (in this case 'n') that indexing the column is a bit pointless? If I had my way I'd report off an OLAP solution instead of the replicated OLTP db, and that way I could filter out all the pesky soft deleted rows before I even consider writing reports, but that's not an option just yet. Anyway I'd be interested to hear what y'all would do here.
tsqlperformanceindexingnull
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
You are correct in that indexing the column is pointless as the values there doe not provide enough cardinality, which is a very important factor when considering adding a column to existing index. Because of this, it does not matter that isnull is not SARGable, and you can change your my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N' -- to isnull(my_tbl_1.is_deleted, 'N') ='N' Oleg
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@TimothyAWiseman They are just 2 different beasts, and this is not only because isnull takes 2 arguments while coalesce takes at least 2, but can handle many more. With 2 arguments, the difference is that isnull casts result to the first argument while coalesce to the one with highest precedence, i.e
declare @v varchar(3); -- it is null :)

select
    isnull(@v, 1234) from_isnull,
    coalesce(@v, 1234) from_coalesce;
go

-- the above shows this in result window

from_isnull from_coalesce
----------- -------------
*           1234
The above deserves a splaining: isnull tries to cast 1234 (int) to varchar(3), it does not fit, but implementation of int calls to handle it and just return \*. coalesce just returns result as int because it is higher in the food chain.
-- this will error out (implicit conversion error)
select 
    isnull(59, getDate()) from_isnull;
go

-- this one will run just fine
select 
    coalesce(59, getDate()) from_coalesce;
go

from_coalesce
-----------------------
1900-03-01 00:00:00.000
Why March 1? 2 reasons: 59 is converted to datetime (higher precedence), and 1900 was not a leap year, thus 59 days from Jan 1 is Mar 1, not Feb 29 :)
4 Likes 4 ·
TimothyAWiseman avatar image TimothyAWiseman commented ·
Excellent answer. Just as an aside, isnull is T-SQL, coalesce is part of the SQL standard, if that sort of thing matters to you.
0 Likes 0 ·
TimothyAWiseman avatar image TimothyAWiseman commented ·
@Oleg, thank you for pointing that out, I did not realize there was a difference. You have certainly enlightened me. Of course, part of the reason I never noticed the difference is I have only ever used it in cases where all arguments had the same or virtually the same (slightly different length nvarchars for instance) datatype so it didn't make any difference, and I suspect that is true in most cases. I will still prefer the version in the SQL standard whenever the difference will not matter, but it is definitely nice to fully understand what the difference is.
0 Likes 0 ·
GPO avatar image
GPO answered
Wait on... now I'm not so sure. That dreadful cop-out expression "It just depends..." is ringing in my ears. I decided to do a bit of reasonably real world testing. I created a test table that was identical to a table of events that I commonly use. I then created an index on each table with only one difference, you guessed it, the is_deleted flag was in one but not the other. Here are the indices: CREATE UNIQUE NONCLUSTERED INDEX ix_evnt_test ON dbo.evnt_test ( org_id, evt_start_dt, evt_end_dt, evt_id ) GO CREATE UNIQUE NONCLUSTERED INDEX ix_evnt ON dbo.evnt ( is_deleted, org_id, evt_start_dt, evt_end_dt, evt_id ) GO Now I ran the same query over both. It's the sort of query that crops up with reasonable regularity. select org.org_nm, ev.start_dt, ev.end_dt from dbo.evnt ev join dbo.organization org on org.org_id = ev.org_id where ev.start_dt = '20060701' and (ev.is_deleted = 'N' or ev.is_deleted is null); The query analyser tells me that the query on the table with `is_deleted` in the index (and listed first) will only cosume 13% of the resources when run side by side with the same query on the other table (which theoretically chews up the other 87%). Now I've been told to consider the query optimiser results a "useful indication" rather than gospel, so I timed them and ran them repeatedly together. The query with `is_deleted` in the index was consistently around 15% faster. In practical terms in this instance it amounted to around a mere 50ms faster, which I'm always getting told not to care about. But as a general principal it would seem like there are situations where an index that includes a poorly differentiated column will improve your outcome, if nothing else it got me into an index seek and a merge join as opposed to an index scan and a hash match. The index seek had an estimated operator cost of 1.7... er... thingies, while the index scan had an estimated operator cost of 13.6... thingies. Anyway I don't think I'll solve this without more testing, but any feedback is very much appreciated. GPO.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@GPO Did you mean to write **select org.org_id** or **org.org_nm** is correct? If it is then what does the execution plan show? I cannot see how it can possibly use the first index at all with **org.org_nm** as it is not covered. The predicate does not include the prefix column of the first index (org_id) either. With the second index it is different. While the predicate lists **ev.start_dt**, **ev.end_dt** and then **ev.is_deleted**, all of them are **and** conditions, so the optimizer might be able to restate your where clause as
where
  (ev.is_deleted = 'N' or ev.is_deleted is null)
  and
  ev.start_dt = '20060701';
and bingo, see that the predicate starts from the prefix column, so it can use the index if it decides it makes sense to do so.
0 Likes 0 ·
GPO avatar image
GPO answered
`org.org_nm` is being returned from the dbo.organization table, not the evnt table. `Org_id` is the pk of the dbo.organization table and an fk in the dbo.evnt table. Org_id is in the join so a covering index does appear to work. Hope this explains things. Thanks for your continued interest it is very helpful.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@GPO @Grant Fritchey In this case it appears that without is_deleted in the index, the index is not covering, right? If it is not then the non-clustered index inclusion depends strictly on the decision of whether this, combined with lookup to get that is_deleted value is cheaper than a clustered scan or not. Inclusion of the column if it does indeed complete making the index covering makes sense in this case. The real question is where to stick it? I would add it to the tail, but I saw that you added it to the head as a prefix column. In order to try to find out which way is better, I added Grant Fritchey (a.k.a "The Scary DBA") to this comment hoping that he will notice it and reply. Grant is an author of 2 incredible books (**SQL Server Execution Plans** and **SQL Server 2008 Query Performance Tuning Distilled**, an e-version of the first book available for download from redgate.com site). It would be really cool if he joins this discussion (not now of course as it is almost 3 AM in the Peoples Republic of Massachusetts :)
1 Like 1 ·
GPO avatar image
GPO answered
"...without `is_deleted` in the index, the index is not covering, right?..." Yes, I'm pretty sure that's the case. Without `is_deleted` it's bombing out to a heap scan. "...The real question is where to stick it?..." Man there is a lot to this business of testing! The philosophy in my new place of employment is "Who cares if the query takes an extra few seconds to run, just churn out the reports as fast as you can, there's no time for tweaking." Which explains why I'm looking into this on a Sunday - I dare not "waste" time on it at work. I'll fiddle around with the order of the items in the index and see what that reveals. This is a great forum for us learners, I have to say.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image ThomasRushton ♦♦ commented ·
No kidding about it being great for us learners... :-) I'm always learning stuff here, and I've been a SQL DBA for, oh, 15 years...
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.