x

Using isnull() in the WHERE clause

Now I don't claim to be an expert (mainly because I'm not one) but I've just moved into a reporting role in a new organization and have noticed some peculiarities in the code. We report off a replication of the live system. Almost every table has a column called is_deleted. It is char(1) and just a soft delete flag. Valid values are 'y' and 'n'. Unfortunately sometimes NULL slips in there too, but I'm not permitted to do anything to change that. Less than 1% of rows are flagged as 'y'. All this means that every bit of SQL that has ever been written for reporting has this sort of thing in the WHERE clause:

and isnull(my_tbl_1.is_deleted, 'n') ='n'
and isnull(my_tbl_2.is_deleted, 'n') ='n'

...and so on. I was under the impression that isnull() was non sargable, so I wrote two identical queries, except that in the second on I wrote

 and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N')
 and (my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N')

to my surprise, they generated exactly the same execution plan. Neither had a performance advantage. Incidentally, in all the indexes I've looked at here, none of them mention the is_deleted column. Is it possible that when the vast bulk of the values in a columns are the same (in this case 'n') that indexing the column is a bit pointless?

If I had my way I'd report off an OLAP solution instead of the replicated OLTP db, and that way I could filter out all the pesky soft deleted rows before I even consider writing reports, but that's not an option just yet. Anyway I'd be interested to hear what y'all would do here.
more ▼

asked Aug 06 '10 at 03:29 PM in Default

GPO gravatar image

GPO
1.9k 32 35 39

(comments are locked)
10|1200 characters needed characters left

4 answers: sort newest

You are correct in that indexing the column is pointless as the values there doe not provide enough cardinality, which is a very important factor when considering adding a column to existing index.

Because of this, it does not matter that isnull is not SARGable, and you can change your

my_tbl_1.is_deleted is NULL or my_tbl_1.is_deleted = 'N'
-- to 
isnull(my_tbl_1.is_deleted, 'N') ='N'
Oleg
more ▼

answered Aug 06 '10 at 04:14 PM

Oleg gravatar image

Oleg
15.9k 2 4 24

Excellent answer. Just as an aside, isnull is T-SQL, coalesce is part of the SQL standard, if that sort of thing matters to you.
Aug 06 '10 at 05:02 PM TimothyAWiseman

@TimothyAWiseman They are just 2 different beasts, and this is not only because isnull takes 2 arguments while coalesce takes at least 2, but can handle many more. With 2 arguments, the difference is that isnull casts result to the first argument while coalesce to the one with highest precedence, i.e

declare @v varchar(3); -- it is null :)

select
    isnull(@v, 1234) from_isnull,
    coalesce(@v, 1234) from_coalesce;
go

-- the above shows this in result window

from_isnull from_coalesce
----------- -------------
*           1234

The above deserves a splaining: isnull tries to cast 1234 (int) to varchar(3), it does not fit, but implementation of int calls to handle it and just return *. coalesce just returns result as int because it is higher in the food chain.

-- this will error out (implicit conversion error)
select 
    isnull(59, getDate()) from_isnull;
go

-- this one will run just fine
select 
    coalesce(59, getDate()) from_coalesce;
go

from_coalesce
-----------------------
1900-03-01 00:00:00.000
Why March 1? 2 reasons: 59 is converted to datetime (higher precedence), and 1900 was not a leap year, thus 59 days from Jan 1 is Mar 1, not Feb 29 :)
Aug 06 '10 at 10:58 PM Oleg

@Oleg, thank you for pointing that out, I did not realize there was a difference. You have certainly enlightened me.

Of course, part of the reason I never noticed the difference is I have only ever used it in cases where all arguments had the same or virtually the same (slightly different length nvarchars for instance) datatype so it didn't make any difference, and I suspect that is true in most cases. I will still prefer the version in the SQL standard whenever the difference will not matter, but it is definitely nice to fully understand what the difference is.
Aug 07 '10 at 09:37 PM TimothyAWiseman
(comments are locked)
10|1200 characters needed characters left

"...without is_deleted in the index, the index is not covering, right?..." Yes, I'm pretty sure that's the case. Without is_deleted it's bombing out to a heap scan.

"...The real question is where to stick it?..." Man there is a lot to this business of testing! The philosophy in my new place of employment is "Who cares if the query takes an extra few seconds to run, just churn out the reports as fast as you can, there's no time for tweaking." Which explains why I'm looking into this on a Sunday - I dare not "waste" time on it at work. I'll fiddle around with the order of the items in the index and see what that reveals. This is a great forum for us learners, I have to say.
more ▼

answered Aug 07 '10 at 05:31 PM

GPO gravatar image

GPO
1.9k 32 35 39

No kidding about it being great for us learners... :-) I'm always learning stuff here, and I've been a SQL DBA for, oh, 15 years...
Aug 10 '10 at 06:04 AM ThomasRushton ♦
(comments are locked)
10|1200 characters needed characters left
org.org_nm is being returned from the dbo.organization table, not the evnt table. Org_id is the pk of the dbo.organization table and an fk in the dbo.evnt table. Org_id is in the join so a covering index does appear to work. Hope this explains things. Thanks for your continued interest it is very helpful.
more ▼

answered Aug 06 '10 at 11:10 PM

GPO gravatar image

GPO
1.9k 32 35 39

@GPO @Grant Fritchey In this case it appears that without is_deleted in the index, the index is not covering, right? If it is not then the non-clustered index inclusion depends strictly on the decision of whether this, combined with lookup to get that is_deleted value is cheaper than a clustered scan or not. Inclusion of the column if it does indeed complete making the index covering makes sense in this case. The real question is where to stick it? I would add it to the tail, but I saw that you added it to the head as a prefix column. In order to try to find out which way is better, I added Grant Fritchey (a.k.a "The Scary DBA") to this comment hoping that he will notice it and reply. Grant is an author of 2 incredible books (**SQL Server Execution Plans** and SQL Server 2008 Query Performance Tuning Distilled, an e-version of the first book available for download from redgate.com site). It would be really cool if he joins this discussion (not now of course as it is almost 3 AM in the Peoples Republic of Massachusetts :)
Aug 06 '10 at 11:49 PM Oleg
(comments are locked)
10|1200 characters needed characters left

Wait on... now I'm not so sure. That dreadful cop-out expression "It just depends..." is ringing in my ears. I decided to do a bit of reasonably real world testing. I created a test table that was identical to a table of events that I commonly use. I then created an index on each table with only one difference, you guessed it, the is_deleted flag was in one but not the other. Here are the indices:

CREATE UNIQUE NONCLUSTERED INDEX ix_evnt_test ON dbo.evnt_test 
(
    org_id,
    evt_start_dt,
    evt_end_dt,
    evt_id
)
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_evnt ON dbo.evnt 
(
  is_deleted,
    org_id,
    evt_start_dt,
    evt_end_dt,
    evt_id
)
GO

Now I ran the same query over both. It's the sort of query that crops up with reasonable regularity.

select
    org.org_nm,
    ev.start_dt,
    ev.end_dt
from 
  dbo.evnt ev
  join dbo.organization org 
    on org.org_id = ev.org_id
where
  ev.start_dt < '20070701'
  and
  ev.end_dt >= '20060701'
  and
  (ev.is_deleted = 'N' or ev.is_deleted is null);

The query analyser tells me that the query on the table with is_deleted in the index (and listed first) will only cosume 13% of the resources when run side by side with the same query on the other table (which theoretically chews up the other 87%). Now I've been told to consider the query optimiser results a "useful indication" rather than gospel, so I timed them and ran them repeatedly together. The query with is_deleted in the index was consistently around 15% faster. In practical terms in this instance it amounted to around a mere 50ms faster, which I'm always getting told not to care about. But as a general principal it would seem like there are situations where an index that includes a poorly differentiated column will improve your outcome, if nothing else it got me into an index seek and a merge join as opposed to an index scan and a hash match. The index seek had an estimated operator cost of 1.7... er... thingies, while the index scan had an estimated operator cost of 13.6... thingies. Anyway I don't think I'll solve this without more testing, but any feedback is very much appreciated.

GPO.
more ▼

answered Aug 06 '10 at 10:09 PM

GPO gravatar image

GPO
1.9k 32 35 39

@GPO Did you mean to write select org.org_id or org.org_nm is correct? If it is then what does the execution plan show? I cannot see how it can possibly use the first index at all with org.org_nm as it is not covered. The predicate does not include the prefix column of the first index (org_id) either. With the second index it is different. While the predicate lists ev.start_dt, ev.end_dt and then ev.is_deleted, all of them are and conditions, so the optimizer might be able to restate your where clause as

where
  (ev.is_deleted = 'N' or ev.is_deleted is null)
  and
  ev.start_dt = '20060701';
and bingo, see that the predicate starts from the prefix column, so it can use the index if it decides it makes sense to do so.
Aug 06 '10 at 10:43 PM Oleg
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x265
x241
x122
x29

asked: Aug 06 '10 at 03:29 PM

Seen: 9162 times

Last Updated: Aug 10 '10 at 05:34 AM