|
Now I don't claim to be an expert (mainly because I'm not one) but I've just moved into a reporting role in a new organization and have noticed some peculiarities in the code. We report off a replication of the live system. Almost every table has a column called is_deleted. It is char(1) and just a soft delete flag. Valid values are 'y' and 'n'. Unfortunately sometimes NULL slips in there too, but I'm not permitted to do anything to change that. Less than 1% of rows are flagged as 'y'. All this means that every bit of SQL that has ever been written for reporting has this sort of thing in the WHERE clause: ...and so on. I was under the impression that isnull() was non sargable, so I wrote two identical queries, except that in the second on I wrote to my surprise, they generated exactly the same execution plan. Neither had a performance advantage. Incidentally, in all the indexes I've looked at here, none of them mention the is_deleted column. Is it possible that when the vast bulk of the values in a columns are the same (in this case 'n') that indexing the column is a bit pointless? If I had my way I'd report off an OLAP solution instead of the replicated OLTP db, and that way I could filter out all the pesky soft deleted rows before I even consider writing reports, but that's not an option just yet. Anyway I'd be interested to hear what y'all would do here.
(comments are locked)
|
|
You are correct in that indexing the column is pointless as the values there doe not provide enough cardinality, which is a very important factor when considering adding a column to existing index. Because of this, it does not matter that isnull is not SARGable, and you can change your Oleg Excellent answer. Just as an aside, isnull is T-SQL, coalesce is part of the SQL standard, if that sort of thing matters to you.
Aug 06 '10 at 05:02 PM
TimothyAWiseman
@TimothyAWiseman They are just 2 different beasts, and this is not only because isnull takes 2 arguments while coalesce takes at least 2, but can handle many more. With 2 arguments, the difference is that isnull casts result to the first argument while coalesce to the one with highest precedence, i.e declare @v varchar(3); -- it is null :)
select
isnull(@v, 1234) from_isnull,
coalesce(@v, 1234) from_coalesce;
go
-- the above shows this in result window
from_isnull from_coalesce
----------- -------------
* 1234The above deserves a splaining: isnull tries to cast 1234 (int) to varchar(3), it does not fit, but implementation of int calls to handle it and just return *. coalesce just returns result as int because it is higher in the food chain. -- this will error out (implicit conversion error)
select
isnull(59, getDate()) from_isnull;
go
-- this one will run just fine
select
coalesce(59, getDate()) from_coalesce;
go
from_coalesce
-----------------------
1900-03-01 00:00:00.000Why March 1? 2 reasons: 59 is converted to datetime (higher precedence), and 1900 was not a leap year, thus 59 days from Jan 1 is Mar 1, not Feb 29 :)
Aug 06 '10 at 10:58 PM
Oleg
@Oleg, thank you for pointing that out, I did not realize there was a difference. You have certainly enlightened me. Of course, part of the reason I never noticed the difference is I have only ever used it in cases where all arguments had the same or virtually the same (slightly different length nvarchars for instance) datatype so it didn't make any difference, and I suspect that is true in most cases. I will still prefer the version in the SQL standard whenever the difference will not matter, but it is definitely nice to fully understand what the difference is.
Aug 07 '10 at 09:37 PM
TimothyAWiseman
(comments are locked)
|
|
"...without "...The real question is where to stick it?..." Man there is a lot to this business of testing! The philosophy in my new place of employment is "Who cares if the query takes an extra few seconds to run, just churn out the reports as fast as you can, there's no time for tweaking." Which explains why I'm looking into this on a Sunday - I dare not "waste" time on it at work. I'll fiddle around with the order of the items in the index and see what that reveals. This is a great forum for us learners, I have to say. No kidding about it being great for us learners... :-) I'm always learning stuff here, and I've been a SQL DBA for, oh, 15 years...
Aug 10 '10 at 06:04 AM
ThomasRushton ♦
(comments are locked)
|
|
@GPO @Grant Fritchey In this case it appears that without is_deleted in the index, the index is not covering, right? If it is not then the non-clustered index inclusion depends strictly on the decision of whether this, combined with lookup to get that is_deleted value is cheaper than a clustered scan or not. Inclusion of the column if it does indeed complete making the index covering makes sense in this case. The real question is where to stick it? I would add it to the tail, but I saw that you added it to the head as a prefix column. In order to try to find out which way is better, I added Grant Fritchey (a.k.a "The Scary DBA") to this comment hoping that he will notice it and reply. Grant is an author of 2 incredible books (SQL Server Execution Plans and SQL Server 2008 Query Performance Tuning Distilled, an e-version of the first book available for download from redgate.com site). It would be really cool if he joins this discussion (not now of course as it is almost 3 AM in the Peoples Republic of Massachusetts :)
Aug 06 '10 at 11:49 PM
Oleg
(comments are locked)
|
|
Wait on... now I'm not so sure. That dreadful cop-out expression "It just depends..." is ringing in my ears. I decided to do a bit of reasonably real world testing. I created a test table that was identical to a table of events that I commonly use. I then created an index on each table with only one difference, you guessed it, the is_deleted flag was in one but not the other. Here are the indices: Now I ran the same query over both. It's the sort of query that crops up with reasonable regularity. The query analyser tells me that the query on the table with GPO. @GPO Did you mean to write select org.org_id or org.org_nm is correct? If it is then what does the execution plan show? I cannot see how it can possibly use the first index at all with org.org_nm as it is not covered. The predicate does not include the prefix column of the first index (org_id) either. With the second index it is different. While the predicate lists ev.start_dt, ev.end_dt and then ev.is_deleted, all of them are and conditions, so the optimizer might be able to restate your where clause as where (ev.is_deleted = 'N' or ev.is_deleted is null) and ev.start_dt = '20060701'; and bingo, see that the predicate starts from the prefix column, so it can use the index if it decides it makes sense to do so.
Aug 06 '10 at 10:43 PM
Oleg
(comments are locked)
|

