question

Will 1 avatar image
Will 1 asked

Issue using the "in Sub query"

You got two tables, both are lists of employees identified by a number, I want to find every one in [MyCompany] that isn't in the [OtherCompany] table. Why does this work:-

select MyCompany.staffno  
from MyCompany  
where staffno not in (select OtherCompany.EmpNumber  
    from OtherCompany 
    where EmpNumber = MyCompany.staffno)  

But this doesn't:-

select MyCompany.staffno 
from MyCompany 
where staffno not in (select OtherCompany.EmpNumber from OtherCompany)

The only difference is the 'where' clause in the subquery, however the second query returns zero rows indicating that all rows in [MyCompany] already exist in the [OtherCompany] table which is not the case.

t-sqlquerysub-query
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered

Basically what's going on is that the optimizer is turning your NOT IN query into a join and then looking for null values. When you don't supply a filtering mechanism it's attempting to match on NULL and you can't actually do a match on NULL, so no values are returned because nothing matches.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

This is because one of the values from the query

select OtherCompany.EmpNumber from OtherCompany

Is returning NULL. When using IN clauses, NULL means 'match anything' - but it doesn't on a JOIN, this is why you're seeing the JOIN in your IN query return OK. This will probably work fine for you:

select MyCompany.staffno 
  from MyCompany 
 where staffno not in 
       (select EmpNumber 
          from OtherCompany 
         WHERE EmpNumber IS NOT NULL)
6 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
+1 for having a more complete answer. I still do these things using left join & looking for the NOT NULL, but in this case, the NOT IN is actually faster.
1 Like 1 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
+1 - to you and Grant - I would have gone for the LEFT JOIN option too. Interesting why the NOT IN is quicker ...
0 Likes 0 ·
Will 1 avatar image Will 1 commented ·
Thanks Matt, checking for Null on the EmpNumber made no differene, the query still returned zero rows even though there are rows in the [MyCompany] table that are not in the [OtherComapny] table.
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
@Will - do you want to post up some sample data / DDL? We can probably then help further.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
@Matt congrats on 9k. Well done. Only a little while to the big fat round number.
0 Likes 0 ·
Show more comments
Rob Farley avatar image
Rob Farley answered

Can I suggest you use NOT EXISTS instead of NOT IN, as this will handle NULLs much better.

select mc.staffno  
from MyCompany as mc
where NOT EXISTS (select * 
                  from OtherCompany as oc 
                  where oc.EmpNumber = mc.staffno
                 ) 
;
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Will 1 avatar image Will 1 commented ·
Hi Rob well yes, but I had tried that and it made no difference to the hadling of the null values. The key here was adding the 'where' clause (and therefore a refernce to the MyCompany outer table) to the sub-select. Don't ask me why though :-)
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.