question

leo2015 avatar image
leo2015 asked

Queries returning different results

Hi, if I run below query, I get 901 records: select * from Table T1 where COLUMN C1 in (select COLUMN C1 FROM Table T2 where Column C2 = 0) but if I run another query as below, I get 900 records: SELECT T1.* FROM Table T1 INNER JOIN Table T2 ON T2.Column C1 = T1.Column C1 WHERE T2.Column C2 = 0; I also tried to find out if there are any duplicates in T1 but there is none. Can someone please help me to get why the result is different. Thanks,
sql-server-2008sql-server-2005sql
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Tom Staab avatar image
Tom Staab answered
Your first query filters based on every C1 value in T2 where C2=0. Your second query has the additional restriction that UN_Node must match between the 2 tables. Check for a row in T2 where C2=0, but there is no matching T1 record based on UN_Node. WITH MatchingC1 AS ( SELECT T1.C1 FROM T1 INNER JOIN T2 ON T2.UN_Node = T1.UN_Node WHERE T2.C2 = 0 ) SELECT C1 FROM T2 WHERE C2 = 0 AND NOT EXISTS (SELECT 1 FROM MatchingC1 T1 WHERE T1.C1 = T2.C1) ; --- edit after additional information provided via comments ---
I would have expected the opposite of what you said happened. Duplicates in T2 would have resulted in duplicates in the INNER JOIN result, but that would not have happened with the IN approach. Therefore, the JOIN result would be greater than the IN result, but you said the opposite occurred. As a side note, since T2 is much larger than T1, the IN approach could be much less efficient (depending on indexes and other filters) as it requires a list of every entry in T2 to provide in that IN clause. With the JOIN approach, the engine could decide it's more efficient to first get the T1 data and then just check T2 records where the column values match.
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

leo2015 avatar image leo2015 commented ·
Many thanks for the answer. Just to add that, Column C1 is UN_Node, will that make any difference to the query you have shown...
0 Likes 0 ·
Tom Staab avatar image Tom Staab ♦ commented ·
In that case, are you sure the original question is worded correctly? Specifically, are you sure you get 1 more row with the IN than you do with INNER JOIN? I'm asking because the exact opposite can happen if you have a duplicate UN_Node in T2.
0 Likes 0 ·
Tom Staab avatar image Tom Staab ♦ commented ·
Can you post the actual statements? If you need to hide proprietary table or column names, you can make up the names, but please make sure you make the same changes to both queries. Thanks.
0 Likes 0 ·
leo2015 avatar image leo2015 commented ·
I did make the changes but just forgot to change one column name and now I have made the necessary changes to my question as well. I just posted the actual statements and only changed the names. Just to make you aware, T1 has only 13000 records but T2 has more than 100k with lots of duplicates but I believe the way I am joining the tables, query result should not be affected due to the duplicates in T2 however, please correct me if I am wrong.. I also tried to find out any duplicates with the query (SELECT * FROM T1 WHERE Column C1 IN (SELECT Column C1 FROM Table T1 GROUP BY Column C1 HAVING COUNT(*) > 1)) but I could not find any in T1 but as I said there are lots of duplicates in T2 though. Thanks
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.