Hi, I'm using SQL Server 2008. I have a query that looks like: SELECT TOP 8 Table1.Key FROM Table1 WHERE Table1.Key NOT IN (SELECT DISTINCT Table2.Key FROM Table2) When I run the query without the 'TOP N' it returns 86 rows from our Table1 in less than 1 second. When I add the 'TOP N' into the query, it runs for minutes - it's been gone for as long as 4 minutes - without returning a result. Strangely, when I do: SELECT TOP 8 * FROM Table1 (or) SELECT TOP 8 * FROM Table2 The top 8 rows are returned from either table in less than 1 second. Is there some sort of problem with SQL Server? Is there a problem with my T-SQL? Is there something else going on "beneath the surface"? Any answers/ideas would be greatly appreciated.
Right out of the gate, take a look at the execution plans for the various queries. Also, a TOP without an ORDER BY is going to return random rows. Also, a NOT IN with a DISTINCT could be replaced with a LEFT OUTER JOIN and a WHERE Table2.Key IS NULL to eliminate the bad records. The DISTINCT is an aggregate operation that's just going to slow down the process. But, to really know for sure what's happening, you have to check out the execution plans.
This behaviour is rather strange. The answer depends on how the data is indexed in both tables. You can try restating your query to any one of the following two (the result should be the same, but the time might differ, so you can pick whichever seems to be the best): select top 8 t1.[Key] from Table1 t1 where not exists ( select [Key] from Table2 where [Key] = t1.[Key] ); select top 8 t1.[Key] from Table1 t1 d left join Table2 t2 on t1.[Key] = t2.[Key] where t2.[Key] is null; Oleg