I have this query but it is taking foreever to run. Any suggestions on if it can be changed to run faster? BTW tablaA has nearly 4 million records and tableB 40,000. I am thinking theres is nothing I can do but figured couldn;t hurt to ask. This is a simple example that resembles it: select count (distinct
a1.ID from tablea a1 JOIN tableb a2 ON a1.colA = a2.colA and a1.colB = a2.colB and a1.colC = a2.colC where a1.prodID IN ( 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, 19, 20) and (a1.YearMonth between 200810 and 200909) group by
I think indexing this could be a bear. You've got two different filter criteria, one of them a rather wide IN clause (which results in OR statements, which frequently cause scans) and the other a BETWEEN on a text field, again, possibly problematic. Then, on top of it, we get an aggregation against another column with a DISTINCT. Yikes. First thing I'd do is pivot the IN clause to a temp table and then use a JOIN. You're more likely to get a good execution plan that way. After that... I'd need to see the execution plan to understand how SQL Server is dealing with all the details. Look up Jeff Moden and Tally Table to find great articles on how to do the pivot.
You can also try this ? CREATE TABLE #Prod(ProdID INT) INSERT INTO #Prod(ProdID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20) SELECT DISTINCT
a1.ID AS ID FROM tablea a1 (NOLOCK) INNER JOIN tableb a2 (NOLOCK) ON a1.colA = a2.colA and a1.colB = a2.colB and a1.colC = a2.colC WHERE EXISTS (SELECT 1 FROM #Prod P WHERE P.ProdID=A1.prodID) AND (a1.YearMonth BETWEEN 200810 AND 200909) DROP TABLE #Prod You do not need to count ID field because as per your query it will be always 1.
Agreeing with what all the others have said, I'd like to be a little more specific on the indexing. I'm guessing
tablea.ID is the primary key, so that should already be indexed, probably clustered index. If not, create an index on that column. Apart from that, you should have indexes on colA, colB and colC in both tables. For this specific query, you'd probably do fine with one index containing all three columns, but if you want that or not should also be a consideration with the other queries running against the tables. tablea.ProdID should definitely be indexed. To find out which indexes you have, you could either expand Tables->tablea->Indexes in Management Studio, or run a query to find out: SELECT i.name as IndexName,i.index_id, i.type_desc, ic.index_column_id,c.name as ColumnName from sys.indexes i INNER JOIN sys.index_columns ic ON i.index_id = ic.index_id AND i.object_id = ic.object_id INNER JOIN sys.columns c ON ic.column_id = c.column_id AND ic.object_id = c.object_id WHERE i.object_id = OBJECT_ID('dbo.tablea') ORDER BY i.name, ic.index_column_id
I would suggest trying a temp table, as @sqlchamp has shown but I would index it and join on it, not use a nested select CREATE TABLE #Prod ( ProdID INT ) INSERT INTO #Prod ( ProdID ) VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 ),( 6 ),( 7 ),( 8 ),( 9 ),( 10 ), ( 11 ),( 12 ),( 13 ),( 14 ),( 15 ),( 16 ),( 17 ),( 18 ),( 19 ),( 20 ) CREATE CLUSTERED INDEX ix_ProdID ON [#Prod] ([ProdID]) SELECT DISTINCT
a1.ID AS ID FROM tablea a1 INNER JOIN tableb a2 ON (a1.colA = a2.colA AND a1.colB = a2.colB AND a1.colC = a2.colC) INNER JOIN #Prod p ON p.ProdID = a1.prodID WHERE a1.YearMonth BETWEEN 200810 AND 200909 DROP TABLE #Prod You will also need to look at the execution plan, consider data types and indexing. Your `BETWEEN` may be less effective than you think