question

sqlnewb avatar image
sqlnewb asked

Increase Query Efficency

I have this query but it is taking foreever to run. Any suggestions on if it can be changed to run faster? BTW tablaA has nearly 4 million records and tableB 40,000. I am thinking theres is nothing I can do but figured couldn;t hurt to ask. This is a simple example that resembles it: select count (distinct a1.ID) , a1.ID from tablea a1 JOIN tableb a2 ON a1.colA = a2.colA and a1.colB = a2.colB and a1.colC = a2.colC where a1.prodID IN ( 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, 19, 20) and (a1.YearMonth between 200810 and 200909) group by a1.ID
sqlt-sqlperformance
7 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

What indexes are in place on those tables? What's the range of data within the ID field? What's the distribution of that data? What are you trying to achieve?
1 Like 1 ·
If you are grouping by a1.ID then are you sure you need to select count(distinct a1.ID)? Also, there might be a type in the part of the predicate reading and (a1.YearMonth between 200810 and **20909**) The 20909 looks strange, maybe you meant it to be **200909** seeking to get one year worth of data. Since it looks like you are only getting back 20 rows from your query, it should not take long time at all unless there are missing indexes.
1 Like 1 ·
@sqlnewb : you deleted your own question again, this time it had comments. If you realised something was wrong, please just comment and do not delete
1 Like 1 ·
Trying to count the number of ids that use each product. I am not sure on what indexes are currrently in place.
0 Likes 0 ·
Oh, and should the a1.YearMonth field be between 200810 and 200909 rather than 200810 and 20909?
0 Likes 0 ·
Show more comments
Grant Fritchey avatar image
Grant Fritchey answered
I think indexing this could be a bear. You've got two different filter criteria, one of them a rather wide IN clause (which results in OR statements, which frequently cause scans) and the other a BETWEEN on a text field, again, possibly problematic. Then, on top of it, we get an aggregation against another column with a DISTINCT. Yikes. First thing I'd do is pivot the IN clause to a temp table and then use a JOIN. You're more likely to get a good execution plan that way. After that... I'd need to see the execution plan to understand how SQL Server is dealing with all the details. Look up Jeff Moden and Tally Table to find great articles on how to do the pivot.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Sharma avatar image
Sharma answered
You can also try this ? CREATE TABLE #Prod(ProdID INT) INSERT INTO #Prod(ProdID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20) SELECT DISTINCT a1.ID AS ID FROM tablea a1 (NOLOCK) INNER JOIN tableb a2 (NOLOCK) ON a1.colA = a2.colA and a1.colB = a2.colB and a1.colC = a2.colC WHERE EXISTS (SELECT 1 FROM #Prod P WHERE P.ProdID=A1.prodID) AND (a1.YearMonth BETWEEN 200810 AND 200909) DROP TABLE #Prod You do not need to count ID field because as per your query it will be always 1.
4 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I'm not crazy about the use of NOLOCK. You do know that it can lead to missing or extra rows depending on how page splits occur as you're reading?
2 Likes 2 ·
+1, though I agree with Fatherjack about indexing and joining rather than using subquery.
0 Likes 0 ·
I know about the advantage and disadvantage of using NOLOCK, but it's depends on requirement. so for every case you can not say that NOLOCK is not helpful.
0 Likes 0 ·
Magnus, in mine suggested query i used exists condition and sometime it is very fast instead of inner join also.
0 Likes 0 ·
Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
Agreeing with what all the others have said, I'd like to be a little more specific on the indexing. I'm guessing tablea.ID is the primary key, so that should already be indexed, probably clustered index. If not, create an index on that column. Apart from that, you should have indexes on colA, colB and colC in both tables. For this specific query, you'd probably do fine with one index containing all three columns, but if you want that or not should also be a consideration with the other queries running against the tables. tablea.ProdID should definitely be indexed. To find out which indexes you have, you could either expand Tables->tablea->Indexes in Management Studio, or run a query to find out: SELECT i.name as IndexName,i.index_id, i.type_desc, ic.index_column_id,c.name as ColumnName from sys.indexes i INNER JOIN sys.index_columns ic ON i.index_id = ic.index_id AND i.object_id = ic.object_id INNER JOIN sys.columns c ON ic.column_id = c.column_id AND ic.object_id = c.object_id WHERE i.object_id = OBJECT_ID('dbo.tablea') ORDER BY i.name, ic.index_column_id
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image
Fatherjack answered
I would suggest trying a temp table, as @sqlchamp has shown but I would index it and join on it, not use a nested select CREATE TABLE #Prod ( ProdID INT ) INSERT INTO #Prod ( ProdID ) VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 ),( 6 ),( 7 ),( 8 ),( 9 ),( 10 ), ( 11 ),( 12 ),( 13 ),( 14 ),( 15 ),( 16 ),( 17 ),( 18 ),( 19 ),( 20 ) CREATE CLUSTERED INDEX ix_ProdID ON [#Prod] ([ProdID]) SELECT DISTINCT a1.ID AS ID FROM tablea a1 INNER JOIN tableb a2 ON (a1.colA = a2.colA AND a1.colB = a2.colB AND a1.colC = a2.colC) INNER JOIN #Prod p ON p.ProdID = a1.prodID WHERE a1.YearMonth BETWEEN 200810 AND 200909 DROP TABLE #Prod You will also need to look at the execution plan, consider data types and indexing. Your `BETWEEN` may be less effective than you think
3 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

+1. What other construct would you use to replace the BETWEEN clause?
0 Likes 0 ·
Heh! I'd try >= and
0 Likes 0 ·
Is >= and = and
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.