x

Increase Query Efficency

I have this query but it is taking foreever to run. Any suggestions on if it can be changed to run faster?

BTW tablaA has nearly 4 million records and tableB 40,000. I am thinking theres is nothing I can do but figured couldn;t hurt to ask.

This is a simple example that resembles it:

 select  count (distinct a1.ID) ,a1.ID 
 from tablea a1
 JOIN tableb a2
 ON  a1.colA   = a2.colA   
 and a1.colB   = a2.colB  
 and a1.colC = a2.colC 
 where  a1.prodID IN ( 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, 19, 20)
      and (a1.YearMonth between 200810 and 200909) 
 group by  a1.ID 
more ▼

asked Oct 19, 2011 at 12:12 PM in Default

avatar image

sqlnewb
216 29 32 35

What indexes are in place on those tables? What's the range of data within the ID field? What's the distribution of that data? What are you trying to achieve?

Oct 19, 2011 at 12:13 PM ThomasRushton ♦♦

Trying to count the number of ids that use each product. I am not sure on what indexes are currrently in place.

Oct 19, 2011 at 12:24 PM sqlnewb

Oh, and should the a1.YearMonth field be between 200810 and 200909 rather than 200810 and 20909?

Oct 19, 2011 at 12:38 PM ThomasRushton ♦♦

If you are grouping by a1.ID then are you sure you need to select count(distinct a1.ID)? Also, there might be a type in the part of the predicate reading

and (a1.YearMonth between 200810 and 20909)

The 20909 looks strange, maybe you meant it to be 200909 seeking to get one year worth of data.

Since it looks like you are only getting back 20 rows from your query, it should not take long time at all unless there are missing indexes.

Oct 19, 2011 at 12:54 PM Oleg

@sqlnewb : you deleted your own question again, this time it had comments. If you realised something was wrong, please just comment and do not delete

Oct 19, 2011 at 02:30 PM Kev Riley ♦♦
show all comments (comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

I think indexing this could be a bear. You've got two different filter criteria, one of them a rather wide IN clause (which results in OR statements, which frequently cause scans) and the other a BETWEEN on a text field, again, possibly problematic. Then, on top of it, we get an aggregation against another column with a DISTINCT. Yikes.

First thing I'd do is pivot the IN clause to a temp table and then use a JOIN. You're more likely to get a good execution plan that way. After that... I'd need to see the execution plan to understand how SQL Server is dealing with all the details. Look up Jeff Moden and Tally Table to find great articles on how to do the pivot.

more ▼

answered Oct 19, 2011 at 02:48 PM

avatar image

Grant Fritchey ♦♦
137k 20 43 81

(comments are locked)
10|1200 characters needed characters left

You can also try this ?

 CREATE TABLE #Prod(ProdID INT)

 INSERT INTO #Prod(ProdID) VALUES
 (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)

 SELECT  DISTINCT a1.ID AS ID
 FROM tablea a1 (NOLOCK)
 INNER JOIN tableb a2 (NOLOCK)
 ON  a1.colA   = a2.colA 
 and a1.colB   = a2.colB  
 and a1.colC =     a2.colC 
 WHERE  EXISTS (SELECT 1 
                  FROM #Prod P 
                  WHERE P.ProdID=A1.prodID)    
 AND (a1.YearMonth BETWEEN 200810 AND 200909)

 DROP TABLE #Prod

You do not need to count ID field because as per your query it will be always 1.

more ▼

answered Oct 19, 2011 at 10:58 PM

avatar image

Sharma
1.4k 90 93 96

+1, though I agree with Fatherjack about indexing and joining rather than using subquery.

Oct 20, 2011 at 01:08 AM Magnus Ahlkvist

I'm not crazy about the use of NOLOCK. You do know that it can lead to missing or extra rows depending on how page splits occur as you're reading?

Oct 20, 2011 at 03:34 AM Grant Fritchey ♦♦

I know about the advantage and disadvantage of using NOLOCK, but it's depends on requirement. so for every case you can not say that NOLOCK is not helpful.

Oct 20, 2011 at 05:12 AM Sharma

Magnus, in mine suggested query i used exists condition and sometime it is very fast instead of inner join also.

Oct 20, 2011 at 05:14 AM Sharma
(comments are locked)
10|1200 characters needed characters left

Agreeing with what all the others have said, I'd like to be a little more specific on the indexing. I'm guessing tablea.ID is the primary key, so that should already be indexed, probably clustered index.

If not, create an index on that column. Apart from that, you should have indexes on colA, colB and colC in both tables. For this specific query, you'd probably do fine with one index containing all three columns, but if you want that or not should also be a consideration with the other queries running against the tables.

tablea.ProdID should definitely be indexed.

To find out which indexes you have, you could either expand Tables->tablea->Indexes in Management Studio, or run a query to find out:

 SELECT i.name as IndexName,i.index_id, i.type_desc, ic.index_column_id,c.name as ColumnName
 from sys.indexes i
 INNER JOIN sys.index_columns ic
 ON i.index_id = ic.index_id AND i.object_id = ic.object_id 
 INNER JOIN sys.columns c
 ON ic.column_id = c.column_id AND ic.object_id = c.object_id 
 WHERE i.object_id = OBJECT_ID('dbo.tablea')
 ORDER BY i.name, ic.index_column_id 
more ▼

answered Oct 19, 2011 at 11:14 PM

avatar image

Magnus Ahlkvist
21.8k 20 41 42

(comments are locked)
10|1200 characters needed characters left

I would suggest trying a temp table, as @sqlchamp has shown but I would index it and join on it, not use a nested select

     CREATE TABLE #Prod ( ProdID INT )

  INSERT INTO #Prod
         ( ProdID )
  VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 ),( 6 ),( 7 ),( 8 ),( 9 ),( 10 ),
     ( 11 ),( 12 ),( 13 ),( 14 ),( 15 ),( 16 ),( 17 ),( 18 ),( 19 ),( 20 )

 CREATE CLUSTERED INDEX ix_ProdID ON [#Prod] ([ProdID])

  SELECT  DISTINCT
         a1.ID AS ID
  FROM   tablea a1 
         INNER JOIN tableb a2  ON (a1.colA = a2.colA
                                            AND a1.colB = a2.colB
                                            AND a1.colC = a2.colC)
         INNER JOIN #Prod p ON p.ProdID = a1.prodID 
  WHERE  a1.YearMonth BETWEEN 200810 AND 200909 

  DROP TABLE #Prod

You will also need to look at the execution plan, consider data types and indexing. Your BETWEEN may be less effective than you think

more ▼

answered Oct 20, 2011 at 01:03 AM

avatar image

Fatherjack ♦♦
43.8k 79 99 118

+1. What other construct would you use to replace the BETWEEN clause?

Oct 20, 2011 at 01:09 AM Magnus Ahlkvist

Heh! I'd try >= and

Oct 20, 2011 at 01:21 AM Fatherjack ♦♦

Is >= and

If I'd implement a BETWEEN clause, I'd do it using >= and

Oct 20, 2011 at 01:36 AM Magnus Ahlkvist
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1105
x1084
x304

asked: Oct 19, 2011 at 12:12 PM

Seen: 1039 times

Last Updated: Oct 20, 2011 at 01:19 AM

Copyright 2017 Redgate Software. Privacy Policy