x

Increase Query Efficency

I have this query but it is taking foreever to run. Any suggestions on if it can be changed to run faster?

BTW tablaA has nearly 4 million records and tableB 40,000. I am thinking theres is nothing I can do but figured couldn;t hurt to ask.

This is a simple example that resembles it:

select  count (distinct a1.ID) ,a1.ID 
from tablea a1
JOIN tableb a2
ON  a1.colA   = a2.colA   
and a1.colB   = a2.colB  
and a1.colC = a2.colC 
where  a1.prodID IN ( 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, 19, 20)
    and (a1.YearMonth between 200810 and 200909) 
group by  a1.ID 
more ▼

asked Oct 19 '11 at 12:12 PM in Default

sqlnewb gravatar image

sqlnewb
216 26 30 31

What indexes are in place on those tables? What's the range of data within the ID field? What's the distribution of that data? What are you trying to achieve?
Oct 19 '11 at 12:13 PM ThomasRushton ♦
Trying to count the number of ids that use each product. I am not sure on what indexes are currrently in place.
Oct 19 '11 at 12:24 PM sqlnewb
Oh, and should the a1.YearMonth field be between 200810 and 200909 rather than 200810 and 20909?
Oct 19 '11 at 12:38 PM ThomasRushton ♦

If you are grouping by a1.ID then are you sure you need to select count(distinct a1.ID)? Also, there might be a type in the part of the predicate reading

and (a1.YearMonth between 200810 and 20909)

The 20909 looks strange, maybe you meant it to be 200909 seeking to get one year worth of data.

Since it looks like you are only getting back 20 rows from your query, it should not take long time at all unless there are missing indexes.
Oct 19 '11 at 12:54 PM Oleg
@sqlnewb : you deleted your own question again, this time it had comments. If you realised something was wrong, please just comment and do not delete
Oct 19 '11 at 02:30 PM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left

4 answers: sort oldest

I think indexing this could be a bear. You've got two different filter criteria, one of them a rather wide IN clause (which results in OR statements, which frequently cause scans) and the other a BETWEEN on a text field, again, possibly problematic. Then, on top of it, we get an aggregation against another column with a DISTINCT. Yikes.

First thing I'd do is pivot the IN clause to a temp table and then use a JOIN. You're more likely to get a good execution plan that way. After that... I'd need to see the execution plan to understand how SQL Server is dealing with all the details. Look up Jeff Moden and Tally Table to find great articles on how to do the pivot.
more ▼

answered Oct 19 '11 at 02:48 PM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
90.7k 19 21 74

(comments are locked)
10|1200 characters needed characters left

You can also try this ?

CREATE TABLE #Prod(ProdID INT)

INSERT INTO #Prod(ProdID) VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)

SELECT  DISTINCT a1.ID AS ID
FROM tablea a1 (NOLOCK)
INNER JOIN tableb a2 (NOLOCK)
ON  a1.colA   = a2.colA 
and a1.colB   = a2.colB  
and a1.colC =     a2.colC 
WHERE  EXISTS (SELECT 1 
                 FROM #Prod P 
                 WHERE P.ProdID=A1.prodID)    
AND (a1.YearMonth BETWEEN 200810 AND 200909)

DROP TABLE #Prod
You do not need to count ID field because as per your query it will be always 1.
more ▼

answered Oct 19 '11 at 10:58 PM

Amardeep gravatar image

Amardeep
1.3k 84 88 89

+1, though I agree with Fatherjack about indexing and joining rather than using subquery.
Oct 20 '11 at 01:08 AM Magnus Ahlkvist
I'm not crazy about the use of NOLOCK. You do know that it can lead to missing or extra rows depending on how page splits occur as you're reading?
Oct 20 '11 at 03:34 AM Grant Fritchey ♦♦
I know about the advantage and disadvantage of using NOLOCK, but it's depends on requirement. so for every case you can not say that NOLOCK is not helpful.
Oct 20 '11 at 05:12 AM Amardeep
Magnus, in mine suggested query i used exists condition and sometime it is very fast instead of inner join also.
Oct 20 '11 at 05:14 AM Amardeep
(comments are locked)
10|1200 characters needed characters left

Agreeing with what all the others have said, I'd like to be a little more specific on the indexing. I'm guessing tablea.ID is the primary key, so that should already be indexed, probably clustered index.

If not, create an index on that column. Apart from that, you should have indexes on colA, colB and colC in both tables. For this specific query, you'd probably do fine with one index containing all three columns, but if you want that or not should also be a consideration with the other queries running against the tables.

tablea.ProdID should definitely be indexed.

To find out which indexes you have, you could either expand Tables->tablea->Indexes in Management Studio, or run a query to find out:

SELECT i.name as IndexName,i.index_id, i.type_desc, ic.index_column_id,c.name as ColumnName
from sys.indexes i
INNER JOIN sys.index_columns ic
ON i.index_id = ic.index_id AND i.object_id = ic.object_id 
INNER JOIN sys.columns c
ON ic.column_id = c.column_id AND ic.object_id = c.object_id 
WHERE i.object_id = OBJECT_ID('dbo.tablea')
ORDER BY i.name, ic.index_column_id
more ▼

answered Oct 19 '11 at 11:14 PM

Magnus Ahlkvist gravatar image

Magnus Ahlkvist
15.9k 15 19 32

(comments are locked)
10|1200 characters needed characters left

I would suggest trying a temp table, as @sqlchamp has shown but I would index it and join on it, not use a nested select

CREATE TABLE #Prod ( ProdID INT )

 INSERT INTO #Prod
       ( ProdID )
 VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 ),( 6 ),( 7 ),( 8 ),( 9 ),( 10 ),
    ( 11 ),( 12 ),( 13 ),( 14 ),( 15 ),( 16 ),( 17 ),( 18 ),( 19 ),( 20 )

CREATE CLUSTERED INDEX ix_ProdID ON [#Prod] ([ProdID])

 SELECT  DISTINCT
       a1.ID AS ID
 FROM   tablea a1 
       INNER JOIN tableb a2  ON (a1.colA = a2.colA
                              AND a1.colB = a2.colB
                              AND a1.colC = a2.colC)
       INNER JOIN #Prod p ON p.ProdID = a1.prodID 
 WHERE  a1.YearMonth BETWEEN 200810 AND 200909 

 DROP TABLE #Prod
You will also need to look at the execution plan, consider data types and indexing. Your BETWEEN may be less effective than you think
more ▼

answered Oct 20 '11 at 01:03 AM

Fatherjack gravatar image

Fatherjack ♦♦
41.2k 72 77 107

+1. What other construct would you use to replace the BETWEEN clause?
Oct 20 '11 at 01:09 AM Magnus Ahlkvist
Heh! I'd try >= and <= depending on data type - is it int/char/varchar is it indexed ... ... ... so many possibilities.!
Oct 20 '11 at 01:21 AM Fatherjack ♦♦

Is >= and <= performing better than BETWEEN?

If I'd implement a BETWEEN clause, I'd do it using >= and <=...
Oct 20 '11 at 01:36 AM Magnus Ahlkvist
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x977
x672
x241

asked: Oct 19 '11 at 12:12 PM

Seen: 699 times

Last Updated: Oct 20 '11 at 01:19 AM