question

7alwagy avatar image
7alwagy asked

What should I do to improve this bottle neck?

Hello all, first of all, sorry for the non descriptive title, I'm just too rushed so I couldn't come up with a better one.
Second:
I have a portion of my database the looks like the following diagram:
You can an image of the data model here: "http://img682.imageshack.us/img682/1089/testdatamodel.png" as I'm not allowed to user image tag.
I have contributors on the system, each write to many sources, and a source can have many working contributors. Users can subscribe to as many contributors as they like and as many sources as they like. Now, what I want to do is simply retrieve all the articles for certain user. These articles are either coming through a contributor or a source the user subscribes to. To make it easy, when a user subscribes to a source I simply copy all the sources contributors to the users_contributors table. One tricky piece, when I retrieve the user's articles I retrieve all the articles that he his contributors write, and all the articles that were published in the sources he follows where those articles doesn't have a valid contributor on the system. (I.E contributorID is null).
I created the following query:

 Select Articles.ArticleID, Articles.ContributorId, Contributors.Name, 
    Sources.Name, Articles.ArticleTitle
From Articles 
    		Inner Join Contributors On Articles.ContributorId = Sources.SourceID
    		Inner Join Sources On Articles.SourceId = Sources.SourceID			
Where Articles.ContributorId in (
    Select ContributorId from Users_Contributors
    Where UserID = 3
    )
    OR (

    	Articles.SourceId in (
    		Select SourceId from Users_Sources
    		Where UserID = 3
    		)

    		and 
    		Articles.ContributorId is null
    )

The problem with the above query is that, it doesn't return any article with contributorID null. I understand this is because of the join on the contributors table. What should I do in such a case?
Should I consider denormalization?
What are the prober fields to index on each table for this query to run fast (Rowset returned are approximately 10000)?
I need to support paging on this query, will With { } clause be appropriate to me, or should I consider another strategy?
Thanks in advance.
Ps: I'm using SQL Server 2008

t-sqlperformance
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bob Hovious avatar image
Bob Hovious answered

To get articles with ContributorID = null, make the join to the Sources table a LEFT JOIN.

You can look at the query plan in 2008, and it will suggest indexes but here are some starting ideas.

Users_Contributors (UserID, ContributorID) -- if USERID is always going to be 3 -- consider a filtered index that only contains rows where USERID = 3.

Articles (UserID, SourceID, ContributorID) -- same comments about filtered index

Sources (SourceID, Name)

By the way, how is join at the bottom supposed to work? What column in the contributors table itself are you joining to? Assuming you meant Contributors.ContributorID instead of Sources.SourceID, then

Contributors (ContributorID, Name)

Never cut and past bits and pieces. You should always show the code that is actually running as a whole. A join to a table that references no column in that table makes no sense.

Inner Join CONTRIBUTORS On Articles.ContributorId = Sources.SourceID

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

dave ballantyne avatar image
dave ballantyne answered

In terms of the paging i would point you here.
Are you able to post the actual query plan for the query ?

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Gustavo avatar image
Gustavo answered

If you need all columns of ARTICLES matching or not CONTRIBUTORS, then use a LEFT JOIN instead of INNER JOIN. Use the same rule for ARTICLES if it applies.

The first performance tip i can give is to replace your IN() by EXISTS or even LEFT JOINS:

-- replacing IN() by LEFT JOINS
Select Articles.ArticleID, Articles.ContributorId, Contributors.Name, 
    Sources.Name, Articles.ArticleTitle
From Articles 
                LEFT JOIN Contributors On Articles.ContributorId = Sources.SourceID
                LEFT JOIN Sources On Articles.SourceId = Sources.SourceID                      
    			LEFT JOIN Users_Contributors On Articles.ContributorId = Users_Contributors.ContributorId AND Users_Contributors.UserID = 3
    			LEFT JOIN Users_Sources On Articles.SourceId = Users_Sources.SourceId AND Users_Sources.UserID = 3
Where   Users_Contributors.ContributorId IS NOT NULL
    	OR 
    	( Users_Sources.SourceId IS NOT NULL AND Articles.ContributorId is null )

-- replacing IN() by EXISTS
Select  Articles.ArticleID, Articles.ContributorId, Contributors.Name, 
    	Sources.Name, Articles.ArticleTitle
From    Articles 
                LEFT Join Contributors On Articles.ContributorId = Sources.SourceID
                LEFT Join Sources On Articles.SourceId = Sources.SourceID                      
Where EXISTS
    (
    Select top 1 1 from Users_Contributors
    Where UserID = 3 AND Articles.ContributorId = Users_Contributors.ContributorId
    )
    OR EXISTS ( Select top 1 1  from Users_Sources
                Where UserID = 3 AND Articles.SourceId = Users_Sources.SourceId
                )
                AND
                Articles.ContributorId is null
    )

Basic Indexes should be all ids with clustered index, nonclustered index on Users_Contributors( ContributorId, UserID) and Users_Sources( SourceId, UserID ).

For paging, i would suggest row_numer() function, its something like:

with aux(
    Select	row_number() over( order by <some order criteria> ) as id,
    .
    . <remaining select>
    .
)
select * from aux where ID between @start_value and @final_value

Let me know if this helps...

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.