x

Optimize nested/iterative query/queries

I have a rather long and complex query I need to optimize if at all possible. The problem I'm running into is that I need to set two constants (an count of a column in a derived table and an average of the same column in a derived table) and then use them to arithmatically derive a column of values in a third query.

Here's the actual query I'm using ... any tips for cleaning it up?

DECLARE @avrating Decimal(38,3); DECLARE @avraters Decimal(38,3);

SET @avrating = Cast((SELECT Avg(Agg.Rating) FROM (SELECT s.id AS id, Rating = AVG(r.calculated_rating) FROM shows AS s INNER JOIN listeners_shows_ratings AS r ON s.id = r.show_id WHERE s.selectable = 'True' GROUP BY s.id) Agg) As Decimal(38,3))

SET @avraters = Cast((SELECT Avg(Agg.Raters) FROM (SELECT s.id AS id, Raters = Cast(COUNT(r.calculated_rating) As Decimal) FROM shows AS s INNER JOIN listeners_shows_ratings AS r ON s.id = r.show_id WHERE s.selectable = 'True' GROUP BY s.id) Agg) As Decimal(38,3))

SELECT s.id, s.name, s.artwork_file_url, (((@avraters * @avrating ) + (AVG(r.calculated_rating)*CAST(COUNT(r.calculated_rating) AS Decimal)))/(@avraters + Cast(Count(r.calculated_rating) As Decimal))) As 'Rating' FROM shows As s INNER JOIN shows_characteristics AS sc ON s.id = sc.show_id INNER JOIN listeners_shows AS ls ON ls.show_id = s.id INNER JOIN listeners_shows_ratings AS r ON s.id = r.show_id WHERE(sc.characteristic_id = 4 OR sc.characteristic_id = 451 OR sc.characteristic_id = 458 AND s.selectable=1) GROUP BY s.id, s.name, s.artwork_file_url ORDER BY 'Rating' DESC

more ▼

asked Oct 20, 2009 at 02:19 PM in Default

avatar image

Eric Mann
59 2 3 6

(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

I believe the first 2 sets can be combined into a single SELECT.

DECLARE @avrating Decimal(38,3); DECLARE @avraters Decimal(38,3);

SELECT @avrating = CAST(Avg(Agg.Rating) AS decimal(38,3))
    ,@avraters = CAST(Avg(Agg.Raters) AS decimal(38,3)
FROM
(
    SELECT s.id AS id, Cast(COUNT(r.calculated_rating) As Decimal) AS Raters, AVG(r.calculated_rating) AS Rating
    FROM shows s
        INNER JOIN listeners_shows_ratings r
        ON s.id = r.show_id 
    WHERE s.selectable = 'True' 
    GROUP BY s.id
) Agg

Beyond that, I'm not entirely sure I understand the purpose of the code. I reformatted your query trying to figure it out. I agree with Jesse's comment about the AND/OR in the WHERE clause. My main question, though, is this: Why are you calculating the overall average rating and average number of ratings for any given show and then adding that to the calculated rating for a given show?

Here's the reformatted code:

SELECT s.id, s.name, s.artwork_file_url,
    (
        (
            (@avraters * @avrating ) +
            (
                AVG(r.calculated_rating) * CAST(COUNT(r.calculated_rating) AS Decimal)
            )
        )
        /
        (
            @avraters + Cast(Count(r.calculated_rating) As Decimal)
        )
    ) As 'Rating'
FROM shows s 
    INNER JOIN shows_characteristics sc ON s.id = sc.show_id 
    INNER JOIN listeners_shows ls ON ls.show_id = s.id
    INNER JOIN listeners_shows_ratings r ON s.id = r.show_id 
WHERE(sc.characteristic_id = 4 OR sc.characteristic_id = 451 OR sc.characteristic_id = 458 AND s.selectable=1)
GROUP BY s.id, s.name, s.artwork_file_url 
ORDER BY 'Rating' DESC

Essentially, this query will shift the ratings for individual shows to be closer to the overall average. Is this really what you want it to do?

more ▼

answered Oct 20, 2009 at 04:08 PM

avatar image

Tom Staab ♦
14.5k 7 14 18

(comments are locked)
10|1200 characters needed characters left

Jesse is right, I did want to use this instead:

WHERE s.selectable=1 AND sc.characteristic_id IN (4, 451, 458)

Just that one change has cleaned things up significantly, and I'll try to use Tom's single SELECT method for the first set of queries.

To answer your question, Tom, yes, the behavior of shifting ratings for individual shows closer to the average is what I expect and want to do. This method of working with ratings is meant to make shows with higher numbers of ratings balance with the shows that have just a handful of ratings (The ratings of a show with 1,000,000 fans don't overly dwarf a show with 2 fans).

It's actually called a "Bayesian Rating" and it used by organizations like Netflix as well as my client.

Does that bring what I'm trying to do into a better light?

more ▼

answered Oct 20, 2009 at 08:29 PM

avatar image

Eric Mann
59 2 3 6

(comments are locked)
10|1200 characters needed characters left

From the original query, this line looks suspicious:

WHERE(sc.characteristic_id = 4 OR sc.characteristic_id = 451 
OR sc.characteristic_id = 458 AND s.selectable=1)

Do you really mean this?:

WHERE (sc.characteristic_id = 4 OR sc.characteristic_id = 451 
OR sc.characteristic_id = 458) AND s.selectable=1

If so, you'll have to change it, since the logic is very different. Also, if you need to change it, you could shorten it to:

WHERE s.selectable=1 AND sc.characteristic_id IN (4, 451, 458)

If your original logic is as intended, then I recommend using parentheses to separate the ANDs and ORs (even if not needed, they help for readability):

WHERE (sc.characteristic_id = 4 OR sc.characteristic_id = 451 
OR (sc.characteristic_id = 458 AND s.selectable=1))

Cheers, Jesse

more ▼

answered Oct 20, 2009 at 03:41 PM

avatar image

Jesse McLain
117 2 1

(comments are locked)
10|1200 characters needed characters left

Based on my understanding of what you are trying to do I'd probably start by doing something like this:

;WITH cteAverages (id INT, avg_rating DECIMAL(38,3), avg_raters DECIMAL(38, 3)) AS
    (
    SELECT
        Avg(Agg.Raters),
        Avg(Agg.Rating)
    FROM
    (
     SELECT
        s.id AS id,
        Raters = Cast(COUNT(r.calculated_rating) As Decimal)m
        Rating = AVG(r.calculated_rating)
     FROM
        shows AS s INNER JOIN
        listeners_shows_ratings AS r
            ON s.id = r.show_id
     WHERE
        s.selectable = 'True'
     GROUP BY
        s.id
SELECT
    s.id,
    s.name,
    s.artwork_file_url,
    (((CA.avg_raters * CA.avg_rating) + (AVG(r.calculated_rating) * CAST(COUNT(r.calculated_rating) AS Decimal))) / (CA.avg_raters + Cast(Count(r.calculated_rating) As Decimal))) As 'Rating'
FROM
    shows As s INNER JOIN 
    shows_characteristics AS sc
        ON s.id = sc.show_id INNER JOIN 
    listeners_shows AS ls
        ON ls.show_id = s.id INNER JOIN
    listeners_shows_ratings AS r
        ON s.id = r.show_id INNER JOIN 
    cteAverages CA 
        ON s.id = CA.id
WHERE
    (sc.characteristic_id = 4 OR
     sc.characteristic_id = 451 OR
     sc.characteristic_id = 458 AND
     s.selectable = 1)
GROUP BY
    s.id,
    s.name,
    s.artwork_file_url
ORDER BY
    'Rating' DESC
more ▼

answered Oct 20, 2009 at 03:01 PM

avatar image

Jack Corbett
1.1k 3 4 7

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1066
x408
x45

asked: Oct 20, 2009 at 02:19 PM

Seen: 3438 times

Last Updated: Sep 28, 2012 at 03:56 AM

Copyright 2016 Redgate Software. Privacy Policy