question

vk124_4 avatar image
vk124_4 asked

Which course has the below average number of students

select course,avg(pname) from (select course,count(pname) pname from studies group by course ) as sub group by course having count(pname)<avg(pname) is this correct

sql-server
2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

What happens when you try it?

0 Likes 0 ·

@vk124_4 Your attempt is partially correct because you do need to calculate the counts first, so your sub part is OK. The outer part is incorrect though and causes the query to return nothing. Think about what happens when you select from the sub: the data is already aggregated to return 1 row per course, so the attempt to calculate average by grouping by course again is futile, there is nothing to average as each course is already represented by only one row. You need the average of those counts across all courses so there cannot be a group by course in the outer part. Please clarify whether you have SQL Server 2012 or newer or not. There are many ways to get the query working correctly, but 2012 or newer offers the possibility to come up with more elegant solution(s).

0 Likes 0 ·

1 Answer

· Write an Answer
Oleg avatar image
Oleg answered

Since I feel like I would not mind doing some home work without getting the grade, here is the solution which will work in SQL Server 2012 or newer. The original logic in the query in question is partially correct, the count used in the sub-select is indeed needed. The outer part is incorrect because there can not be any group by in the outer select because the inner select already aggregated the courses so that there is only one row per course. The inner part can be modified to include not just the count of student per course but also the average of all such counts across all courses. This is possible via avg() over() which is available in 2012 or better. Here is the script:

select 
    course, student_count, mean
    from (
        select 
            course, count(pname) student_count, avg(count(pname)) over() mean 
            from studies group by course
    ) sub
    where student_count <= mean;

The only possible small problem with this one is the fact that the average returns the same data type as its argument and because the count returns whole numbers, the average calculation also returns whole number rounded down. For example, suppose there are 5 courses with counts of 5, 7, 6, 7 and 3 respectively. The average of this set might be expected to be 5.6 but the calculation returns 5, not 5.6 because of the rounding down to the whole number. This behaviour is by design and so, if there is a need to preserve the decimal part of the average calculation then its argument need to be converted to a suitable decimal before the outer aggregation kicks in. This will also allow to safely use the "less than" rather than "less than or equal" which is used in the script above. Here is the version with such conversion applied:

select 
    course, student_count, cast(mean as decimal(10, 2)) mean
    from (
        select 
            course, count(pname) student_count, 
            avg(cast(count(pname) as decimal(10, 2))) over() mean 
            from studies group by course
    ) sub
    where student_count <= mean;

This technique shows that the engine is fully capable of aggregating the same data across different groups at the same time, i.e. avg(count) over() is possible, the count is calculated first based on the group by and the average is calculated after the counts have been already calculated.

Hope this helps.

Oleg

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.