select course,avg(pname) from (select course,count(pname) pname from studies group by course ) as sub group by course having count(pname)<avg(pname) is this correct
select course,avg(pname) from (select course,count(pname) pname from studies group by course ) as sub group by course having count(pname)<avg(pname) is this correct
@vk124_4 Your attempt is partially correct because you do need to calculate the counts first, so your sub part is OK. The outer part is incorrect though and causes the query to return nothing. Think about what happens when you select from the sub: the data is already aggregated to return 1 row per course, so the attempt to calculate average by grouping by course again is futile, there is nothing to average as each course is already represented by only one row. You need the average of those counts across all courses so there cannot be a group by course in the outer part. Please clarify whether you have SQL Server 2012 or newer or not. There are many ways to get the query working correctly, but 2012 or newer offers the possibility to come up with more elegant solution(s).
Since I feel like I would not mind doing some home work without getting the grade, here is the solution which will work in SQL Server 2012 or newer. The original logic in the query in question is partially correct, the count used in the sub-select is indeed needed. The outer part is incorrect because there can not be any group by in the outer select because the inner select already aggregated the courses so that there is only one row per course. The inner part can be modified to include not just the count of student per course but also the average of all such counts across all courses. This is possible via avg() over() which is available in 2012 or better. Here is the script:
select course, student_count, mean from ( select course, count(pname) student_count, avg(count(pname)) over() mean from studies group by course ) sub where student_count <= mean;
The only possible small problem with this one is the fact that the average returns the same data type as its argument and because the count returns whole numbers, the average calculation also returns whole number rounded down. For example, suppose there are 5 courses with counts of 5, 7, 6, 7 and 3 respectively. The average of this set might be expected to be 5.6 but the calculation returns 5, not 5.6 because of the rounding down to the whole number. This behaviour is by design and so, if there is a need to preserve the decimal part of the average calculation then its argument need to be converted to a suitable decimal before the outer aggregation kicks in. This will also allow to safely use the "less than" rather than "less than or equal" which is used in the script above. Here is the version with such conversion applied:
select course, student_count, cast(mean as decimal(10, 2)) mean from ( select course, count(pname) student_count, avg(cast(count(pname) as decimal(10, 2))) over() mean from studies group by course ) sub where student_count <= mean;
This technique shows that the engine is fully capable of aggregating the same data across different groups at the same time, i.e. avg(count) over() is possible, the count is calculated first based on the group by and the average is calculated after the counts have been already calculated.
Hope this helps.
Oleg
20 People are following this question.