question

redtux1 avatar image
redtux1 asked

Group By plus max/min

Hi I have come across a weird issue regarding group by with min/max affecting the total number of records Fields are user_id which is an int Login which is a datetime field query 1 group by on user using where query for year select count(*) --,year(min_date) from ( select user_id--,year(first) year--,min(cast(first as date)) min_date from @nocareers where first between '2013-01-01 00:00:01' and '2014-01-01 00:00:00' group by user_id ) a which gives 274252 however grouping by the year gives a smaller number for , in this case 2013 select year,count(*) --,year(min_date) from ( select user_id,year(min(first)) year--,min(cast(first as date)) min_date from @nocareers -- where first between '2013-01-01 00:00:01' and '2014-01-01 00:00:00' group by user_id--,year(min(first)) ) a group by year gives 257591 I cant understand why an aggregate inside the group by would change the totals
group-byaggregates
3 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Can you repost the code samples using the code sample button please? I'm struggling to understand what is commented and what isn't.
0 Likes 0 ·
Query 1 select count(*) from ( select user_id from @nocareers where first between '2013-01-01 00:00:01' and '2014-01-01 00:00:00' group by user_id ) a query2 select year,count(*) from ( select user_id,year(min(first)) year from @nocareers group by user_id ) a group by year
0 Likes 0 ·
ASK SQL Server Central runs by you voting. For all helpful answers below, please indicate this by clicking on the thumbs up next to those answers. If any one answer lead to a solution, please indicate this by clicking on the check mark next that answer.
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered
There's a slight difference between the 2 queries. Look at this example declare @nocareers table (user_id int, [first] datetime) insert into @nocareers select 1, '11 Jan 2013' insert into @nocareers select 2, '11 Jan 2012' insert into @nocareers select 2, '11 Jan 2013' insert into @nocareers select 3, '11 Jan 2013' select count(*) from ( select user_id from @nocareers where first between '2013-01-01 00:00:01' and '2014-01-01 00:00:00' group by user_id ) a select year,count(*) from ( select user_id,year(min(first)) year from @nocareers group by user_id ) a group by year The first query gets all the user_id that exist in the date range, then counts them = 3 The second groups by the min(year), and then counts them, so the rows for user_id =2 get grouped into 2012 and not 2013
5 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Unfortunately thats not it the pattern is the same for all years The only difference is using min which is cutting down the number of records
0 Likes 0 ·
OK but without seeing the raw data it's hard to explain why. But anyway the 2 queries are not logically equivalent.
0 Likes 0 ·
After further investigation it appears that when the user_id has dates over more than one year SQLSERVER throws a wobbly and ignores the record ie: data like this fails silently user_id first year 251593 2011-11-02 15:07:19.000 2011 251593 2013-04-18 11:49:38.213 2013 260299 2012-05-28 10:53:54.000 2012 260299 2013-05-15 09:01:23.823 2013 260299 2013-05-15 09:06:47.460 2013 260299 2013-06-24 13:30:00.520 2013
0 Likes 0 ·
Well SQL Server is not really throwing a wobbly - it's the query that's at fault. The data you have shown here is exactly like the example I gave above - user_id 260299 has data over 2 years, so the count-then-filter approach is different than the filter-then-count approach.
1 Like 1 ·
And when I say 'at fault' I mean it's doing exactly as you've asked it.
0 Likes 0 ·
redtux1 avatar image
redtux1 answered
select year,count(*) from ( select user_id,year(min(first)) year from @nocareers group by user_id ) a group by year This is my take on my query first of all the subquery should be executed select user_id,year(min(first)) year from @nocareers group by user_id which to me should do the following group by user_id then get the earliest date per user_id - min(first) which is a datetime field then extract the year from this value - year(min(first)) then the outer query should query this query as a table grouping by year and counting the users.
1 comment
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Almost. The last sentence "grouping by year and counting the users" - should read "grouping by year and counting the **rows**" - the rows being the rows returned by the subquery
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.