question

chandler3224 avatar image
chandler3224 asked

Split dataset into group based on date+count [MSSQL 2012]

Hi all. This question was originally posted to [StackOverflow][1] but thought I would post here as the knowledge is much more concentrated :) I have a dataset within SQL Server 2012 that needs to work on the following principle: > A "seat" allows for 8 results in a 6 month period from the first result. If there are more than 8 results within 6 months from the date of the first result then a new "seat" is given. >If a result is created over 6 months from the date of the first result then a new "seat" is given So I have the following data: User DateCreated ---- ------------- User1 2015-01-01 16:05:00 User1 2015-01-02 16:05:00 User1 2015-01-03 16:05:00 User1 2015-01-04 16:05:00 User1 2015-01-05 16:05:00 User1 2015-01-06 16:05:00 User1 2015-01-07 16:05:00 User1 2015-01-08 16:05:00 User1 2015-01-09 16:05:00 User1 2015-01-10 13:25:00 User1 2015-01-11 13:25:00 User1 2015-01-12 13:25:00 User1 2015-09-01 13:00:00 User1 2016-04-01 13:00:00 User2 2015-01-01 13:25:00 User2 2015-01-02 13:25:00 User2 2015-09-01 13:25:00 User2 2016-01-01 13:25:00 User2 2016-05-01 13:25:00 User3 2015-01-01 16:05:00 User3 2015-01-02 16:05:00 User3 2015-01-03 16:05:00 Based on the above rules, they could be split out into the following "groups" User DateCreated Group ---- ------------- ----- User1 2015-01-01 16:05:00 1 User1 2015-01-02 16:05:00 1 User1 2015-01-03 16:05:00 1 User1 2015-01-04 16:05:00 1 User1 2015-01-05 16:05:00 1 User1 2015-01-06 16:05:00 1 User1 2015-01-07 16:05:00 1 User1 2015-01-08 16:05:00 1 /*8 results within 6 months from first row*/ User1 2015-01-09 16:05:00 2 User1 2015-01-10 13:25:00 2 User1 2015-01-11 13:25:00 2 User1 2015-01-12 13:25:00 2 User1 2015-09-01 13:00:00 3 /*created 6+ months after previous row*/ User1 2016-04-01 13:00:00 4 /*created 6+ months after previous row*/ ---- User2 2015-01-01 13:25:00 1 User2 2015-01-02 13:25:00 1 User2 2015-09-01 13:25:00 2 /*created 6+ months after previous row*/ User2 2016-01-01 13:25:00 2 User2 2016-05-01 13:25:00 3 /*created 6+ months after previous row*/ ---- User3 2015-01-01 16:05:00 1 User3 2015-01-02 16:05:00 1 User3 2015-01-03 16:05:00 1 /*3 results within 6 months from first row, within the 8 result cut-off */ Which could then end up as User Seats ---- ----- User1 4 User2 3 User3 1 How would this be achieved within a SQL query, if at all? The comment http://stackoverflow.com/a/40385655/1283391 is halfway there. I don't think I need a **SUM**, more a **running total**. [1]: http://stackoverflow.com/questions/40384974/split-dataset-into-group-based-on-datecount-mssql-2012
datetimepartitioningsql2012
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
Just for clarification - what is 6 months? is 1st Sept 6 months after 2nd March? Datetdiff(month....) would say so, but I don't think that's what you want? When does the 6 months start? In your example you say "6 months from the date of the first result", but then in the results say "created 6+ months after previous row" - so for example where would a row User1 2016-04-01 go? It's clearly more than 6 months since the start date of group 2 (1st Sept 2015), but not more than 6 months than the previous row (1st Dec 2015)
0 Likes 0 ·

1 Answer

·
BigFatBeard avatar image
BigFatBeard answered
OK, so I'm assuming that the date calculations for every six months are based on each user, so the earliest date per user is where the calculation for the 6 months begins. I've knocked a query together that can definitely be refined down but should be a good starting block. after putting the data above into a table called UserGroup I put a row_number() against the initial data in a cte table with initialrows as (select UserName ,DateCreated ,row_number() over(partition by UserName order by DateCreated) RowNum from UserGroup) This allows you to join the data set to itself using the row number minus 1 to join the next date to the previous date and perform a datediff calculation between that rows date and the next. If you couple this with a cumulative sum, partitioned by user and order by date, you can then get a span of time across the rows between the date created rows. I then divided the sum of the cumulative datediff sum and divided by 180 (roughly six months in days.....ish) and wrapped this in a FLOOR. This then allocates the 'seat' number to the users, anything 0-180 gets 0 anything 181-360 gets 1, and on and on. I put all this in to a second cte feeding off the first: , t2 as( select fr.UserName ,fr.DateCreated ,isnull(datediff(dd,sr.datecreated,fr.datecreated),0) DateDiffOne ,sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) DateDiffCum ,floor(sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) / 180) DateDiffCumFloor from initialrows fr left join initialrows sr on fr.username = sr.username and fr.RowNum -1 = sr.RowNum ) The datediffone and datediffcum calculations above are just to show the stages worked through to get to the datediffcumfloor calculation. You can then just do a count distinct on the DateDiffCumFloor grouped by username select t2.Username ,count(distinct datediffcumfloor) Seat from t2 group by t2.username and it will give you the number of seats per user. The query end result is for the provided result set is: Username Seat User1 3 User2 2 User3 1 As I said, it can definitely be refined in to something much neater, but should provide a solid start. Full query below: with initialrows as (select UserName ,DateCreated ,row_number() over(partition by UserName order by DateCreated) RowNum from UserGroup) , t2 as( select fr.UserName ,fr.DateCreated ,isnull(datediff(dd,sr.datecreated,fr.datecreated),0) DateDiffOne ,sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) DateDiffCum ,floor(sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) / 180) DateDiffCumFloor from initialrows fr left join initialrows sr on fr.username = sr.username and fr.RowNum -1 = sr.RowNum ) select t2.Username ,count(distinct datediffcumfloor) Seat from t2 group by t2.username
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.