x

Split dataset into group based on date+count [MSSQL 2012]

Hi all. This question was originally posted to StackOverflow but thought I would post here as the knowledge is much more concentrated :)

I have a dataset within SQL Server 2012 that needs to work on the following principle:

A "seat" allows for 8 results in a 6 month period from the first result. If there are more than 8 results within 6 months from the date of the first result then a new "seat" is given.

If a result is created over 6 months from the date of the first result then a new "seat" is given

So I have the following data:

 User    DateCreated
 ----    -------------
 User1    2015-01-01 16:05:00
 User1    2015-01-02 16:05:00
 User1    2015-01-03 16:05:00
 User1    2015-01-04 16:05:00
 User1    2015-01-05 16:05:00
 User1    2015-01-06 16:05:00
 User1    2015-01-07 16:05:00
 User1    2015-01-08 16:05:00
 User1    2015-01-09 16:05:00
 User1    2015-01-10 13:25:00
 User1    2015-01-11 13:25:00
 User1    2015-01-12 13:25:00
 User1   2015-09-01 13:00:00 
 User1   2016-04-01 13:00:00
 User2    2015-01-01 13:25:00    
 User2    2015-01-02 13:25:00    
 User2    2015-09-01 13:25:00    
 User2    2016-01-01 13:25:00    
 User2    2016-05-01 13:25:00    
 User3    2015-01-01 16:05:00    
 User3    2015-01-02 16:05:00        
 User3    2015-01-03 16:05:00        

Based on the above rules, they could be split out into the following "groups"

 User    DateCreated                Group
 ----    -------------            -----
 User1    2015-01-01 16:05:00        1
 User1    2015-01-02 16:05:00        1
 User1    2015-01-03 16:05:00        1
 User1    2015-01-04 16:05:00        1
 User1    2015-01-05 16:05:00        1
 User1    2015-01-06 16:05:00        1
 User1    2015-01-07 16:05:00        1
 User1    2015-01-08 16:05:00        1 /*8 results within 6 months from first row*/

 User1    2015-01-09 16:05:00        2
 User1    2015-01-10 13:25:00        2
 User1    2015-01-11 13:25:00        2
 User1    2015-01-12 13:25:00        2

 User1   2015-09-01 13:00:00     3 /*created 6+ months after previous row*/

 User1   2016-04-01 13:00:00     4 /*created 6+ months after previous row*/
 ----
 User2    2015-01-01 13:25:00        1
 User2    2015-01-02 13:25:00        1

 User2    2015-09-01 13:25:00        2 /*created 6+ months after previous row*/
 User2    2016-01-01 13:25:00        2 

 User2    2016-05-01 13:25:00        3 /*created 6+ months after previous row*/
 ----
 User3    2015-01-01 16:05:00        1
 User3    2015-01-02 16:05:00        1
 User3    2015-01-03 16:05:00        1 /*3 results within 6 months from first row, within the 8 result cut-off */

Which could then end up as

 User        Seats
 ----        -----
 User1        4
 User2        3
 User3       1

How would this be achieved within a SQL query, if at all?

The comment http://stackoverflow.com/a/40385655/1283391 is halfway there.

I don't think I need a SUM, more a running total.

more ▼

asked Jan 18 at 04:00 PM in Default

avatar image

chandler3224
0

Just for clarification - what is 6 months? is 1st Sept 6 months after 2nd March? Datetdiff(month....) would say so, but I don't think that's what you want?

When does the 6 months start? In your example you say "6 months from the date of the first result", but then in the results say "created 6+ months after previous row" - so for example where would a row

 User1   2016-04-01

go? It's clearly more than 6 months since the start date of group 2 (1st Sept 2015), but not more than 6 months than the previous row (1st Dec 2015)

Jan 19 at 10:11 AM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

OK, so I'm assuming that the date calculations for every six months are based on each user, so the earliest date per user is where the calculation for the 6 months begins. I've knocked a query together that can definitely be refined down but should be a good starting block.

after putting the data above into a table called UserGroup I put a row_number() against the initial data in a cte table

 with initialrows as
 (select 
 UserName
 ,DateCreated
 ,row_number() over(partition by UserName order by DateCreated) RowNum
 from
 UserGroup)

This allows you to join the data set to itself using the row number minus 1 to join the next date to the previous date and perform a datediff calculation between that rows date and the next. If you couple this with a cumulative sum, partitioned by user and order by date, you can then get a span of time across the rows between the date created rows. I then divided the sum of the cumulative datediff sum and divided by 180 (roughly six months in days.....ish) and wrapped this in a FLOOR. This then allocates the 'seat' number to the users, anything 0-180 gets 0 anything 181-360 gets 1, and on and on. I put all this in to a second cte feeding off the first:

 , t2 as(
 select 
 fr.UserName
 ,fr.DateCreated
 ,isnull(datediff(dd,sr.datecreated,fr.datecreated),0) DateDiffOne
 ,sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) DateDiffCum
 ,floor(sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) / 180) DateDiffCumFloor
 from
 initialrows fr
 left join initialrows sr on fr.username = sr.username and fr.RowNum -1 = sr.RowNum 
 )

The datediffone and datediffcum calculations above are just to show the stages worked through to get to the datediffcumfloor calculation. You can then just do a count distinct on the DateDiffCumFloor grouped by username

 select
 t2.Username
 ,count(distinct datediffcumfloor) Seat
 from
 t2
 group by
 t2.username

and it will give you the number of seats per user. The query end result is for the provided result set is:

 Username Seat
 User1        3
 User2        2
 User3        1

As I said, it can definitely be refined in to something much neater, but should provide a solid start. Full query below:

 with initialrows as
 (select 
 UserName
 ,DateCreated
 ,row_number() over(partition by UserName order by DateCreated) RowNum
 from
 UserGroup)
 , t2 as(
 select 
 fr.UserName
 ,fr.DateCreated
 ,isnull(datediff(dd,sr.datecreated,fr.datecreated),0) DateDiffOne
 ,sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) DateDiffCum
 ,floor(sum(isnull(datediff(dd,sr.datecreated,fr.datecreated),0)) over(partition by fr.username order by fr.username,fr.datecreated rows unbounded preceding) / 180) DateDiffCumFloor
 from
 initialrows fr
 left join initialrows sr on fr.username = sr.username and fr.RowNum -1 = sr.RowNum 
 )
 select
 t2.Username
 ,count(distinct datediffcumfloor) Seat
 from
 t2
 group by
 t2.username
more ▼

answered Jan 24 at 09:12 PM

avatar image

BigFatBeard
11 2

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x87
x64
x28

asked: Jan 18 at 04:00 PM

Seen: 51 times

Last Updated: Jan 24 at 09:16 PM

Copyright 2017 Redgate Software. Privacy Policy