find ranges in groups which are identified by a flag


I have the following table:

 declare  @table table (dates int, is_missing tinyint, group_id numeric(18)) 
 insert into @table(dates, is_missing, group_id) 
 select 20110719,0,1 
 union all 
 select 20110720,0,1 
 union all 
 select 20110721,0,1 
 union all 
 select 20110722,1,1 
 union all 
 select 20110723,0,1 
 union all 
 select 20110724,0,1 
 union all 
 select 20110725,0,1 
 union all 
 select 20110726,1,1 
 union all 
 select 20110727,0,1 
 union all 
 select 20110728,1,1 
 union all 
 select 20110728,1,3 
 union all 
 select 20110729,1,3 
 union all 
 select 20110730,0,3 
 union all 
 select 20110731,1,3 
 union all 
 select 20110801,0,3   
 union all 
 select 20110802,0,3 
 union all 
 select 20110803,0,3
 select * 
 from @table 
 order by group_id, dates

What I am trying to do is to return ranges of dates for each group which are identified by the missing day flag. To make this more clear the results of the query will have to look like this:

 group_id   start_date  end_date  days_count
 1          20110719    20110721  3
 1          20110723    20110725  3
 1          20110727    20110727  1
 3          20110730    20110730  1
 3          20110801    20110803  3

The is_missing flag basicaly separates the ranges per group. It actually says that a date is missing and therefore all the other dates located between is_missing flags are the groups I am trying to find their start and end dates as well as their days numbers count.

Is there a simple way to do this?

Thanks a lot.

more ▼

asked May 29, 2011 at 05:03 AM in Default

avatar image

43 2 2 5

(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

You can go about it using windowing functions, using the idea of islands and gaps.

You identify the groups of consecutive dates inside of a group_id, then find the minimum, maximum and count for this "pseudo-group".

Translated into T-SQL:

 WITH    dataset
           AS (SELECT    *,
                         rn = ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY dates)
               FROM      @table
           AS (SELECT    dates,
                         grp = group_id + (rn - ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY dates)) -- identify the consecutive dates inside a group_id
               FROM      dataset
               WHERE     is_missing = 0),
           AS (SELECT    dates,
                         rn = ROW_NUMBER() OVER (PARTITION BY grp ORDER BY dates) -- create a new grouping id
               FROM      datasetext)
 /* display the results */
     SELECT  group_id,
             MIN(dates) as start_date,
             MAX(dates) as end_date,
             MAX(rn) as days_count
     FROM    grouper
     GROUP BY grp,

That should do what you want.

more ▼

answered May 29, 2011 at 07:45 AM

avatar image

26.2k 18 37 48

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: May 29, 2011 at 05:03 AM

Seen: 2242 times

Last Updated: May 29, 2011 at 05:08 AM

Copyright 2018 Redgate Software. Privacy Policy