This should be easy but I am running into all types of trouble trying to figure this one out. I need to be able to select records from a table based on the min(date) field. Lets say I have the following fields Store ID, Person_id, Source_type_id, Source_Type,Source_created_at,Source_start_date.
Please look for the attachment 1 for data set
Same store I’d and same person I’d but different scenarios. Here I need all the records which source type = 'Closed' but their source start date should be start date of that respective episode.
Note: Here we cant take the minimum of source type id because they are not assigned in orderly manner.
For example first episode source type = 'Closed' on 4/3/2008 but the episode started on 12/18/2007. In similar way I need to get all the closed source type records where the source start date should be their respective episode star date.
The final output from this example should be four records with source type = 'closed' with start date of their respective episode start date
Please look for the attachment 2 for final result
This is not a very good table design. If you need to be able to group the records for the same store/person combinations by episode then, well, you kinda need a column to exist which will have the same number for every record which belongs to the same episode. Without such column, the attempts to figure out when one episode ended and another one started are based on the silly assumption that one episode must close before the next one is initiated. This is a very dangerous assumption which will bite at some point.
That said, it is still possible to get the results as requested in question. The idea of the script below is as follows: the records with source type = closed are selected along with the datetime value from the previous closed episode for the same store/person combination. The cross apply is then called for each such record in order to determine the earliest date of the records sandwiched between 2 closed records. The script will work in SQL Server 2012 or higher. Here is the script (just change the table name to what it needs to be, it will produce desired results then):
Edit: added the statement which works in earlier versions of SQL Server (2005, 2008, and 2008 R2)
The versions of SQL Server earlier than 2012 do not support analytical functions such as LEAD and LAG, and therefore, some other technique needs to be used in order to include the values from the previous row while reading the current row. There are few ways of doing it, but what I do like is the join of the table with itself on the off-by-one basis. The concept is very simple. The rows are ordered within each partition, for example, in this case the closed records for each store/person combination are sorted by date and numbered. The numbering restarts when the store/person combination changes. Then the table is joined with itself one off-by-one basis, so that record with first closed episode for specific store/person does not have a previous counterpart, while all other closed episodes do. This technique allows a rudimentary, cave man support of the analytical functions which the versions earlier than 2012 lack. It is worth noting again that none of this nonsense would be necessary if the table were not designed so horribly. If there is a need to query the data considering grouping of the records by episode then the respective column must exist, period. Because the column is missing, the artificially complex logic must be used in order to get the desired results. It is like buying a train ticket and then running behind the train instead of boarding it to ride. Anyway, here is the script which will work in earlier versions (once the correct table name is plugged in):
Edit: modified the statement to accommodate additional scenario
If only those closed records need to be included which have the episodes with source_type_id in (2, 3) then there is a need to add additional predicate to the statement above. The predicate is needed to check whether there are any records with source = 2 or 3 sandwiched between 2 closed records. While it would be very lovely to be able to just change the condition inside of the cross apply from Source_type_id < 7 to Source_type_id in (2, 3), this is not possible because such a condition is not enough. For example, if there is an episode consisting of sources 1, 2 and 7 then we cannot really exclude the source = 1 because this is where the final result date comes from. Including source 1 into the mix is incorrect as well because this will include the episodes with, say sources 1, 4, and 7 but these are not needed to be included at all. It is with heavy heart and utter disappointment I include the correlated subquery into the mix to add even more complexity to the already needlessly complex logic caused by the poor design choice. Addition of the correlated subquery does the trick though. While it is possible to add it either inside of the cross apply or to the outside, I opted to add it to the outside so that the cross apply remains the same and will be called not as many times. It would be much better to simply add the episode ID column to the table. Here is the updated script:
Hope this helps.