question

sathwikreddy987 avatar image
sathwikreddy987 asked

SQL Query

This should be easy but I am running into all types of trouble trying to figure this one out. I need to be able to select records from a table based on the min(date) field. Lets say I have the following fields Store ID, Person_id, Source_type_id, Source_Type,Source_created_at,Source_start_date. Please look for the attachment 1 for data set Same store I’d and same person I’d but different scenarios. Here I need all the records which source type = 'Closed' but their source start date should be start date of that respective episode. Note: Here we cant take the minimum of source type id because they are not assigned in orderly manner. For example first episode source type = 'Closed' on 4/3/2008 but the episode started on 12/18/2007. In similar way I need to get all the closed source type records where the source start date should be their respective episode star date. The final output from this example should be four records with source type = 'closed' with start date of their respective episode start date Please look for the attachment 2 for final result [1]: /storage/temp/4413-final-data-set.png
sql query
data-set.png (20.7 KiB)
final-data-set.png (10.4 KiB)
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
This is not a very good table design. If you need to be able to group the records for the same store/person combinations by episode then, well, you kinda need a column to exist which will have the same number for every record which belongs to the same episode. Without such column, the attempts to figure out when one episode ended and another one started are based on the silly assumption that one episode must close before the next one is initiated. This is a very dangerous assumption which will bite at some point. That said, it is still possible to get the results as requested in question. The idea of the script below is as follows: the records with source type = closed are selected along with the datetime value from the previous closed episode for the same store/person combination. The cross apply is then called for each such record in order to determine the earliest date of the records sandwiched between 2 closed records. The script will work in SQL Server 2012 or higher. Here is the script (just change the table name to what it needs to be, it will produce desired results then): ;with closed as ( select *, lag(Source_created_at, 1, 0) over( partition by [Store ID], Person_id order by Source_created_at) Prev from #t where Source_type_id = 7 ) select c.[Store ID], c.Person_id, c.Source_type_id, c.Source_Type, c.Source_created_at, c.Source_start_date, starts.[Final Result date] from closed c cross apply ( select min(Source_start_date) [Final Result date] from #t where [Store ID] = c.[Store ID] and Person_id = c.Person_id and Source_type_id < 7 and Source_created_at between c.Prev and c.Source_created_at ) starts; **Edit: added the statement which works in earlier versions of SQL Server (2005, 2008, and 2008 R2)** The versions of SQL Server earlier than 2012 do not support analytical functions such as [LEAD][1] and [LAG][2], and therefore, some other technique needs to be used in order to include the values from the previous row while reading the current row. There are few ways of doing it, but what I do like is the join of the table with itself on the off-by-one basis. The concept is very simple. The rows are ordered within each partition, for example, in this case the closed records for each store/person combination are sorted by date and numbered. The numbering restarts when the store/person combination changes. Then the table is joined with itself one off-by-one basis, so that record with first closed episode for specific store/person does not have a previous counterpart, while all other closed episodes do. This technique allows a rudimentary, cave man support of the analytical functions which the versions earlier than 2012 lack. It is worth noting again that none of this nonsense would be necessary if the table were not designed so horribly. If there is a need to query the data considering grouping of the records by episode then the respective column must exist, period. Because the column is missing, the artificially complex logic must be used in order to get the desired results. It is like buying a train ticket and then running behind the train instead of boarding it to ride. Anyway, here is the script which will work in earlier versions (once the correct table name is plugged in): ;with closed as ( select *, row_number() over( partition by [Store ID], Person_id order by Source_created_at) N from #t where Source_type_id = 7 ) select c.[Store ID], c.Person_id, c.Source_type_id, c.Source_Type, c.Source_created_at, c.Source_start_date, starts.[Final Result date] from closed c left join closed p on c.[Store ID] = p.[Store ID] and c.Person_id = p.Person_id and c.N = p.N + 1 cross apply ( select min(Source_start_date) [Final Result date] from #t where [Store ID] = c.[Store ID] and Person_id = c.Person_id and Source_type_id < 7 and Source_created_at between isnull(p.Source_created_at, '19000101') and c.Source_created_at ) starts; **Edit: modified the statement to accommodate additional scenario** If only those closed records need to be included which have the episodes with source\_type\_id in (2, 3) then there is a need to add additional predicate to the statement above. The predicate is needed to check whether there are any records with source = 2 or 3 sandwiched between 2 closed records. While it would be very lovely to be able to just change the condition inside of the cross apply from **Source_type_id < 7** to **Source_type_id in (2, 3)**, this is not possible because such a condition is not enough. For example, if there is an episode consisting of sources 1, 2 and 7 then we cannot really exclude the source = 1 because this is where the final result date comes from. Including source 1 into the mix is incorrect as well because this will include the episodes with, say sources 1, 4, and 7 but these are not needed to be included at all. It is with heavy heart and utter disappointment I include the correlated subquery into the mix to add even more complexity to the already needlessly complex logic caused by the poor design choice. Addition of the correlated subquery does the trick though. While it is possible to add it either inside of the cross apply or to the outside, I opted to add it to the outside so that the cross apply remains the same and will be called not as many times. It would be much better to simply add the episode ID column to the table. Here is the updated script: ;with closed as ( select *, row_number() over( partition by [Store ID], Person_id order by Source_created_at) N from #t where Source_type_id = 7 ) select c.[Store ID], c.Person_id, c.Source_type_id, c.Source_Type, c.Source_created_at, c.Source_start_date, starts.[Final Result date] from closed c left join closed p on c.[Store ID] = p.[Store ID] and c.Person_id = p.Person_id and c.N = p.N + 1 cross apply ( select min(Source_start_date) [Final Result date] from #t where [Store ID] = c.[Store ID] and Person_id = c.Person_id and Source_type_id < 7 and Source_created_at between isnull(p.Source_created_at, '19000101') and c.Source_created_at ) starts where exists ( select 1 from #t where [Store ID] = c.[Store ID] and Person_id = c.Person_id and Source_type_id in (2, 3) and Source_created_at between isnull(p.Source_created_at, '19000101') and c.Source_created_at ); Hope this helps. Oleg [1]: https://docs.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql [2]: https://docs.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@sathwikreddy987 Yes, of course. I added the query to my answer which will work in earlier versions of SQL Server. The query should work as expected, but please let me know if anything else needs to be changed. Thank you.
1 Like 1 ·
sathwikreddy987 avatar image sathwikreddy987 commented ·
Thank you so much. I'm using the SSMS 2208R2 and this query doesn't work. Could you please assist? Thanks in advance!!
0 Likes 0 ·
sathwikreddy987 avatar image sathwikreddy987 commented ·
Thank you so much for your help. This query worked 9 out of 10 scenarios. Here is the example where this query needs to be modified. I need to pull the only the closed records where in the entire episode there is Source_type_id (2,3). For example: for same store_id and person_id: Source_type_id: 1 3 7 ---->Episode closed---- step1 1 9 7 ---> Episode closed----- step2 7----->Episode closed without any start date record..Data entry issue and needs to eliminated-----step3 1 2 7--->Episode closed -----step4 Only step 1 and 4 records needs to be in final output because in step 1 whole episode their is Source_type_id =3 before the Source_type_id= 7 record in that respective episode.............Similarly in step 4 whole episode their is Source_type_id =2 before the Source_type_id= 7 record in that respective episode. Just an FYI - If we have both Source_type_id =3 or Source_type_id = 2 in same episode then we need to consider this scenario as both happened in same episode and start-date will date of Source_type_id: 1 in that respective episode. Could you please assist? Thanks in advance!!
0 Likes 0 ·
sathwikreddy987 avatar image sathwikreddy987 commented ·
@Oleg Could you please assist with the query that I mentioned in above post? Thanks in advance!!
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@sathwikreddy987 Please check the last query in my answer, it should accommodate the scenario you described and let me know whether it does or not. In a mean time, please consider adding the missing column to the table. Thank you.
0 Likes 0 ·
sathwikreddy987 avatar image
sathwikreddy987 answered
@Oleg. Awesome!! Query worked. Thank you so much for your help!
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.