balu.arunkumar asked

Need SQL Query

Hi, I need a query for the problem below. Per my source data, any purchase made within 48 hours of the previous one should be treated as part of a single purchase. I have 1 million unique names in my source table, and a WHILE loop takes far too long. Is there an alternative?

Source data:

Create Table #Temp_data (Seq int, Name varchar(1), Purchased_date datetime)

insert into #Temp_data select 1,'A','2014-10-07 13:04:16.887'
insert into #Temp_data select 2,'A','2014-10-07 16:04:16.887'
insert into #Temp_data select 3,'A','2014-10-08 01:04:16.887'
insert into #Temp_data select 4,'A','2014-10-13 13:05:02.437'
insert into #Temp_data select 5,'A','2014-10-13 20:05:02.437'
insert into #Temp_data select 6,'A','2014-10-17 13:05:29.077'
insert into #Temp_data select 1,'B','2014-10-06 13:05:41.830'
insert into #Temp_data select 2,'B','2014-10-06 16:05:41.830'
insert into #Temp_data select 3,'B','2014-10-07 05:05:41.830'
insert into #Temp_data select 4,'B','2014-10-12 13:06:16.263'
insert into #Temp_data select 5,'B','2014-10-12 23:06:16.263'

I need output in the format below.
Expected output:

Create Table #Temp_Out (Seq int, Name varchar(1), Purchased_date datetime, Newseq int)

insert into #Temp_Out (Seq,Name,Purchased_date)        select 1,'A','2014-10-07 13:04:16.887'
insert into #Temp_Out (Seq,Name,Purchased_date)        select 2,'A','2014-10-07 16:04:16.887'
insert into #Temp_Out (Seq,Name,Purchased_date,Newseq) select 3,'A','2014-10-08 01:04:16.887',1
insert into #Temp_Out (Seq,Name,Purchased_date)        select 4,'A','2014-10-13 13:05:02.437'
insert into #Temp_Out (Seq,Name,Purchased_date,Newseq) select 5,'A','2014-10-13 20:05:02.437',2
insert into #Temp_Out (Seq,Name,Purchased_date,Newseq) select 6,'A','2014-10-17 05:05:29.077',3
insert into #Temp_Out (Seq,Name,Purchased_date)        select 1,'B','2014-10-06 13:05:41.830'
insert into #Temp_Out (Seq,Name,Purchased_date)        select 2,'B','2014-10-06 16:05:41.830'
insert into #Temp_Out (Seq,Name,Purchased_date,Newseq) select 3,'B','2014-10-07 05:05:41.830',1
insert into #Temp_Out (Seq,Name,Purchased_date)        select 4,'B','2014-10-12 13:06:16.263'
insert into #Temp_Out (Seq,Name,Purchased_date,Newseq) select 5,'B','2014-10-12 23:06:16.263',2
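[Editor's note] The expected output marks the last purchase of each 48-hour chain with a running Newseq per Name. On SQL Server 2012 or later this is a classic gaps-and-islands problem that needs no loop. A minimal sketch, assuming the question's #Temp_data table: flag a new group whenever the gap from the previous purchase by the same Name exceeds 48 hours, then number the groups with a running sum.

```sql
-- A new group starts when the gap from the previous purchase by the
-- same Name exceeds 48 hours (or there is no previous purchase).
;with Flagged as
(
    select Seq, Name, Purchased_date,
           case when datediff(hour,
                              lag(Purchased_date) over (partition by Name
                                                        order by Purchased_date),
                              Purchased_date) <= 48
                then 0
                else 1   -- NULL gap (first row per Name) also starts a group
           end as IsGroupStart
    from #Temp_data
)
select Seq, Name, Purchased_date,
       -- running count of group starts = group number within each Name
       sum(IsGroupStart) over (partition by Name
                               order by Purchased_date
                               rows unbounded preceding) as GroupId
from Flagged
order by Name, Purchased_date;
```

Newseq as shown in #Temp_Out would then be this GroupId taken only on the last row of each group (e.g. where LEAD(Purchased_date) is NULL or more than 48 hours ahead).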
sql

Dave_Green ♦ commented:
Not sure I'm following how you want the output to appear - what does the Newseq column represent? Anyway, I'd explore the option of joining the table to itself, based on the same name and the second instance being greater than the first but less than 48 hours greater. This gives you a list of those where there is an overlap - depending on what you want to do with that answer, you can delete / update / copy the rows as necessary.
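[Editor's note] The self-join Dave describes might be sketched as follows, using the question's #Temp_data table; what to do with the matched pairs depends on the requirement.

```sql
-- List every later purchase by the same Name that falls within
-- 48 hours of an earlier one (the "overlap" pairs).
select a.Seq, a.Name, a.Purchased_date,
       b.Seq            as LaterSeq,
       b.Purchased_date as LaterDate
from #Temp_data a
join #Temp_data b
  on  b.Name = a.Name
  and b.Purchased_date >  a.Purchased_date
  and b.Purchased_date <= dateadd(hour, 48, a.Purchased_date);
```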
Patrick answered
How about using a trigger?

Dave_Green ♦ commented:
Why do you suggest a trigger? I believe the questioner was asking for a set of SQL to solve the problem, rather than a method - but I'd be intrigued as to why you believe a trigger would be the right thing to use, given the scenario outlined?
iainrobertson answered
This is quite an interesting problem. As far as I can see, you could have 1, 2, 3...n orders in any particular order group, which makes any sort of absolute logic (e.g. a simple offset join) impossible. Here's a solution that should give you what you need. Note that I've broken the processing into steps; this is to make it a bit easier to understand what's going on. It should (hahaha, easy for me to say) be possible to combine these individual elements into a single operation and avoid the disk writes.

-- create test data
if isnull(object_id('tempdb..#Orders'), 0) <> 0
begin
    drop table #Orders
end
go

create table #Orders (OrderId int, CustomerId int, OrderDate datetime)
go

insert #Orders
select 1, 1, '1 October 2014 11:54:00' union all
select 2, 1, '1 October 2014 13:54:00' union all
select 3, 1, '1 October 2014 18:54:00' union all
select 4, 1, '12 October 2014 11:54:00' union all
select 5, 1, '15 October 2014 11:54:00' union all
select 6, 1, '16 October 2014 11:54:00'
go

-- we expect 3 order groups:
-- group 1
--   1, 1, '1 October 2014 11:54:00'
--   2, 1, '1 October 2014 13:54:00'
--   3, 1, '1 October 2014 18:54:00'
-- group 2
--   4, 1, '12 October 2014 11:54:00'
-- group 3
--   5, 1, '15 October 2014 11:54:00'
--   6, 1, '16 October 2014 11:54:00'

----------------------------
-- identify group boundaries
----------------------------

if isnull(object_id('tempdb..#GroupedOrders'), 0) <> 0
begin
    drop table #GroupedOrders
end
go

-- add a row number to ensure a contiguous series of id values
select RowId = row_number() over (order by CustomerId, OrderDate)
     , *
     , GroupId = cast(null as int)
into #GroupedOrders
from #Orders

-- join on an offset of the new row id to identify the group boundaries
-- a group starts on the row *after* a gap of more than 48 hours,
-- so temporarily mark that (offset) row with its own row id
-- note - if using 2012 or above, lag and lead are available
-- these are probably a bit more efficient
update offset
set GroupId = offset.RowId
from #GroupedOrders main
join #GroupedOrders offset
  on  main.CustomerId = offset.CustomerId
  and main.RowId = offset.RowId - 1
where datediff(hh, main.OrderDate, offset.OrderDate) > 48

-- show the result
select * from #GroupedOrders

/*
RowId  OrderId  CustomerId  OrderDate                GroupId
-----  -------  ----------  -----------------------  -------
1      1        1           2014-10-01 11:54:00.000  NULL
2      2        1           2014-10-01 13:54:00.000  NULL
3      3        1           2014-10-01 18:54:00.000  NULL
4      4        1           2014-10-12 11:54:00.000  4
5      5        1           2014-10-15 11:54:00.000  5
6      6        1           2014-10-16 11:54:00.000  NULL

(6 row(s) affected)
*/

-- note that we have identified the start point for each group
-- with the exception of the very first order, as a result of having no
-- preceding record to compare against
-- we can safely update this directly, as it is by definition a group start point
-- (with more than one customer, each customer's first order would need the same treatment)
update #GroupedOrders set GroupId = 1 where RowId = 1

-----------------------------------------------------------
-- use the group boundary to populate the rest of the group
-----------------------------------------------------------

-- split the data into two populations: those with a boundary marker and those without
-- use an aggregate with an inequality join to push the correct boundary marker
-- to those records without one
-- note that this is what is known as a "triangular join" - it might not be very efficient
; with cte as
(
    select RowId, GroupId
    from #GroupedOrders
    where GroupId is null
)
, cte2 as
(
    select cte.*
         , mGroupId = max(der.GroupId)
    from cte
    join
    (
        select RowId, GroupId
        from #GroupedOrders
        where GroupId is not null
    ) der
      on cte.RowId > der.GroupId
    group by cte.RowId, cte.GroupId
)
update #GroupedOrders
set GroupId = mGroupId
from #GroupedOrders main
join cte2
  on main.RowId = cte2.RowId

-- show the result
select * from #GroupedOrders

/*
RowId  OrderId  CustomerId  OrderDate                GroupId
-----  -------  ----------  -----------------------  -------
1      1        1           2014-10-01 11:54:00.000  1
2      2        1           2014-10-01 13:54:00.000  1
3      3        1           2014-10-01 18:54:00.000  1
4      4        1           2014-10-12 11:54:00.000  4
5      5        1           2014-10-15 11:54:00.000  5
6      6        1           2014-10-16 11:54:00.000  5

(6 row(s) affected)
*/

-- now just renumber the GroupIds to provide a contiguous series
; with cte as
(
    select GroupId
         , newGroupId = dense_rank() over (order by GroupId)
    from #GroupedOrders
)
update cte
set GroupId = newGroupId

-- show the result
select * from #GroupedOrders
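[Editor's note] As the comments in the answer say, on SQL Server 2012 or later LAG can replace the offset self-join when marking the boundaries. A sketch, assuming the same #GroupedOrders table, using an updatable CTE:

```sql
-- Mark group-start rows via LAG instead of an offset self-join.
-- This also catches the first order per customer (PrevDate is NULL),
-- so no separate RowId = 1 update is needed.
;with g as
(
    select RowId, GroupId, OrderDate,
           lag(OrderDate) over (partition by CustomerId
                                order by OrderDate) as PrevDate
    from #GroupedOrders
)
update g
set GroupId = RowId
where PrevDate is null                          -- first order per customer
   or datediff(hh, PrevDate, OrderDate) > 48;   -- gap longer than 48 hours
```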
