Background: I want to find the retention of a customer cohort on a weekly-rolling basis, so that they can take necessary intervention if a rider has not taken a ride for 28 days.
A problem in detail of what I'm trying to acheive sql.pdf
Below is what I tired so far :
create table Tripdata ( [date] date, rider_id int, trip_id int, city_id int, status varchar(100) ) go **Query to insert values** INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed') GO INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed') GO
Below are list of metrics I'm trying to achieve in a single query
Definitions of Columns in the Output Query (Single Query):
- date: the date in respect to which the below metric will be computed.
- city_id: the id of the city
- dau: Count of distinct riders who have completed min one trip on that date.
- wau: Count of distinct riders who have completed min one trip in the last 7 days with respect to the date in column ‘date’
- new_rider: count of distinct riders who have taken their
first trip
in the last 7 days with respect to the date in column ‘date’ - previous_mau: count of distinct riders who have taken min one trip between the last 56 to last 29 days with respect to the date in column ‘date’
- retained: intersection of distinct riders in the previous_mau and mau_28 phase9. resurrect: count of distinct riders who were inactive in the previous_mau phase but wereactive in the mau_28 phase.10. churn: count of riders who were active in the previous_mau phase but inactive in themau_28 phase
Active : If a rider has completed at least one trip in the respective period Inactive : If a rider has not taken a single trip in the respective period
Below is the query I achieved so far :
select[date], city_id ,(select count(distinct[rider_id])from#Tripdata b where b.[date]= a.[date]and a.city_id = b.city_id)as[dau],(select count(distinct[rider_id])from#Tripdata b where b.[date]between dateadd(day,-7, a.[date])and a.[date]and a.city_id = b.city_id)as[wau],(select count(distinct[rider_id])from#Tripdata b where a.city_id = b.city_id and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]< dateadd(day,-7, a.[date]))and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]> a.[date]))as[new_rider],(select count(distinct[rider_id])from#Tripdata b where a.city_id = b.city_id and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]< dateadd(day,-56, a.[date]))and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]> dateadd(day,-29, a.[date])))as[previous_mau]from#Tripdata a groupby[date], city_id
How to acheive 7 point in my query and also avoid using joins and In statement to speed up the query?