question

mahevarma7 avatar image
mahevarma7 asked

Optimizing the SQL query

Background: I want to find the retention of a customer cohort on a weekly-rolling basis, so that they can take necessary intervention if a rider has not taken a ride for 28 days.

A problem in detail of what I'm trying to acheive sql.pdf

Below is what I tired so far :

 create table Tripdata
    (
      [date] date,
      rider_id int,
      trip_id int,
      city_id int,
      status varchar(100)
    )
    go

**Query  to insert values** 


    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
    GO
    INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
    GO

Below are list of metrics I'm trying to achieve in a single query

Definitions of Columns in the Output Query (Single Query):

  1. date: the date in respect to which the below metric will be computed.
  2. city_id: the id of the city
  3. dau: Count of distinct riders who have completed min one trip on that date.
  4. wau: Count of distinct riders who have completed min one trip in the last 7 days with respect to the date in column ‘date’
  5. new_rider: count of distinct riders who have taken their first trip in the last 7 days with respect to the date in column ‘date’
  6. previous_mau: count of distinct riders who have taken min one trip between the last 56 to last 29 days with respect to the date in column ‘date’
  7. retained: intersection of distinct riders in the previous_mau and mau_28 phase9. resurrect: count of distinct riders who were inactive in the previous_mau phase but wereactive in the mau_28 phase.10. churn: count of riders who were active in the previous_mau phase but inactive in themau_28 phase

Active : If a rider has completed at least one trip in the respective period Inactive : If a rider has not taken a single trip in the respective period

Below is the query I achieved so far :

select[date], city_id
,(select count(distinct[rider_id])from#Tripdata b where b.[date]= a.[date]and a.city_id = b.city_id)as[dau],(select count(distinct[rider_id])from#Tripdata b where b.[date]between dateadd(day,-7, a.[date])and a.[date]and a.city_id = b.city_id)as[wau],(select count(distinct[rider_id])from#Tripdata b where a.city_id = b.city_id 
    and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]< dateadd(day,-7, a.[date]))and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]> a.[date]))as[new_rider],(select count(distinct[rider_id])from#Tripdata b where a.city_id = b.city_id 
    and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]< dateadd(day,-56, a.[date]))and b.[rider_id]NOTIN(Select c.[rider_id]from#Tripdata c where c.[date]> dateadd(day,-29, a.[date])))as[previous_mau]from#Tripdata a
groupby[date], city_id

How to acheive 7 point in my query and also avoid using joins and In statement to speed up the query?

sql queryhomeworkoptimizationquery hints
sql.pdf (119.5 KiB)
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·

Based on the attached pdf, this looks like homework. There's no problem with asking for help with homework assignments. But it's good if you mention that it's homework when you ask the question. So I have added that as a tag to your question. If it is indeed not homework, then please just remove the tag. I'll try to find time during the day to get into the details of your assignment and provide some hints to it.

0 Likes 0 ·

0 Answers

·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.