- Home /

Afternoon All, Help me check my thinking on this one. And please pardon then lack CREATE TABLES…I’ll post those when I get back to the office. I have a table of daily flight activity for airplanes across the country, and I need to figure out for the preceding 30 days how many flights each airplane has taken. Obviously for some airplanes the answer on any given day is zero flights occurred. First, I setup a calendar / tally table to hold my dates. I only have 6 dates in my calendar table just to facilitate the R&D aspects… And started out with a LEFT OUTER JOIN SELECT CONVERT(DATE, z.date_range) flight_date , t.n_number , COUNT(t.n_number) flights FROM z_date_stub z LEFT OUTER JOIN t_flight t ON CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) AND t.n_number IN ( 'N983JC' ) GROUP BY t.n_number , CONVERT(DATE, z.date_range) ORDER BY CONVERT(DATE, z.date_range) Note – “AND t.n_number IN ('N983JC'))” was just to constrain the dataset while I work the kinks out. Which on its face performed well and returned a result set that was close. flight_date n_number flights 2014-02-26 N983JC 9 2014-02-27 N983JC 9 2014-02-28 N983JC 2 2014-03-01 NULL 0 2014-03-02 NULL 0 2014-03-03 NULL 0 The only problem is I really need that n_number column to hold the n_number and not NULL. (Think about it…if I released the constraint to all n_numbers any airplanes without flights on a particular day would group into the NULL column) So then I just started brainstormin, and pondered OUTER APPLY. Initially ran into the same problem of the NULL results, and settled on two OUTER APPLYS. SELECT CONVERT(DATE, z.date_range) flight_date , x.n_number , COUNT(CONVERT(DATE, y.departure_date)) flights FROM z_date_stub z OUTER APPLY ( SELECT DISTINCT t.n_number FROM t_flight t WHERE t.n_number IN ( 'N983JC' ) ) x OUTER APPLY ( SELECT t.departure_date , t.n_number FROM t_flight t WHERE CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) AND t.n_number = x.n_number ) y GROUP BY z.date_range , x.n_number , y.n_number ORDER BY x.n_number , z.date_range And BAM, this is the result set: flight_date n_number flights 2014-02-26 N983JC 9 2014-02-27 N983JC 9 2014-02-28 N983JC 2 2014-03-01 N983JC 0 2014-03-02 N983JC 0 2014-03-03 N983JC 0 It’s the data I’m expecting, and performs well for the entire dataset (68,000 planes / 103,000 flights / 1.9M rows in the result set / 22 seconds), but it’s got me wondering if this is the best way to write this. Thoughts? Alternatives? Thanks, Steve

Comment

rsheik

Grant Fritchey

Grant Fritchey

when you got a chance to post the CREATE TABLE ... please also post some sample data and expected result

This site runs off of votes. Please mark all helpful answers by clicking on the thumbs up next to those answers. If any one answer lead to a solution, please indicate that by clicking on the check mark next to that answer.

**Answer** by iainrobertson
·
Mar 14, 2014 at 02:29 PM

If you cross join your date range of interest with your list of planes you can then outer apply the counts just once:
-- create test data
create table #Planes (PlaneId varchar(7))
insert #Planes values ('N21221')
insert #Planes values ('N21231')
insert #Planes values ('N24421')
insert #Planes values ('A27821')
insert #Planes values ('N21266')
create table #Flights (PlaneId varchar(7), DateId int)
insert #Flights values ('N21221',1)
insert #Flights values ('N21231',1)
insert #Flights values ('N24421',1)
insert #Flights values ('A27821',2)
insert #Flights values ('N21221',3)
insert #Flights values ('N21231',1)
insert #Flights values ('N24421',3)
insert #Flights values ('A27821',3)
insert #Flights values ('N21266',1)
insert #Flights values ('N21221',2)
insert #Flights values ('N21231',2)
insert #Flights values ('N24421',1)
insert #Flights values ('A27821',3)
insert #Flights values ('N21266',3)
create table #Calendar (DateId int, Date datetime)
insert #Calendar values (1,'1 July 2013')
insert #Calendar values (2,'2 July 2013')
insert #Calendar values (3,'3 July 2013')
-- get the data
; with cte as (
select PlaneId, DateId, Date
from #Planes
cross join #Calendar
)
select cte.PlaneId, cte.Date, isnull(fc.FlightCount,0)
from cte
outer apply (
select PlaneId, DateId, FlightCount = count(*)
from #Flights
where PlaneId = cte.PlaneId
and DateId = cte.DateId
group by PlaneId, DateId
) fc
order by PlaneId, cte.DateId

**Answer** by Grant Fritchey
·
Mar 14, 2014 at 09:43 AM

I'm not sure about the logic, but a few things stand out from a performance stand point. First, this query: SELECT DISTINCT t.n_number FROM t_flight t WHERE t.n_number IN ( 'N983JC' ) DISTINCT is an aggregate function that's going to use up a lot of resources aggregating values. I get that you may have a list of values instead of just one, but I'd look to using a [tally table][1] to pivot that list, or, if you're 2008 or above, maybe a [table valued parameter][2]. Either one is going to work better. Next is this bit: CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) You've just put functions on columns so that you're guaranteed scans on those two tables. Maybe it's just range scans, but it'll be scans none the less. I'd suggest changing the data types for those values so that they are dates, or, barring that, adding a calculated column to store the data in the correct form so that you don't have to do conversions. [1]:
http://www.sqlservercentral.com/articles/Tally+Table/72993/ [2]:
http://technet.microsoft.com/en-us/library/bb510489.aspx

Of course! There is always a method to my madness. The SELECT DISTINCT is to retrieve the complete list of aircraft that I have to do the 30 day evaluation on. The list of aircraft is unknown and always changing as new "flights" arrive in the list. I only added the WHERE clause to reduce the result set as I was figuring this out. In production it would just be "SELECT DISTINCT t.n_number FROM t_flight t" As for the CONVERTs. You're right on the z.date_range..I've already got the tally / calendar table as a DATE...so that CONVERT is unnecessary. However the t_flight table, that departure_date is really a DATETIME...the time the actual flight departed...so I do need to reduce it to just a DATE.

I'd still look to eliminate the DISTINCT. It's just going to slow things down. Same thing with the conversion. I get it that it's a DATETIME, but that conversion won't help you.

Copyright 2019 Redgate Software. Privacy Policy

- Anonymous
- Sign in
- Create
- Ask a question
- Spaces
- Ask SSC Site Issues (meta-askssc)
- Explore
- Topics
- Questions
- Users
- Badges