question

shaffer911 avatar image
shaffer911 asked

Complex LEFT OUTER JOIN or OUTER APPLY / NULL problem

Afternoon All, Help me check my thinking on this one. And please pardon then lack CREATE TABLES…I’ll post those when I get back to the office. I have a table of daily flight activity for airplanes across the country, and I need to figure out for the preceding 30 days how many flights each airplane has taken. Obviously for some airplanes the answer on any given day is zero flights occurred. First, I setup a calendar / tally table to hold my dates. I only have 6 dates in my calendar table just to facilitate the R&D aspects… And started out with a LEFT OUTER JOIN SELECT CONVERT(DATE, z.date_range) flight_date , t.n_number , COUNT(t.n_number) flights FROM z_date_stub z LEFT OUTER JOIN t_flight t ON CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) AND t.n_number IN ( 'N983JC' ) GROUP BY t.n_number , CONVERT(DATE, z.date_range) ORDER BY CONVERT(DATE, z.date_range) Note – “AND t.n_number IN ('N983JC'))” was just to constrain the dataset while I work the kinks out. Which on its face performed well and returned a result set that was close. flight_date n_number flights 2014-02-26 N983JC 9 2014-02-27 N983JC 9 2014-02-28 N983JC 2 2014-03-01 NULL 0 2014-03-02 NULL 0 2014-03-03 NULL 0 The only problem is I really need that n_number column to hold the n_number and not NULL. (Think about it…if I released the constraint to all n_numbers any airplanes without flights on a particular day would group into the NULL column) So then I just started brainstormin, and pondered OUTER APPLY. Initially ran into the same problem of the NULL results, and settled on two OUTER APPLYS. SELECT CONVERT(DATE, z.date_range) flight_date , x.n_number , COUNT(CONVERT(DATE, y.departure_date)) flights FROM z_date_stub z OUTER APPLY ( SELECT DISTINCT t.n_number FROM t_flight t WHERE t.n_number IN ( 'N983JC' ) ) x OUTER APPLY ( SELECT t.departure_date , t.n_number FROM t_flight t WHERE CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) AND t.n_number = x.n_number ) y GROUP BY z.date_range , x.n_number , y.n_number ORDER BY x.n_number , z.date_range And BAM, this is the result set: flight_date n_number flights 2014-02-26 N983JC 9 2014-02-27 N983JC 9 2014-02-28 N983JC 2 2014-03-01 N983JC 0 2014-03-02 N983JC 0 2014-03-03 N983JC 0 It’s the data I’m expecting, and performs well for the entire dataset (68,000 planes / 103,000 flights / 1.9M rows in the result set / 22 seconds), but it’s got me wondering if this is the best way to write this. Thoughts? Alternatives? Thanks, Steve
query-tuningouter-join
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

when you got a chance to post the CREATE TABLE ... please also post some sample data and expected result
0 Likes 0 ·
This site runs off of votes. Please mark all helpful answers by clicking on the thumbs up next to those answers. If any one answer lead to a solution, please indicate that by clicking on the check mark next to that answer.
0 Likes 0 ·
rsheik avatar image
rsheik answered
Hi Steve... understood what you have done... im thinking of doing the same in easier way.. Can we have sample of ur table????
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered
I'm not sure about the logic, but a few things stand out from a performance stand point. First, this query: SELECT DISTINCT t.n_number FROM t_flight t WHERE t.n_number IN ( 'N983JC' ) DISTINCT is an aggregate function that's going to use up a lot of resources aggregating values. I get that you may have a list of values instead of just one, but I'd look to using a [tally table][1] to pivot that list, or, if you're 2008 or above, maybe a [table valued parameter][2]. Either one is going to work better. Next is this bit: CONVERT(DATE, z.date_range) = CONVERT(DATE, t.departure_date) You've just put functions on columns so that you're guaranteed scans on those two tables. Maybe it's just range scans, but it'll be scans none the less. I'd suggest changing the data types for those values so that they are dates, or, barring that, adding a calculated column to store the data in the correct form so that you don't have to do conversions. [1]: http://www.sqlservercentral.com/articles/Tally+Table/72993/ [2]: http://technet.microsoft.com/en-us/library/bb510489.aspx
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Of course! There is always a method to my madness. The SELECT DISTINCT is to retrieve the complete list of aircraft that I have to do the 30 day evaluation on. The list of aircraft is unknown and always changing as new "flights" arrive in the list. I only added the WHERE clause to reduce the result set as I was figuring this out. In production it would just be "SELECT DISTINCT t.n_number FROM t_flight t" As for the CONVERTs. You're right on the z.date_range..I've already got the tally / calendar table as a DATE...so that CONVERT is unnecessary. However the t_flight table, that departure_date is really a DATETIME...the time the actual flight departed...so I do need to reduce it to just a DATE.
0 Likes 0 ·
I'd still look to eliminate the DISTINCT. It's just going to slow things down. Same thing with the conversion. I get it that it's a DATETIME, but that conversion won't help you.
0 Likes 0 ·
iainrobertson avatar image
iainrobertson answered
If you cross join your date range of interest with your list of planes you can then outer apply the counts just once: -- create test data create table #Planes (PlaneId varchar(7)) insert #Planes values ('N21221') insert #Planes values ('N21231') insert #Planes values ('N24421') insert #Planes values ('A27821') insert #Planes values ('N21266') create table #Flights (PlaneId varchar(7), DateId int) insert #Flights values ('N21221',1) insert #Flights values ('N21231',1) insert #Flights values ('N24421',1) insert #Flights values ('A27821',2) insert #Flights values ('N21221',3) insert #Flights values ('N21231',1) insert #Flights values ('N24421',3) insert #Flights values ('A27821',3) insert #Flights values ('N21266',1) insert #Flights values ('N21221',2) insert #Flights values ('N21231',2) insert #Flights values ('N24421',1) insert #Flights values ('A27821',3) insert #Flights values ('N21266',3) create table #Calendar (DateId int, Date datetime) insert #Calendar values (1,'1 July 2013') insert #Calendar values (2,'2 July 2013') insert #Calendar values (3,'3 July 2013') -- get the data ; with cte as ( select PlaneId, DateId, Date from #Planes cross join #Calendar ) select cte.PlaneId, cte.Date, isnull(fc.FlightCount,0) from cte outer apply ( select PlaneId, DateId, FlightCount = count(*) from #Flights where PlaneId = cte.PlaneId and DateId = cte.DateId group by PlaneId, DateId ) fc order by PlaneId, cte.DateId
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.