question

Bill avatar image
Bill asked

Help with complicated join condition

Greetings, I'm having difficulty getting the desired results on a particular join, so I thought I'd post this question here The query I'm running is as follows: SELECT adSched.adScheduleNo , adSched.adNo , adSched.zoneNo , adSched.positionNo , adSched.slotGroupNo , TopAdImpressions.maxImpressionDateTime FROM core.dbo.adSchedule adSched LEFT JOIN ( SELECT MAX(impressionDateTime) maxImpressionDateTime , adScheduleNo FROM traffic.dbo.adImpressions adImpressions GROUP BY adScheduleNo ) TopAdImpressions ON TopAdImpressions.adScheduleNo = adSched.adScheduleNo Results - Sample 1: ![alt text][1] Results - Sample 2: ![alt text][2] Results - Sample 3: ![alt text][3] This is going to be hard to explain, but I'll do my best. My difficulty is this: for a given zoneNo and positionNo, it is possible to have repeated values (see the first three records in each of the data samples above). In these cases, the slotGroupNo will have a common value (in this case, 50). I only want to return ONE record for a given zoneNo and positionNo. The determining factor is the "maxImpressionDateTime". There may or may not be a value available, as shown in the three data samples. If there is a value in one or more record, then I only want the record with the latest value. If there is no impressionDateTime value for any of the records in the slotGroup, then I simply want the record with the lowest adScheduleNo. So, based on the data samples above, in sample 1, I only need to see records 1,4,5 and 6. In sample 2, I would want records 2,4,5 and 6 (record 2 having the latest impressionDateTime value). In sample 3, I would want records 3,4,5 and 6 (record 3 having the latest impressionDateTime value). So, the big question is, how can I get a query to do this? I've tried a few things, including a couple of self-joins to see if I could filter things, but it's made more complicated by the fact that I might be using the impressionDateTime in some cases, and the adScheduleNo in others. I'm stumped! Any thoughts? I can explain further if needed. Thanks! [1]: /upfiles/sqlData.png [2]: /upfiles/sqlData2_1.png [3]: /upfiles/sqlData3_1.png
t-sqljoins
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
This should do it assuming that you have SQL Server 2005 or better. The idea is to select the records from traffic.dbo.adImpressions table such that the returned results are sub-numbered on adScheduleNo basis placing the latest record to the top (RowNumber = 1). Because you mentioned that it is possible that some core.dbo.adSchedule records do not have a match in traffic.dbo.adImpressions table, the left join in combination with RecordNumber = 1 in the ON clause should be used. select adSched.adScheduleNo, adSched.adNo, adSched.zoneNo, adSched.positionNo, adSched.slotGroupNo, TopAdImpressions.maxImpressionDateTime from core.dbo.adSchedule adSched left join ( select adScheduleNo, impressionDateTime as maxImpressionDateTime, row_number() over (partition by adScheduleNo order by impressionDateTime desc) RowNumber from traffic.dbo.adImpressions ) TopAdImpressions on TopAdImpressions.adScheduleNo = adSched.adScheduleNo and TopAdImpressions.RowNumber = 1; <\!-- **Begin Edit** Bill, Based on your comments, I believe that the query should be restated like this to get the data you need: ;with records as ( select adSched.adScheduleNo, adSched.adNo, adSched.zoneNo, adSched.positionNo, adSched.slotGroupNo, TopAdImpressions.maxImpressionDateTime, row_number() over (partition by ZoneNo, positionNo order by TopAdImpressions.maxImpressionDateTime desc) RowNumber from core.dbo.adSchedule adSched outer apply ( select max(impressionDateTime) as maxImpressionDateTime from traffic.dbo.adImpressions where adScheduleNo = adSched.adScheduleNo ) TopAdImpressions ) select adScheduleNo, adNo, zoneNo, positionNo, slotGroupNo, maxImpressionDateTime from records where RowNumber = 1; Please let me know if this works **End Edit** --> Oleg
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks Oleg! Version two works perfectly! And I think I even understand what you did... :) Much appreciated. Bill
1 Like 1 ·
Bill avatar image
Bill answered
Thanks Oleg, I'm going to have to post my comments in the "answer" field, in order to properly format my comments (and to give me more space). Hopefully answering my own question won't close this to other answers. (I'll find out soon!) That helps, but doesn't quite solve the issue. First off, the adScheduleNo changes in each of those first three records, so partitioning on that value doesn't help - each row ends up with a row_number of 1 (each adScheduleRow for which there is one or more matching adImpressions row(s)). Example: If I have an adImpressions row for each of the three adSchedule rows as above (data sample 3), then this row_number join returns: 92 2011-04-07 11:54:49.923 1 93 2011-04-08 23:54:49.923 1 97 2011-04-09 09:30:49.927 1 I can solve this by changing the partition to partition based on the slotGroupNo instead, but that's not stored in the adImpressions table. I might add that column to help. For now I've just modified that part of the query to include slotGroupNo via a join: select adImp.adScheduleNo, impressionDateTime as maxImpressionDateTime, adSched2.slotGroupNo, row_number() over (partition by slotGroupNo order by impressionDateTime desc) RowNumber from traffic.dbo.adImpressions adImp inner join core.dbo.adSchedule adSched2 on adSched2.adScheduleNo = adImp.adScheduleNo This results in the following data for the impressionDateTime: 97 2011-04-09 09:30:49.927 50 1 93 2011-04-08 23:54:49.923 50 2 92 2011-04-07 11:54:49.923 50 3 This seems better, but it still doesn't eliminate any of the extra rows from my final result set (due to the left join). What I end up with is the following (based on the data in sample data 3 above): ![alt text][1] [1]: /upfiles/sqlData4.png The impressionDateTime value is only showing on record 3 (adScheduleNo 97), which is correct, based on the join we did. However, the first two rows are still there, with null impressionDateTime values. Again, due to the left join. If I use an inner join, it works very nicely, but then I lose all records for which there are no adImpressions records. I like the use of row_number() with the partition statement. I hadn't used that before, and can see some useful possibilities there for other queries. I love learning something new. :) Thanks again Oleg! Bill
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@Bill I updated my answer. Please let me know if the latest version of it works.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.