x

Help with complicated join condition

Greetings,

I'm having difficulty getting the desired results on a particular join, so I thought I'd post this question here

The query I'm running is as follows:

 SELECT
 adSched.adScheduleNo ,
 adSched.adNo ,
 adSched.zoneNo ,
 adSched.positionNo ,
 adSched.slotGroupNo ,
 TopAdImpressions.maxImpressionDateTime
 FROM
 core.dbo.adSchedule adSched
 LEFT JOIN ( SELECT
             MAX(impressionDateTime) maxImpressionDateTime ,
             adScheduleNo
         FROM
             traffic.dbo.adImpressions adImpressions
         GROUP BY
             adScheduleNo
       ) TopAdImpressions
  ON  TopAdImpressions.adScheduleNo = adSched.adScheduleNo

Results - Sample 1:

alt text

Results - Sample 2:

alt text

Results - Sample 3:

alt text

This is going to be hard to explain, but I'll do my best.

My difficulty is this: for a given zoneNo and positionNo, it is possible to have repeated values (see the first three records in each of the data samples above). In these cases, the slotGroupNo will have a common value (in this case, 50).

I only want to return ONE record for a given zoneNo and positionNo. The determining factor is the "maxImpressionDateTime". There may or may not be a value available, as shown in the three data samples. If there is a value in one or more record, then I only want the record with the latest value. If there is no impressionDateTime value for any of the records in the slotGroup, then I simply want the record with the lowest adScheduleNo.

So, based on the data samples above, in sample 1, I only need to see records 1,4,5 and 6. In sample 2, I would want records 2,4,5 and 6 (record 2 having the latest impressionDateTime value). In sample 3, I would want records 3,4,5 and 6 (record 3 having the latest impressionDateTime value).

So, the big question is, how can I get a query to do this? I've tried a few things, including a couple of self-joins to see if I could filter things, but it's made more complicated by the fact that I might be using the impressionDateTime in some cases, and the adScheduleNo in others. I'm stumped!

Any thoughts? I can explain further if needed.

Thanks!

more ▼

asked Apr 14, 2011 at 07:55 AM in Default

avatar image

Bill
206 6 11 12

(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

This should do it assuming that you have SQL Server 2005 or better. The idea is to select the records from traffic.dbo.adImpressions table such that the returned results are sub-numbered on adScheduleNo basis placing the latest record to the top (RowNumber = 1). Because you mentioned that it is possible that some core.dbo.adSchedule records do not have a match in traffic.dbo.adImpressions table, the left join in combination with RecordNumber = 1 in the ON clause should be used.

 select 
     adSched.adScheduleNo, adSched.adNo, adSched.zoneNo, 
     adSched.positionNo, adSched.slotGroupNo, 
     TopAdImpressions.maxImpressionDateTime
     from core.dbo.adSchedule adSched
     left join
     (
         select
             adScheduleNo, 
             impressionDateTime as maxImpressionDateTime,
             row_number() over (partition by adScheduleNo 
                 order by impressionDateTime desc) RowNumber
             from 
             traffic.dbo.adImpressions
     ) TopAdImpressions 
         on TopAdImpressions.adScheduleNo = adSched.adScheduleNo 
         and TopAdImpressions.RowNumber = 1;

Oleg

more ▼

answered Apr 14, 2011 at 08:07 AM

avatar image

Oleg
17.2k 3 7 28

Thanks Oleg! Version two works perfectly! And I think I even understand what you did... :)

Much appreciated.

Bill

Apr 14, 2011 at 12:11 PM Bill
(comments are locked)
10|1200 characters needed characters left

Thanks Oleg,

I'm going to have to post my comments in the "answer" field, in order to properly format my comments (and to give me more space). Hopefully answering my own question won't close this to other answers. (I'll find out soon!)

That helps, but doesn't quite solve the issue. First off, the adScheduleNo changes in each of those first three records, so partitioning on that value doesn't help - each row ends up with a row_number of 1 (each adScheduleRow for which there is one or more matching adImpressions row(s)).

Example: If I have an adImpressions row for each of the three adSchedule rows as above (data sample 3), then this row_number join returns:

 92    2011-04-07 11:54:49.923    1
 93    2011-04-08 23:54:49.923    1
 97    2011-04-09 09:30:49.927    1

I can solve this by changing the partition to partition based on the slotGroupNo instead, but that's not stored in the adImpressions table. I might add that column to help. For now I've just modified that part of the query to include slotGroupNo via a join:

 select
   adImp.adScheduleNo, 
   impressionDateTime as maxImpressionDateTime,
   adSched2.slotGroupNo,
   row_number() over (partition by slotGroupNo
   order by impressionDateTime desc) RowNumber
   from 
   traffic.dbo.adImpressions adImp 
   inner join core.dbo.adSchedule adSched2 
   on adSched2.adScheduleNo = adImp.adScheduleNo

This results in the following data for the impressionDateTime:

 97    2011-04-09 09:30:49.927    50    1
 93    2011-04-08 23:54:49.923    50    2
 92    2011-04-07 11:54:49.923    50    3

This seems better, but it still doesn't eliminate any of the extra rows from my final result set (due to the left join). What I end up with is the following (based on the data in sample data 3 above):

alt text

The impressionDateTime value is only showing on record 3 (adScheduleNo 97), which is correct, based on the join we did. However, the first two rows are still there, with null impressionDateTime values. Again, due to the left join.
If I use an inner join, it works very nicely, but then I lose all records for which there are no adImpressions records.

I like the use of row_number() with the partition statement. I hadn't used that before, and can see some useful possibilities there for other queries. I love learning something new. :)

Thanks again Oleg!

Bill

more ▼

answered Apr 14, 2011 at 08:54 AM

avatar image

Bill
206 6 11 12

@Bill I updated my answer. Please let me know if the latest version of it works.

Apr 14, 2011 at 09:54 AM Oleg
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1069
x153

asked: Apr 14, 2011 at 07:55 AM

Seen: 1211 times

Last Updated: Apr 14, 2011 at 07:58 AM

Copyright 2016 Redgate Software. Privacy Policy