x

Help with complicated join condition

Greetings,

I'm having difficulty getting the desired results on a particular join, so I thought I'd post this question here

The query I'm running is as follows:

SELECT
adSched.adScheduleNo ,
adSched.adNo ,
adSched.zoneNo ,
adSched.positionNo ,
adSched.slotGroupNo ,
TopAdImpressions.maxImpressionDateTime
FROM
core.dbo.adSchedule adSched
LEFT JOIN ( SELECT
            MAX(impressionDateTime) maxImpressionDateTime ,
            adScheduleNo
        FROM
            traffic.dbo.adImpressions adImpressions
        GROUP BY
            adScheduleNo
      ) TopAdImpressions
 ON  TopAdImpressions.adScheduleNo = adSched.adScheduleNo

Results - Sample 1:

alt text

Results - Sample 2:

alt text

Results - Sample 3:

![alt text][3]

This is going to be hard to explain, but I'll do my best.

My difficulty is this: for a given zoneNo and positionNo, it is possible to have repeated values (see the first three records in each of the data samples above). In these cases, the slotGroupNo will have a common value (in this case, 50).

I only want to return ONE record for a given zoneNo and positionNo. The determining factor is the "maxImpressionDateTime". There may or may not be a value available, as shown in the three data samples. If there is a value in one or more record, then I only want the record with the latest value. If there is no impressionDateTime value for any of the records in the slotGroup, then I simply want the record with the lowest adScheduleNo.

So, based on the data samples above, in sample 1, I only need to see records 1,4,5 and 6. In sample 2, I would want records 2,4,5 and 6 (record 2 having the latest impressionDateTime value). In sample 3, I would want records 3,4,5 and 6 (record 3 having the latest impressionDateTime value).

So, the big question is, how can I get a query to do this? I've tried a few things, including a couple of self-joins to see if I could filter things, but it's made more complicated by the fact that I might be using the impressionDateTime in some cases, and the adScheduleNo in others. I'm stumped!

Any thoughts? I can explain further if needed.

Thanks!

[3]: /upfiles/sqlData3_1.png
more ▼

asked Apr 14, 2011 at 07:55 AM in Default

Bill gravatar image

Bill
105 6 6 7

(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

This should do it assuming that you have SQL Server 2005 or better. The idea is to select the records from traffic.dbo.adImpressions table such that the returned results are sub-numbered on adScheduleNo basis placing the latest record to the top (RowNumber = 1). Because you mentioned that it is possible that some core.dbo.adSchedule records do not have a match in traffic.dbo.adImpressions table, the left join in combination with RecordNumber = 1 in the ON clause should be used.

select 
    adSched.adScheduleNo, adSched.adNo, adSched.zoneNo, 
    adSched.positionNo, adSched.slotGroupNo, 
    TopAdImpressions.maxImpressionDateTime
    from core.dbo.adSchedule adSched
    left join
    (
        select
            adScheduleNo, 
            impressionDateTime as maxImpressionDateTime,
            row_number() over (partition by adScheduleNo 
                order by impressionDateTime desc) RowNumber
            from 
            traffic.dbo.adImpressions
    ) TopAdImpressions 
        on TopAdImpressions.adScheduleNo = adSched.adScheduleNo 
        and TopAdImpressions.RowNumber = 1;

<!-- Begin Edit

Bill,

Based on your comments, I believe that the query should be restated like this to get the data you need:

;with records as
(
    select 
        adSched.adScheduleNo, adSched.adNo, adSched.zoneNo, 
        adSched.positionNo, adSched.slotGroupNo, 
        TopAdImpressions.maxImpressionDateTime,
        row_number() over (partition by ZoneNo, positionNo 
            order by TopAdImpressions.maxImpressionDateTime desc) RowNumber
        from core.dbo.adSchedule adSched
        outer apply
        (
            select
                max(impressionDateTime) as maxImpressionDateTime
                from traffic.dbo.adImpressions
                where adScheduleNo = adSched.adScheduleNo
        ) TopAdImpressions
)
    select
        adScheduleNo, adNo, zoneNo, positionNo, 
        slotGroupNo, maxImpressionDateTime
        from records
        where RowNumber = 1;

Please let me know if this works

End Edit -->

Oleg
more ▼

answered Apr 14, 2011 at 08:07 AM

Oleg gravatar image

Oleg
15.9k 2 4 24

Thanks Oleg! Version two works perfectly! And I think I even understand what you did... :)

Much appreciated.

Bill
Apr 14, 2011 at 12:11 PM Bill
(comments are locked)
10|1200 characters needed characters left

Thanks Oleg,

I'm going to have to post my comments in the "answer" field, in order to properly format my comments (and to give me more space). Hopefully answering my own question won't close this to other answers. (I'll find out soon!)

That helps, but doesn't quite solve the issue. First off, the adScheduleNo changes in each of those first three records, so partitioning on that value doesn't help - each row ends up with a row_number of 1 (each adScheduleRow for which there is one or more matching adImpressions row(s)).

Example: If I have an adImpressions row for each of the three adSchedule rows as above (data sample 3), then this row_number join returns:

92  2011-04-07 11:54:49.923  1
93  2011-04-08 23:54:49.923  1
97  2011-04-09 09:30:49.927  1

I can solve this by changing the partition to partition based on the slotGroupNo instead, but that's not stored in the adImpressions table. I might add that column to help. For now I've just modified that part of the query to include slotGroupNo via a join:

select
  adImp.adScheduleNo, 
  impressionDateTime as maxImpressionDateTime,
  adSched2.slotGroupNo,
  row_number() over (partition by slotGroupNo
  order by impressionDateTime desc) RowNumber
  from 
  traffic.dbo.adImpressions adImp 
  inner join core.dbo.adSchedule adSched2 
  on adSched2.adScheduleNo = adImp.adScheduleNo

This results in the following data for the impressionDateTime:

97  2011-04-09 09:30:49.927  50   1
93  2011-04-08 23:54:49.923  50   2
92  2011-04-07 11:54:49.923  50   3

This seems better, but it still doesn't eliminate any of the extra rows from my final result set (due to the left join). What I end up with is the following (based on the data in sample data 3 above):

alt text

The impressionDateTime value is only showing on record 3 (adScheduleNo 97), which is correct, based on the join we did. However, the first two rows are still there, with null impressionDateTime values. Again, due to the left join.
If I use an inner join, it works very nicely, but then I lose all records for which there are no adImpressions records.

I like the use of row_number() with the partition statement. I hadn't used that before, and can see some useful possibilities there for other queries. I love learning something new. :)

Thanks again Oleg!

Bill

more ▼

answered Apr 14, 2011 at 08:54 AM

Bill gravatar image

Bill
105 6 6 7

@Bill I updated my answer. Please let me know if the latest version of it works.
Apr 14, 2011 at 09:54 AM Oleg
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x987
x118

asked: Apr 14, 2011 at 07:55 AM

Seen: 1007 times

Last Updated: Apr 14, 2011 at 07:58 AM