I'm having difficulty getting the desired results on a particular join, so I thought I'd post this question here
The query I'm running is as follows:
Results - Sample 1:
Results - Sample 2:
Results - Sample 3:
This is going to be hard to explain, but I'll do my best.
My difficulty is this: for a given zoneNo and positionNo, it is possible to have repeated values (see the first three records in each of the data samples above). In these cases, the slotGroupNo will have a common value (in this case, 50).
I only want to return ONE record for a given zoneNo and positionNo. The determining factor is the "maxImpressionDateTime". There may or may not be a value available, as shown in the three data samples. If there is a value in one or more record, then I only want the record with the latest value. If there is no impressionDateTime value for any of the records in the slotGroup, then I simply want the record with the lowest adScheduleNo.
So, based on the data samples above, in sample 1, I only need to see records 1,4,5 and 6. In sample 2, I would want records 2,4,5 and 6 (record 2 having the latest impressionDateTime value). In sample 3, I would want records 3,4,5 and 6 (record 3 having the latest impressionDateTime value).
So, the big question is, how can I get a query to do this? I've tried a few things, including a couple of self-joins to see if I could filter things, but it's made more complicated by the fact that I might be using the impressionDateTime in some cases, and the adScheduleNo in others. I'm stumped!
Any thoughts? I can explain further if needed.
This should do it assuming that you have SQL Server 2005 or better. The idea is to select the records from traffic.dbo.adImpressions table such that the returned results are sub-numbered on adScheduleNo basis placing the latest record to the top (RowNumber = 1). Because you mentioned that it is possible that some core.dbo.adSchedule records do not have a match in traffic.dbo.adImpressions table, the left join in combination with RecordNumber = 1 in the ON clause should be used.
I'm going to have to post my comments in the "answer" field, in order to properly format my comments (and to give me more space). Hopefully answering my own question won't close this to other answers. (I'll find out soon!)
That helps, but doesn't quite solve the issue. First off, the adScheduleNo changes in each of those first three records, so partitioning on that value doesn't help - each row ends up with a row_number of 1 (each adScheduleRow for which there is one or more matching adImpressions row(s)).
Example: If I have an adImpressions row for each of the three adSchedule rows as above (data sample 3), then this row_number join returns:
I can solve this by changing the partition to partition based on the slotGroupNo instead, but that's not stored in the adImpressions table. I might add that column to help. For now I've just modified that part of the query to include slotGroupNo via a join:
This results in the following data for the impressionDateTime:
This seems better, but it still doesn't eliminate any of the extra rows from my final result set (due to the left join). What I end up with is the following (based on the data in sample data 3 above):
The impressionDateTime value is only showing on record 3 (adScheduleNo 97), which is correct, based on the join we did. However, the first two rows are still there, with null impressionDateTime values. Again, due to the left join.
I like the use of row_number() with the partition statement. I hadn't used that before, and can see some useful possibilities there for other queries. I love learning something new. :)
Thanks again Oleg!
answered Apr 14, 2011 at 08:54 AM