question

Alisdair avatar image
Alisdair asked

Not DISTINCT

I have a table with multiple access entries - within a 10 seconds of each other - need to group these so only the first customer ID is shown in a report with the access time - how do you group / where if you have multiple adjacent entries, the custome rcould access a number of times within the same day/hour.
datetime
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
can you give us some sample data to work from please?
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
are the rows that you want to exclude always within 10s of the first?
0 Likes 0 ·
Alisdair avatar image Alisdair commented ·
30 seconds to be safe.. INSERT INTO [tmp_AccessLog_Print] ([DateTime] ,[Camera] ,[NumberPlate] ,[Description] ,[ImageRef]) VALUES ( '2011-10-28 06:24:34.430', 'NGOutF', 'NR13988', '720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE', 'E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG' ) not too sure if the format suits.. ID DateTime Camera NumberPlate Description ImageRef 6744 2011-10-28 06:24:34.430 NGOutF NR13988 720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG 6743 2011-10-28 06:24:39.670 NGOutF NR13988 720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
the ID column in you second sample isnt mentioned in you INSERT, is that an identity column? Is this the Customer ID mentioned in your question? Also, just to be clear, you dont want to have rows where they occur within 30s of a previous one
0 Likes 0 ·
Alisdair avatar image Alisdair commented ·
The ID is an identity column. The NumberPlate is used as the CustID, this can be repeated many times within the 30s and we just want the first, the Customer (NumberPlate) can occur later at each access time and thus must be extracted.
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered
A different approach.... Blatantly stealing @Fatherjack's test setup script and modifying it slightly (the original script didn't give enough different data) IF OBJECT_ID('tempdb..#timebookings') > 0 DROP TABLE #timebookings go CREATE TABLE #timebookings ( CustId INT , AccessTime DATETIME ) WHILE ( SELECT COUNT(*) FROM [#timebookings] ) < 2500 BEGIN SET NOCOUNT ON INSERT INTO #timebookings SELECT ABS(CHECKSUM(NEWID())) % 100 + 1 , DATEADD(ss, CONVERT(INT, ABS(CHECKSUM(NEWID())) % 36000 + 1), '20111109 08:30:00') SET NOCOUNT OFF END ; select CustID, AccessTime from #timebookings t1 except select t2.CustID, t2.AccessTime from #timebookings t1 join #timebookings t2 on t1.custid = t2.custid and t2.AccessTime < dateadd(hh, 1, t1.AccessTime) and t1.AccessTime < t2.AccessTime order by CustID, AccessTime This uses the [EXCEPT][1] operator to get the set of AccessTimes that are not within the second set of AccessTimes defined as ones that are within an hour of another one. Runs subsecond against base data of 10k rows. [1]: http://msdn.microsoft.com/en-us/library/ms188055.aspx
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
You cant steal what was given to you. Glad its resolved. @Alisdair, if this has fixed your problem, please tick Kevs answer as right so others can benefit from knowing what helped fix you up.
1 Like 1 ·
Alisdair avatar image Alisdair commented ·
Hi Kev, Many thanks that works a treat and is exceptionally fast!
0 Likes 0 ·
WilliamD avatar image
WilliamD answered
You could always remove the insignificant portion of the datetime field you are referencing. Example (strip time portion out of column, to return the day only) DECLARE @TimeBookings AS TABLE (CustId int, AccessTime datetime) INSERT INTO @TimeBookings SELECT 1, '20111102 16:37:10' UNION ALL SELECT 1, '20111102 16:37:27' SELECT CustId, CAST(CONVERT(char(10),AccessTime,112) AS datetime) AccessTime FROM @TimeBookings GROUP BY CustId, CAST(CONVERT(char(10),AccessTime,112) AS datetime) EDIT: If you are on SQL Server 2008 and above, you can use the `date` datatype to remove the time portion. Swap out the clumsy `CAST` and `CONVERT` and do a simple `CAST(AccessTime as date)`
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image ThomasRushton ♦♦ commented ·
@WilliamD - the problem there is what if the log entries cross a boundary? eg 2011-10-02 16:39:59 and 2011-10-02 16:40:01? Chances are these are related, but grouping by hh:mm won't cover it. Similarly, 2011-10-01 23:59:59 and 2011-10-02 00:00:01 - same transaction, but would generate two entries in a summary table... I'm sure I've seen a solution to this somewhere, but blown if I can remember where!
0 Likes 0 ·
WilliamD avatar image WilliamD commented ·
right, valid point. I was concentrating on the final sentence in the OP. "multiple entries inside an hour/day". The way to do it in terms of 10 seconds of each other would be a little more difficult. Depends on if the OP would be happy with rounding the times off to the nearest 10 seconds, or really paying attention to the exact timestamp plus-minus 10 seconds.
0 Likes 0 ·
Alisdair avatar image Alisdair commented ·
Thomas is correct, this is my problem and not easy to resolve. The requirement would be to select the first, ignor dulpicates only if in subsequent rows, but I am select (TSQL) challenged here and don't want to do it programatically:)
0 Likes 0 ·
Sri 1 avatar image
Sri 1 answered
create table #TimeBookings (CustId int, AccessTime datetime) INSERT INTO #TimeBookings SELECT 1, '20111102 19:37:10' UNION ALL SELECT 2, '20111102 19:37:27' with cte as (select *,row_number() over(partition by CustId, CustId order by AccessTime) as rowcnt from #TimeBookings) select * from cte where rowcnt=1 ---this will give you just the first row inserted for your custID without the reftime... let me know if i understand it wrong... :(
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Alisdair avatar image Alisdair commented ·
yes almost this now provides the correct items but CustID can access more than once a day and this removes all subsequent accesses - See CustID as BadgeNumber to avoid confusion here
0 Likes 0 ·
Sri 1 avatar image Sri 1 commented ·
See CustID as BadgeNumber ..????? this is more confusing... :(
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered
Just been playing around and came up with this. DECLARE @TimeBookings AS TABLE (CustId int, AccessTime datetime) INSERT INTO @TimeBookings SELECT 1, '20111102 16:37:10' UNION ALL SELECT 1, '20111102 16:37:27' UNION ALL SELECT 1, '20111102 17:59:00' UNION ALL SELECT 1, '20111102 18:05:00' UNION ALL SELECT 1, '20111102 19:00:00' UNION ALL SELECT 1, '20111102 21:00:00' INSERT INTO @TimeBookings SELECT 2, '20111102 18:15:00' UNION ALL SELECT 2, '20111102 19:00:00' UNION ALL SELECT 2, '20111102 21:00:00' ;with OrderedTimeBookings as( select *, row_number() over (partition by CustID order by AccessTime) as CustIndx from @TimeBookings ) ,cte as( select custid, AccessTime as FirstTime, CustIndx from OrderedTimeBookings union all select cte.custid, OTB.AccessTime, OTB.CustIndx from cte join OrderedTimeBookings OTB on cte.custid = OTB.custid and OTB.CustIndx = cte.CustIndx + 1 and OTB.AccessTime < dateadd(hh, 1, cte.FirstTime) ) select custid, firsttime from cte group by custid, firsttime having count(*) =1 order by custid, FirstTime It uses a recursive CTE to build all the possible combinations of accesstimes that are within an hour of each other. Then by counting up the combinations, based on the 'first' access time and discarding any times that appear more than once (using `having count(*)=1`) you get a set of times that are unique (within a 60 minute window). I used a row_number() windowing function over the original data, but this could be removed if access times where guaranteed to be unique. This gives custid firsttime ----------- ----------------------- 1 2011-11-02 16:37:10.000 1 2011-11-02 17:59:00.000 1 2011-11-02 21:00:00.000 2 2011-11-02 18:15:00.000 2 2011-11-02 21:00:00.000
8 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Alisdair avatar image Alisdair commented ·
That solved the problem, many thnaks for the solution, I was also inspired by your Access Time Blog of some time back.
0 Likes 0 ·
Alisdair avatar image Alisdair commented ·
The windowing function is VERY slow and restricts the usability of the sp, I am not too sure how this works without the function, the access times will be unique.
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
How slow is slow? In your table are your partitioning and ordering by columns that are indexed?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Without the windowing function..... ;with cte as( select custid, AccessTime as FirstTime from @TimeBookings union all select cte.custid, OTB.AccessTime from cte join @TimeBookings OTB on cte.custid = OTB.custid and OTB.AccessTime < dateadd(hh, 1, cte.FirstTime) and OTB.AccessTime > cte.FirstTime ) select custid, firsttime from cte group by custid, firsttime having count(*) =1 order by custid, FirstTime
0 Likes 0 ·
Alisdair avatar image Alisdair commented ·
yes the columns are indexed, this does decrease the query. Existing table 2262 rows, result say 96 rows takes 30 seconds SQL 2008 R2 on dual core..
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.