x

Not DISTINCT

I have a table with multiple access entries - within a 10 seconds of each other - need to group these so only the first customer ID is shown in a report with the access time - how do you group / where if you have multiple adjacent entries, the custome rcould access a number of times within the same day/hour.

more ▼

asked Nov 02, 2011 at 09:00 AM in Default

avatar image

Alisdair
13 1 1 3

can you give us some sample data to work from please?

Nov 09, 2011 at 06:55 AM Fatherjack ♦♦

are the rows that you want to exclude always within 10s of the first?

Nov 09, 2011 at 07:31 AM Fatherjack ♦♦

30 seconds to be safe..

 INSERT INTO [tmp_AccessLog_Print]
        ([DateTime]
        ,[Camera]
        ,[NumberPlate]
        ,[Description]
        ,[ImageRef])
  VALUES
        (
       '2011-10-28 06:24:34.430',
       'NGOutF',
       'NR13988',
       '720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE',
       'E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG'
       )

not too sure if the format suits..

 ID    DateTime    Camera    NumberPlate    Description    ImageRef
 6744    2011-10-28 06:24:34.430    NGOutF    NR13988    720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE    E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG        
 6743    2011-10-28 06:24:39.670    NGOutF    NR13988    720,DAVIS ADRIENNE,PIGEON ROAD 7,,,NR 13988,HYUNDAI GETZ,LIGHT BLUE    E:\Test Images\ASD Amber Valley\Test1\ReadImage\2011-10-27\ASD1_11-10-19_07-03-15-23_594.JPG
Nov 09, 2011 at 08:00 AM Alisdair

the ID column in you second sample isnt mentioned in you INSERT, is that an identity column? Is this the Customer ID mentioned in your question?

Also, just to be clear, you dont want to have rows where they occur within 30s of a previous one

Nov 09, 2011 at 08:26 AM Fatherjack ♦♦

The ID is an identity column. The NumberPlate is used as the CustID, this can be repeated many times within the 30s and we just want the first, the Customer (NumberPlate) can occur later at each access time and thus must be extracted.

Nov 09, 2011 at 11:42 PM Alisdair
(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

A different approach....

Blatantly stealing @Fatherjack's test setup script and modifying it slightly (the original script didn't give enough different data)

 IF OBJECT_ID('tempdb..#timebookings') > 0 
 DROP TABLE #timebookings 
 go
 CREATE TABLE #timebookings ( CustId INT , AccessTime DATETIME ) 
 
 WHILE ( SELECT COUNT(*) FROM [#timebookings] ) < 2500
 BEGIN 
 SET NOCOUNT ON 
 INSERT INTO #timebookings 
 SELECT ABS(CHECKSUM(NEWID())) % 100 + 1 , 
 DATEADD(ss, CONVERT(INT, ABS(CHECKSUM(NEWID())) % 36000 + 1), '20111109 08:30:00') 
 
 SET NOCOUNT OFF 
 END ; 
 
 
 select CustID, AccessTime
 from #timebookings t1
 
 except
 
 select t2.CustID, t2.AccessTime 
 from #timebookings t1
 join #timebookings t2 on t1.custid = t2.custid  
                         and t2.AccessTime < dateadd(hh, 1, t1.AccessTime)
                         and t1.AccessTime < t2.AccessTime
 
 order by CustID, AccessTime


This uses the EXCEPT operator to get the set of AccessTimes that are not within the second set of AccessTimes defined as ones that are within an hour of another one.

Runs subsecond against base data of 10k rows.

more ▼

answered Nov 10, 2011 at 01:49 AM

avatar image

Kev Riley ♦♦
66.1k 48 63 81

Hi Kev, Many thanks that works a treat and is exceptionally fast!

Nov 10, 2011 at 02:27 AM Alisdair

You cant steal what was given to you. Glad its resolved.

@Alisdair, if this has fixed your problem, please tick Kevs answer as right so others can benefit from knowing what helped fix you up.

Nov 10, 2011 at 02:30 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left

Just been playing around and came up with this.

 DECLARE @TimeBookings AS TABLE (CustId int, AccessTime datetime)
 
 INSERT INTO @TimeBookings 
 SELECT 1, '20111102 16:37:10' UNION ALL 
 SELECT 1, '20111102 16:37:27' UNION ALL 
 SELECT 1, '20111102 17:59:00' UNION ALL 
 SELECT 1, '20111102 18:05:00' UNION ALL 
 SELECT 1, '20111102 19:00:00' UNION ALL 
 SELECT 1, '20111102 21:00:00'
 
 INSERT INTO @TimeBookings 
 SELECT 2, '20111102 18:15:00' UNION ALL 
 SELECT 2, '20111102 19:00:00' UNION ALL 
 SELECT 2, '20111102 21:00:00'
 
 ;with OrderedTimeBookings
 as(
  select *, row_number() over (partition by CustID order by AccessTime) as CustIndx
  from @TimeBookings
 )
 ,cte
 as( 
     select custid, AccessTime as FirstTime, CustIndx
     from OrderedTimeBookings 
 union all
     select
         cte.custid,
         OTB.AccessTime,
         OTB.CustIndx
     from cte
     join OrderedTimeBookings OTB 
      on cte.custid = OTB.custid 
      and OTB.CustIndx = cte.CustIndx + 1
      and OTB.AccessTime < dateadd(hh, 1, cte.FirstTime)
 )
 
 select custid, firsttime
 from cte
 group by custid, firsttime
 having count(*) =1
 order by custid, FirstTime


It uses a recursive CTE to build all the possible combinations of accesstimes that are within an hour of each other. Then by counting up the combinations, based on the 'first' access time and discarding any times that appear more than once (using having count(*)=1) you get a set of times that are unique (within a 60 minute window).

I used a row_number() windowing function over the original data, but this could be removed if access times where guaranteed to be unique.

This gives

 custid      firsttime
 ----------- -----------------------
 1           2011-11-02 16:37:10.000
 1           2011-11-02 17:59:00.000
 1           2011-11-02 21:00:00.000
 2           2011-11-02 18:15:00.000
 2           2011-11-02 21:00:00.000
more ▼

answered Nov 03, 2011 at 05:22 AM

avatar image

Kev Riley ♦♦
66.1k 48 63 81

That solved the problem, many thnaks for the solution, I was also inspired by your Access Time Blog of some time back.

Nov 03, 2011 at 05:53 AM Alisdair

The windowing function is VERY slow and restricts the usability of the sp, I am not too sure how this works without the function, the access times will be unique.

Nov 04, 2011 at 01:00 AM Alisdair

How slow is slow?

In your table are your partitioning and ordering by columns that are indexed?

Nov 04, 2011 at 01:20 AM Kev Riley ♦♦

Without the windowing function.....

 ;with cte
 as( 
     select custid, AccessTime as FirstTime
     from @TimeBookings 
 union all
     select
         cte.custid,
         OTB.AccessTime
     from cte
     join @TimeBookings OTB 
      on cte.custid = OTB.custid 
      and OTB.AccessTime < dateadd(hh, 1, cte.FirstTime)
      and OTB.AccessTime > cte.FirstTime
 ) 
 
 select custid, firsttime
 from cte
 group by custid, firsttime
 having count(*) =1
 order by custid, FirstTime
Nov 04, 2011 at 02:10 AM Kev Riley ♦♦

yes the columns are indexed, this does decrease the query. Existing table 2262 rows, result say 96 rows takes 30 seconds SQL 2008 R2 on dual core..

Nov 09, 2011 at 03:44 AM Alisdair
(comments are locked)
10|1200 characters needed characters left

You could always remove the insignificant portion of the datetime field you are referencing.

Example (strip time portion out of column, to return the day only)

 DECLARE @TimeBookings AS TABLE (CustId int, AccessTime datetime)

 INSERT INTO @TimeBookings 
 SELECT 1, '20111102 16:37:10' UNION ALL 
 SELECT 1, '20111102 16:37:27'
 
 SELECT CustId,
        CAST(CONVERT(char(10),AccessTime,112) AS datetime) AccessTime
 FROM @TimeBookings 
 GROUP BY CustId,
          CAST(CONVERT(char(10),AccessTime,112) AS datetime)


EDIT:

If you are on SQL Server 2008 and above, you can use the date datatype to remove the time portion. Swap out the clumsy CAST and CONVERT and do a simple CAST(AccessTime as date)

more ▼

answered Nov 02, 2011 at 09:43 AM

avatar image

WilliamD
26.2k 18 35 48

@WilliamD - the problem there is what if the log entries cross a boundary? eg 2011-10-02 16:39:59 and 2011-10-02 16:40:01? Chances are these are related, but grouping by hh:mm won't cover it. Similarly, 2011-10-01 23:59:59 and 2011-10-02 00:00:01 - same transaction, but would generate two entries in a summary table...

I'm sure I've seen a solution to this somewhere, but blown if I can remember where!

Nov 02, 2011 at 10:05 AM ThomasRushton ♦♦

right, valid point. I was concentrating on the final sentence in the OP. "multiple entries inside an hour/day".

The way to do it in terms of 10 seconds of each other would be a little more difficult. Depends on if the OP would be happy with rounding the times off to the nearest 10 seconds, or really paying attention to the exact timestamp plus-minus 10 seconds.

Nov 02, 2011 at 10:37 AM WilliamD

Thomas is correct, this is my problem and not easy to resolve. The requirement would be to select the first, ignor dulpicates only if in subsequent rows, but I am select (TSQL) challenged here and don't want to do it programatically:)

Nov 02, 2011 at 11:06 PM Alisdair
(comments are locked)
10|1200 characters needed characters left

create table #TimeBookings (CustId int, AccessTime datetime)

INSERT INTO #TimeBookings SELECT 1, '20111102 19:37:10' UNION ALL SELECT 2, '20111102 19:37:27'

with cte as (select ,row_number() over(partition by CustId, CustId order by AccessTime) as rowcnt from #TimeBookings) select from cte where rowcnt=1

---this will give you just the first row inserted for your custID without the reftime... let me know if i understand it wrong... :(

more ▼

answered Nov 03, 2011 at 12:17 AM

avatar image

Sri 1
11 9 9 11

yes almost this now provides the correct items but CustID can access more than once a day and this removes all subsequent accesses - See CustID as BadgeNumber to avoid confusion here

Nov 03, 2011 at 12:53 AM Alisdair

See CustID as BadgeNumber ..????? this is more confusing... :(

Nov 03, 2011 at 05:47 AM Sri 1
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x88

asked: Nov 02, 2011 at 09:00 AM

Seen: 1228 times

Last Updated: Nov 02, 2011 at 09:00 AM

Copyright 2017 Redgate Software. Privacy Policy