question

Inspectigater avatar image
Inspectigater asked

Looking to filter a list of names with IDs separated by <50

Hello! I have a list of data (people). Each row has an ID. As it stands the list contains about 3,000 rows, so I'm looking to reduce this list to only duplicate people, and then reduce that further to only duplicate people that have their IDs created simultaneously (or close to it... within 50 increments is fine). I'm not really sure how I'd go about this. Here's the query I've been running with to get the list filtered down to just duplicates: Select * from People as P1 Join ( select count(*) as count, firstname, lastname from People where filteringcriteria = 'Y' group by firstname, lastname having count(*) > 1) as p2 on p1.lastname = p2.lastname and p1.firstname = p2.firstname where filteringcriteria = 'y' How would I then further filter this by ID where names are in close proximity (ID wise) to their duplicate? Thanks!!
filterduplicatesrow-counts
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image
ThomasRushton answered
Perhaps this might help in your JOIN criteria? ...AND ABS( p1.ID - p2.ID) < 50...
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

sdoubleday avatar image
sdoubleday answered
Hi Inspectigater, I would approach this by creating upper and lower bounds in the subselect p2 and using them to range join back to p1, as shown below. Hope this helps. ;WITH People AS ( /*Set up sample dataset with common table expression*/ SELECT * FROM (VALUES (1, 'bob', 'bobbob', 'n') /*Should be excluded*/ ,(2, 'tim', 'timtim', 'y') /*Should match id 49*/ ,(49, 'tim', 'timtim', 'y') /*Should match id 2*/ ,(29, 'sue', 'suesue', 'y') /*Should match id 31 AND 33*/ ,(31, 'sue', 'suesue', 'y') /*Should match id 29 AND 33*/ ,(33, 'sue', 'suesue', 'y') /*Should match id 29 AND 31 AND 82*/ ,(82, 'sue', 'suesue', 'y') /*Should match id 33*/ ,(1000, 'sue', 'suesue', 'y') /*Should match id 1001*/ ,(1001, 'sue', 'suesue', 'y') /*Should match id 1000*/ ,(2000, 'sue', 'suesue', 'y') /*Should be excluded - no other ids nearby*/ ,(7, 'albert', 'albertalbert', 'y') /*Should be excluded*/ ) AS People (id, firstname, lastname, filteringcriteria) ) Select P2.id AS LowerID , P1.id AS HigherID ,P1.firstname ,P1.lastname FROM ( /*Set up the filter based on having enough matches, also establish the upper and lower bounds on which you'd like to match*/ SELECT * ,COUNT(1) OVER (PARTITION BY FirstName, LastName) AS Count ,id - 50 AS lowerBound ,id + 50 AS upperBound FROM People WHERE People.filteringcriteria = 'Y' ) as P2 INNER JOIN People AS P1 ON p1.id >= P2.lowerBound AND P1.id <= P2.upperBound AND P1.id > P2.id /*This ensures each pair of ids only shows up together once.*/ AND P1.firstname = P2.firstname AND P1.lastname = P2.lastname WHERE P2.Count > 1 ORDER BY P2.id , P1.id
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.