Looking to filter a list of names with IDs separated by <50
Hello! I have a list of data (people). Each row has an ID. As it stands the list contains about 3,000 rows, so I'm looking to reduce this list to only duplicate people, and then reduce that further to only duplicate people that have their IDs created simultaneously (or close to it... within 50 increments is fine). I'm not really sure how I'd go about this. Here's the query I've been running with to get the list filtered down to just duplicates: Select * from People as P1 Join ( select count(*) as count, firstname, lastname from People where filteringcriteria = 'Y' group by firstname, lastname having count(*) > 1) as p2 on p1.lastname = p2.lastname and p1.firstname = p2.firstname where filteringcriteria = 'y' How would I then further filter this by ID where names are in close proximity (ID wise) to their duplicate? Thanks!!
Hi Inspectigater, I would approach this by creating upper and lower bounds in the subselect p2 and using them to range join back to p1, as shown below. Hope this helps. ;WITH People AS ( /*Set up sample dataset with common table expression*/ SELECT * FROM (VALUES (1, 'bob', 'bobbob', 'n') /*Should be excluded*/ ,(2, 'tim', 'timtim', 'y') /*Should match id 49*/ ,(49, 'tim', 'timtim', 'y') /*Should match id 2*/ ,(29, 'sue', 'suesue', 'y') /*Should match id 31 AND 33*/ ,(31, 'sue', 'suesue', 'y') /*Should match id 29 AND 33*/ ,(33, 'sue', 'suesue', 'y') /*Should match id 29 AND 31 AND 82*/ ,(82, 'sue', 'suesue', 'y') /*Should match id 33*/ ,(1000, 'sue', 'suesue', 'y') /*Should match id 1001*/ ,(1001, 'sue', 'suesue', 'y') /*Should match id 1000*/ ,(2000, 'sue', 'suesue', 'y') /*Should be excluded - no other ids nearby*/ ,(7, 'albert', 'albertalbert', 'y') /*Should be excluded*/ ) AS People (id, firstname, lastname, filteringcriteria) ) Select
P2.id AS LowerID ,
P1.id AS HigherID ,P1.firstname ,P1.lastname FROM ( /*Set up the filter based on having enough matches, also establish the upper and lower bounds on which you'd like to match*/ SELECT * ,COUNT(1) OVER (PARTITION BY FirstName, LastName) AS Count ,id - 50 AS lowerBound ,id + 50 AS upperBound FROM People WHERE People.filteringcriteria = 'Y' ) as P2 INNER JOIN People AS P1 ON
p1.id >= P2.lowerBound AND
P1.id <= P2.upperBound AND
P2.id /*This ensures each pair of ids only shows up together once.*/ AND P1.firstname = P2.firstname AND P1.lastname = P2.lastname WHERE P2.Count > 1 ORDER BY