x

The ‘SSN matching’ SQL Problem

After we have fixed the performance problems with the codes written by Robert Bar and his brother Reggie Bar, now has the time come to their cousin, Fu Bar.

She was facing a not-so-simple problem by matching a list of user-entered SSN's (from a web page) with a list of validated SSN's. So far so good, right, but as we know, users aren't that particular careful when entering their personal information, so the SSN's have a number of errors in them.
The good thing is that all user-entered SSN's are checksum-validated and have duplicates removed.
All files (setup, sample data population and Fu Bar's code) can be downloaded from here. Fu's code is updated 10am Sunday 16th of May 2010 UTC. If you have downloaded the code before this time, please re-download.

The rules are as follow (in addition to the standard rules here)

1) A perfect match is valid
2) One single digit wrong is valid
3) Two digits wrong are valid, as long as they are not adjacent
4) Three or more digits wrong is invalid
5) If a user-entered SSN has a perfect match with a validated SSN, this user-entered SSN cannot be used in a combination with another validated SSN.
6) You are not allowed to change the schema of the two tables, with indexes as the only exception.

To clarify the non-adjacent rule:
a) A digit wrong in an odd position is allowed with a digit wrong on another odd position.
b) A digit wrong in an even position is allowed with a digit wrong on another even position.
c) A digit wrong in an odd position is not allowed with a digit wrong for any even position. This is because there is one flaw in the checksum algorithm used for the user-entered SSN's.
d) For very same reason as "c)", a digit wrong in an even position is not allowed with a digit wrong for any odd position.

Your task is to rewrite Fu's code (her code takes about 70-75 minutes to run) to produce a list in following style:

vSSN, ueSSN, Status

vSSN is the validated SSN from dbo.Validated table.
ueSSN is the user-entered SSN from the dbo.UserEntered table.
Status column is a tinyint describing how many characters that are wrong. 0 is a perfect match, 1 is one digit wrong match and 2 is a two digit wrong match.

The sorting is vital, because helpdesk will receive this output later for manual correction so an output with order by vSSN, ueSSN and Status is mandatory.

All entries have to be submitted before 9pm Sunday 30th of May 2010 (UTC). Entries will be evaluated as they are submitted and time allows. Final winner will be announced shortly after that.

First Prize for SQL Speed Phreak Challenge #5: $100 and a license for SQL Data Generator (used when creating the sample data) !

There will also be three categorized runner-up prizes:

* Best Cursor-based solution
* Best CLR solution
* Best Unusual/experimental attempt

The winners of these prizes will get a licensed copy of either SQL Prompt or SQL Refactor, whichever they prefer.

Don't forget to have fun and to be creative,

//Peso

Phil Factor:

The SSN problem was one of the most enjoyable to take part in, because the problem as so simple to define, but the answer was tricky to get right. In the end, all the solutions except for cursor-based ones, were performing in under a second, which made the final order rather more difficult to determine.

When differences are in tens of milliseconds, you can get a different order on different machines: This is a problem but when the size of the test data was determined, the existing cursor-based solution was running in fourteen minutes. We never predicted the extent of the Speed Phreakery we could achieve. The first set-based SQL solutions had this down to less than a handful of seconds, and soon the solutions were fighting in around the half-second mark.

The CLR-based solutions probably were at a slight advantage with this problem though it would be nice to see if this is true of different sizes of data. Matt and JAhlen were neck and neck with this, though JAhlen was a node ahead over the finishing line.

Daniel’s SSIS solution was a sober warning that one should never neglect the obvious answers. It gets the award for the Best Unusual/experimental attempt.

Matt was the clear winner of the best SQL solution (OK, Fu Bar was banned from the best Cursor-based solution). I hope he will want the license for Prompt/Refactor, (excellent for quickly formatting code) though he writes his own excellent SQL tools which are well worth a look.

Winner of the First Prize for the SQL Speed Phreak Challenge #5: $100 and a license for SQL Data Generator (used when creating the sample data) was Johan Åhlén (JAhlen).

It was a very difficult decision as to whether Daniel or Johan won since it was a photofinish. We asked various experts, but nobody would venture an opinion. (The actual timings will vary considerably on different servers, be warned. The amount of memory, no. Of CPUs and general machine configuration will make a difference). I almost had to toss a coin!

Kathi Kellenberger, who wrote the excellent http://www.simple-talk.com/sql/performance/set-based-speed-phreakery-the-fifo-stock-inventory-sql-problem/ Set-based Speed Phreakery: The FIFO Stock Inventory SQL Problem and http://www.simple-talk.com/sql/performance/writing-efficient-sql-set-based-speed-phreakery/ Writing Efficient SQL: Set-Based Speed Phreakery, is studying the results of this competition with the intention of writing it up for Simple-Talk like the others.

Many thanks to all. Peso for all the hard work, and for constructing such an interesting problem. Matt for his energy in coming up with both CLR and TSQL solutions, Scott for kicking things off, Fu Bar for trying her best, Daniel for thinking laterally, Johan for showing the way with the CLR routines, and all those whose comments encouraged the participants to greater efforts.

[1]: http://www.simple-talk.com/sql/performance/set-based-speed-phreakery-the-fifo-stock-inventory-sql-problem/
more ▼

asked May 15, 2010 at 06:15 AM in Default

Peso gravatar image

Peso
1.6k 5 6 8

Hi Peso,

My question is with #5. I wanted to confirm that you DO want to allow the inverse of this: basically if a user-entered SSN matches perfectly to a validated SSN this validated SSN can be used with another user-entered SSN. Eg

select * from Validated v join UserEntered ue on ue.SSN = v.SSN where v.SSN='545013815'

has a perfect match but is also returned by Fu for a user that has entered it incorrectly: 545515815

Thx, have a great weekend
May 16, 2010 at 12:18 AM Scot Hauder

It's correct. The vSSN "545013815" has a perfect match with a user-entered SSN "545013815" and as such, the user-entered SSN "545013815" cannot be used for any other match. However, there can be more user-entered SSN's which can match the validated SSN "545013815", and it does with the user-entered SSN "545515815". Also make sure you have downloaded the latest code for Fu. I discovered she didn't comply to all the rules. Shame on her! :-) Fu's code was re-uploaded for an hour ago.

//Peso
May 16, 2010 at 05:28 AM Peso
Thanks Peso, I think that was it
May 16, 2010 at 05:49 AM Scot Hauder
I didn't think Fu modified the script that much but it went from 7min to 45min
May 16, 2010 at 11:29 AM Scot Hauder
That's the beauty of OR vs AND! :-)
May 16, 2010 at 11:41 AM Peso
(comments are locked)
10|1200 characters needed characters left

25 answers: sort voted first

2010.5.16 Scot Hauder v1

I'm titling this one: "Compute Scalar" :)

 SELECT val.SSN [vSSN], uen.SSN [ueSSN], CASE WHEN SUBSTRING(val.SSN,1,1) = SUBSTRING(uen.SSN,1,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,2,1) = SUBSTRING(uen.SSN,2,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,3,1) = SUBSTRING(uen.SSN,3,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,4,1) = SUBSTRING(uen.SSN,4,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,5,1) = SUBSTRING(uen.SSN,5,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,6,1) = SUBSTRING(uen.SSN,6,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,7,1) = SUBSTRING(uen.SSN,7,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,8,1) = SUBSTRING(uen.SSN,8,1) THEN 0 ELSE 1 END + CASE WHEN SUBSTRING(val.SSN,9,1) = SUBSTRING(uen.SSN,9,1) THEN 0 ELSE 1 END [Status]

FROM ( SELECT v.[SSN], x.SSN2 FROM [dbo].[Validated] v

CROSS APPLY (SELECT SUBSTRING(v.SSN,2,1) + SUBSTRING(v.SSN,4,1) + SPACE(2) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) [SSN2] UNION ALL SELECT SUBSTRING(v.SSN,2,1) + SPACE(1) + SUBSTRING(v.SSN,6,1) + SPACE(1) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL SELECT SUBSTRING(v.SSN,2,1) + SPACE(2) + SUBSTRING(v.SSN,8,1) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL SELECT SPACE(1) + SUBSTRING(v.SSN,4,1) + SUBSTRING(v.SSN,6,1) + SPACE(1) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL SELECT SPACE(1) + SUBSTRING(v.SSN,4,1) + SPACE(1) + SUBSTRING(v.SSN,8,1) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL SELECT SPACE(2) + SUBSTRING(v.SSN,6,1) + SUBSTRING(v.SSN,8,1) + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL

           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SPACE(2) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SPACE(1) + SUBSTRING(v.SSN,3,1) + SPACE(1) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SPACE(1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SPACE(1) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SPACE(1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SPACE(2) + SUBSTRING(v.SSN,7,1) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SPACE(1) + SUBSTRING(v.SSN,5,1) + SPACE(1) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SPACE(1) + SUBSTRING(v.SSN,5,1) + SUBSTRING(v.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SPACE(2) + SUBSTRING(v.SSN,9,1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SPACE(1) + SUBSTRING(v.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(v.SSN,2,1) 
                + SUBSTRING(v.SSN,4,1)
                + SUBSTRING(v.SSN,6,1)
                + SUBSTRING(v.SSN,8,1)
                + SUBSTRING(v.SSN,1,1) + SUBSTRING(v.SSN,3,1) + SUBSTRING(v.SSN,5,1) + SPACE(2))x

)val JOIN (SELECT ue.[SSN], y.SSN2 FROM [dbo].[UserEntered] ue LEFT JOIN [dbo].[Validated] v ON (v.SSN = ue.SSN) CROSS APPLY (SELECT SUBSTRING(ue.SSN,2,1) + SUBSTRING(ue.SSN,4,1) + SPACE(2) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) [SSN2] UNION ALL SELECT SUBSTRING(ue.SSN,2,1) + SPACE(1) + SUBSTRING(ue.SSN,6,1) + SPACE(1) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL SELECT SUBSTRING(ue.SSN,2,1) + SPACE(2) + SUBSTRING(ue.SSN,8,1) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL SELECT SPACE(1) + SUBSTRING(ue.SSN,4,1) + SUBSTRING(ue.SSN,6,1) + SPACE(1) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL SELECT SPACE(1) + SUBSTRING(ue.SSN,4,1) + SPACE(1) + SUBSTRING(ue.SSN,8,1) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL SELECT SPACE(2) + SUBSTRING(ue.SSN,6,1) + SUBSTRING(ue.SSN,8,1) + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL

           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SPACE(2) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SPACE(1) + SUBSTRING(ue.SSN,3,1) + SPACE(1) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SPACE(1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SPACE(1) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SPACE(1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SPACE(2) + SUBSTRING(ue.SSN,7,1) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SPACE(1) + SUBSTRING(ue.SSN,5,1) + SPACE(1) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SPACE(1) + SUBSTRING(ue.SSN,5,1) + SUBSTRING(ue.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SPACE(2) + SUBSTRING(ue.SSN,9,1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SPACE(1) + SUBSTRING(ue.SSN,7,1) + SPACE(1) UNION ALL
           SELECT SUBSTRING(ue.SSN,2,1) 
                + SUBSTRING(ue.SSN,4,1)
                + SUBSTRING(ue.SSN,6,1)
                + SUBSTRING(ue.SSN,8,1)
                + SUBSTRING(ue.SSN,1,1) + SUBSTRING(ue.SSN,3,1) + SUBSTRING(ue.SSN,5,1) + SPACE(2))y
WHERE v.SSN IS NULL)uen ON (val.SSN2 = uen.SSN2) GROUP BY Val.SSN,uen.SSN UNION ALL SELECT v.[SSN] , ue.SSN , 0 FROM [dbo].[Validated] v JOIN [dbo].[UserEntered] ue ON (ue.SSN = v.SSN) ORDER BY [vSSN], [ueSSN], [status]
more ▼

answered May 16, 2010 at 10:47 AM

Scot Hauder gravatar image

Scot Hauder
6.1k 13 15 18

(comments are locked)
10|1200 characters needed characters left

--Phil Factor 1A (I'll call this one the 'remove the hopeless and the obvious User-SSNS before you start'

--build up the DDL --clear up if it crashed out with an error if exists (select 1 from tempdb.information_schema.tables  where Table_Name like '#ValidSSN%')  DROP TABLE #validSSN if exists (select 1 from tempdb.information_schema.tables  where Table_Name like '#UserSSN%')  DROP TABLE #UserSSN if exists (select 1 from tempdb.information_schema.tables  where Table_Name like '#CheckedSSN%')  DROP TABLE #CheckedSSN

/*we'll go for a temporary table since we need to exclude perfect matches from ambiguous matching. We may want to use it to remove duplicates if we are doing 'best matching'.*/ CREATE TABLE #CheckedSSN ( vSSN CHAR(9) NOT NULL, ueSSN CHAR(9) NOT NULL, [Status] TINYINT NOT NULL ) --we put odd and even digits in separate strings for valid SSNs CREATE TABLE #ValidSSN ( Valid_ID INT IDENTITY(1, 1), SSN CHAR(9) NOT NULL primary key, evens AS (SUBSTRING(SSN, 2, 1)+SUBSTRING(SSN, 4, 1)+SUBSTRING(SSN, 6, 1) +SUBSTRING(SSN, 8, 1)) PERSISTED, odds AS (SUBSTRING(SSN, 1, 1)+SUBSTRING(SSN, 3, 1)+SUBSTRING(SSN, 5, 1) +SUBSTRING(SSN, 7, 1)+SUBSTRING(SSN, 9, 1)) PERSISTED )

--we put odd and even digits in separate strings for the ones the users enter CREATE TABLE #UserSSN ( UserSSN_ID INT IDENTITY(1, 1) primary key, SSN CHAR(9) NOT NULL, evens AS (SUBSTRING(SSN, 2, 1)+SUBSTRING(SSN, 4, 1)+SUBSTRING(SSN, 6, 1) +SUBSTRING(SSN, 8, 1)) PERSISTED, odds AS (SUBSTRING(SSN, 1, 1)+SUBSTRING(SSN, 3, 1)+SUBSTRING(SSN, 5, 1) +SUBSTRING(SSN, 7, 1)+SUBSTRING(SSN, 9, 1)) PERSISTED )

--do some work

--so we can slice up the strings and then index them INSERT INTO #ValidSSN (SSN) SELECT [SSN] FROM Validated

--we can remove the 0-error matches INSERT INTO #CheckedSSN (vSSN, ueSSN, Status) SELECT Validated.SSN, UserEntered.SSN, 0 FROM UserEntered INNER JOIN Validated ON Validated.SSN=UserEntered.SSN

--and we need only waste time on the dud matches INSERT INTO #UserSSN (SSN) SELECT UserEntered.SSN FROM UserEntered LEFT OUTER JOIN Validated ON Validated.SSN=UserEntered.SSN WHERE Validated.SSN IS NULL--i.e. where we didnt get a match

CREATE INDEX idxValidEvenAndOdd ON #ValidSSN(evens,Odds) CREATE INDEX idxUsersEvenAndOdd ON #UserSSN(evens,Odds)

--pop in the 1-error matches where it was an error on the odd side only INSERT INTO #CheckedSSN (vSSN, ueSSN, Status) SELECT #ValidSSN.SSN, #UserSSN.SSN, 1 FROM #ValidSSN INNER JOIN #UserSSN ON #ValidSSN.evens=#UserSSN.evens WHERE CASE WHEN SUBSTRING(#ValidSSN.odds,1,1) =SUBSTRING(#UserSSN.odds,1,1) THEN 1 ELSE 0 END+
CASE WHEN SUBSTRING(#ValidSSN.odds,2,1) =SUBSTRING(#UserSSN.odds,2,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,3,1) =SUBSTRING(#UserSSN.odds,3,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,4,1) =SUBSTRING(#UserSSN.odds,4,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,5,1) =SUBSTRING(#UserSSN.odds,5,1) THEN 1 ELSE 0 END =4 --pop in the 1-error matches where it was an error on the even side only INSERT INTO #CheckedSSN (vSSN, ueSSN, Status) SELECT #ValidSSN.SSN, #UserSSN.SSN, 1 FROM #ValidSSN INNER JOIN #UserSSN ON #ValidSSN.Odds=#UserSSN.Odds WHERE CASE WHEN SUBSTRING(#ValidSSN.evens,1,1) =SUBSTRING(#UserSSN.evens,1,1) THEN 1 ELSE 0 END+
CASE WHEN SUBSTRING(#ValidSSN.evens,2,1) =SUBSTRING(#UserSSN.evens,2,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.evens,3,1) =SUBSTRING(#UserSSN.evens,3,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.evens,4,1) =SUBSTRING(#UserSSN.evens,4,1) THEN 1 ELSE 0 END =3
--pop in the 2-error matches (where no perfect match exists) -- it was an error on the odd side only INSERT INTO #CheckedSSN (vSSN, ueSSN, Status) SELECT #ValidSSN.SSN, #UserSSN.SSN, 2 FROM #ValidSSN INNER JOIN #UserSSN ON #ValidSSN.evens=#UserSSN.evens left outer join #CheckedSSN on #UserSSN.SSN =#CheckedSSN.ueSSN and status=0 WHERE CASE WHEN SUBSTRING(#ValidSSN.odds,1,1) =SUBSTRING(#UserSSN.odds,1,1) THEN 1 ELSE 0 END+
CASE WHEN SUBSTRING(#ValidSSN.odds,2,1) =SUBSTRING(#UserSSN.odds,2,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,3,1) =SUBSTRING(#UserSSN.odds,3,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,4,1) =SUBSTRING(#UserSSN.odds,4,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.odds,5,1) =SUBSTRING(#UserSSN.odds,5,1) THEN 1 ELSE 0 END =3 and #CheckedSSN.ueSSN is null --pop in the 2-error matches (where no perfect match exists) --where it was an error on the even side only INSERT INTO #CheckedSSN (vSSN, ueSSN, Status) SELECT #ValidSSN.SSN, #UserSSN.SSN, 2 FROM #ValidSSN INNER JOIN #UserSSN ON #ValidSSN.Odds=#UserSSN.Odds left outer join #CheckedSSN on #UserSSN.SSN =#CheckedSSN.ueSSN and status=0 --the outer join is to eliminate those that had perfect matched
WHERE CASE WHEN SUBSTRING(#ValidSSN.evens,1,1) =SUBSTRING(#UserSSN.evens,1,1) THEN 1 ELSE 0 END+
CASE WHEN SUBSTRING(#ValidSSN.evens,2,1) =SUBSTRING(#UserSSN.evens,2,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.evens,3,1) =SUBSTRING(#UserSSN.evens,3,1) THEN 1 ELSE 0 END+ CASE WHEN SUBSTRING(#ValidSSN.evens,4,1) =SUBSTRING(#UserSSN.evens,4,1) THEN 1 ELSE 0 END =2
and #CheckedSSN.ueSSN is null --print out the results Select vSSN, ueSSN, [Status] from #checkedSSN order by vSSN, ueSSN, Status ---clean up DROP TABLE #validSSN DROP TABLE #UserSSN DROP TABLE #CheckedSSN
more ▼

answered May 17, 2010 at 05:53 AM

Phil Factor gravatar image

Phil Factor
3.9k 8 9 16

Good job Phil, I also am working on a solution that reduces the number of effective rows acted upon. With only 8K rows returned we should see some sub-second entries soon
May 17, 2010 at 02:25 PM Scot Hauder
The trick here is that any SSNs that have any hope of matching must have a complete match on either a string made up of the odd characters, or one of the even characters. You can safely junk the rest. This SQL is over-complicated because I'd wrongly assumed that we were doing a 'best-match' algorithm, and I altered it at the last minute. When I get a moment, I'll tidy it up.
May 18, 2010 at 05:06 AM Phil Factor
(comments are locked)
10|1200 characters needed characters left

Current standing

+-----------------------+-----------+----------+-----------+--------+--------+ | Name | Time (ms) | Position | Reads | Result | Type | +-----------------------+-----------+----------+-----------+--------+--------+ | JAhlen v4 | 320 | 1 | 231 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Daniel Ross | 350 | 2 | 206 | OK | SSIS | +-----------------------+-----------+----------+-----------+--------+--------+ | JAhlen v3 | 370 | | 231 | Error | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Matt v2 | 391 | 3 | 231 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Matt v3 | 426 | 4 | 237 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Matt v1 | 538 | 5 | 1 422 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Phil v1e | 582 | | 1 994 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Matt v1 | 590 | 6 | 237 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | JAhlen v2 | 652 | 7 | 462 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Phil v1c | 674 | | 1 537 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Phil v1b | 700 | | 1 609 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Peso v1 | 727 | | 55 701 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | JAhlen v1 | 776 | 8 | 462 | OK | SQLCLR | +-----------------------+-----------+----------+-----------+--------+--------+ | Peso 2 | 819 | | 154 153 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Steve 1a | 887 | 9 | 98 500 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Phil v1d | 934 | | 98 500 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Matt v2 | 946 | 10 | 55 755 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Phil Factor v1 | 1 900 | | 5 362 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Scot Hauder v1 | 3 122 | 11 | 2 464 | OK | T-SQL | +-----------------------+-----------+----------+-----------+--------+--------+ | Fu Bar v1 | 4 467 583 | | 1 484 180 | OK | CURSOR | +-----------------------+-----------+----------+-----------+--------+--------+ 
more ▼

answered May 16, 2010 at 12:02 PM

Peso gravatar image

Peso
1.6k 5 6 8

I wonder if we ought to put Foo's timing in!
May 18, 2010 at 07:50 AM Phil Factor
Peso - are you doing this in the same way as the harness (x runs with cache drops before the first run, and then taking the median timing)
May 18, 2010 at 10:50 AM Matt Whitfield ♦♦
Somewhat. I clear the cache and run the solution 10 times and take the minimum values from SQL Profiler.
May 18, 2010 at 11:36 AM Peso
Shouldn't we use Matt's test harness?
May 18, 2010 at 01:05 PM Phil Factor
I think SQLCLR are a great addition to the challenge, but other people can't learn from them as they can with posted T-SQL suggestions. What if we agree to post the source for the SQLCLR to be a valid suggestion?
May 20, 2010 at 06:53 PM Peso
(comments are locked)
10|1200 characters needed characters left
more ▼

answered May 18, 2010 at 09:10 AM

JAhlen gravatar image

JAhlen
41 1 2 3

Vad kul att se dig här!
May 18, 2010 at 11:31 AM Peso
And for you non-swedish speakers; "Great to see you here!".
May 18, 2010 at 11:32 AM Peso
I get this error message trying to run the SQLCLR. "Msg 6532, Level 16, State 49, Line 1 .NET Framework execution was aborted by escalation policy because of out of memory. System.Threading.ThreadAbortException: Thread was being aborted. System.Threading.ThreadAbortException: at System.Collections.BitArray..ctor(Int32 length, Boolean defaultValue) at UserDefinedFunctions.Match()"
May 18, 2010 at 11:41 AM Peso
Om inte någon annan skriver ett snabbare CLR lösning, är du dömd att få pris för bästa CLR-baserad lösning. Välkommen till tävlingen, och tack för att komma in.
May 18, 2010 at 01:04 PM Phil Factor
Thanks. My SQL CLR solution works on my laptop (64-bit SQL Server 2008 R2 RTM with default settings) and takes approx 2 300 ms. If necessary I can make some changes so it takes less memory (but will be a bit slower).
May 18, 2010 at 04:12 PM JAhlen
(comments are locked)
10|1200 characters needed characters left
more ▼

answered May 18, 2010 at 04:25 PM

JAhlen gravatar image

JAhlen
41 1 2 3

This is 100 ms faster than v1.
May 18, 2010 at 06:20 PM Peso
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x991
x348
x8
x7
x5

asked: May 15, 2010 at 06:15 AM

Seen: 8410 times

Last Updated: Jun 29, 2010 at 12:00 PM