question

graz avatar image
graz asked

How to hash SSN values

We're trying to find an acceptable way to clean up production SSNs for use outside production. I've been looking at one way hashing with salt. I've looked at the HashBytes and Checksum functions. I can use these to generate a pretty good 4 byte hash of an SSN.

I'd really like to end up with a 9 digit number so the application actually works. Has anyone done anything like this? Any thoughts on how to accomplish this? It would need to be compliant with all the various privacy standards.

My data set is 100 million rows (20 million unique SSNs) and the SSN appears in multiple tables (yes, I know that's an issue). I'm starting to think that I may just need to build a temporary lookup table and generate 20 million or so random SSNs and then do updates to the original table. After that I can drop my original mapping table.

security
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

K. Brian Kelley avatar image
K. Brian Kelley answered

I would go with the temporary lookup table populated with fake values matched up with the real ones. Instead of doing just a random number generation, you could generate the 3 parts (area number, group number, and serial number) separately and combine them to form a 9 character string (I'm assuming that's how you're storing or you'd lose the leading zeroes). You could even keep track of the area number + group number and the current serial number in your SSN generation code so you can ensure there's no duplicates.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

chris 1 2 avatar image
chris 1 2 answered

Do you really care of the SSN's are valid everywhere? I mean, you want them all to be 9 digit numbers, but if you're not using them as keys then it's probably not a big deal right?

Otherwise, yeah, the solution you came up with in the 3rd paragraph is how I handle CC numbers: Create a bunch of random ones, link them together, hide the link table with security.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Peso avatar image
Peso answered

American SSN's doesn't have a checksum, right?

What about a simple

UPDATE x SET x.SSN = x.recID FROM ( SELECT SSN, DENSE_RANK() OVER (ORDER BY SSN) + 99999999 AS recID FROM Table1 ) AS x

The new numbers are random in nature even if they are sequential.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

esabine avatar image
esabine answered

I also work in an area where we have SSNs, Fed Tax IDs, and CC numbers. We also have the same challenge but we address it with security and not obfuscation.

If you can't approach this from an access rights issue, I would recommend a table that stores the unique SSNs and then an auto assigned surrogate key. If it must be 9 digits then just seed it with 100000000 but you'll need to teach the users that this number isn't a tax ID.

Now 20MM rows is a lot. For a smaller population I'd suggest using foreign keys with ON UPDATE CASCADE and then allow SQL Server to cascade the updates out after you update the master table but.....20MM rows is a lot. Sounds like an off-hours maintenance effort to work through the updating.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

mrdenny avatar image
mrdenny answered

You could use the output from the CHECKSUM() function. Just pad the results with 0s at either end. Multiple values can have the same checksum, so while it's possible to reverse engineer the values, you would have to get all the possible values (which would be possible no matter what technique you use as SSNs are only 9 characters in length). And this way the values will be the same across all the tables.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

PurpleLady avatar image
PurpleLady answered

You could update the "area number" (1st 3 digits) to be any number greater than 772 - this is the highest area code assigned by the SSA to date. It's not "valid", but it would work for what you need.

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

marlonribunal avatar image
marlonribunal answered

You may find this eBook from Simple Talk Publishing and Red Gate Software helpful in protecting your data. I haven't read the book so I can't really tell if it can help you find ways to solve your problem. But it should at least give you solid concepts on protecting your data.

Protecting SQL Server Data by John Magnabosco

10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.