I use a query as given below to find the possible columns on all tables on a database that may contain sensitive personal information: select a.TABLE_SCHEMA ,a.TABLE_NAME,a.COLUMN_NAME,b.TABLE_TYPE from INFORMATION_SCHEMA.COLUMNS a INNER JOIN INFORMATION_SCHEMA.TABLES b ON a.TABLE_CATALOG = b.TABLE_CATALOG AND a.TABLE_SCHEMA = b.TABLE_SCHEMA AND a.TABLE_NAME = b.TABLE_NAME AND b.TABLE_TYPE <> 'VIEW' AND a.COLUMN_NAME not in ('SSNID','SSN_ID') where a.COLUMN_NAME like '%SSN%' or a.COLUMN_NAME like '%BIRTH%' or a.COLUMN_NAME like '%RACE%' order by a.TABLE_SCHEMA desc,a.TABLE_NAME desc After which, I manually run a select query as given below against each column (depending on the column) to find if any unencrypted data is present: select * from . where ISNUMERIC(SSN) = 1 or SUBSTRING(SSN,1,1) not like '%[^A-Z]%' and SUBSTRING(SSN,5,4) not like '%[^0-9]%' select * from . where isdate(BIRTHDATE)=1 I have to automate the entire process and at the end have to receive a mail with the columns that have unencrypted data or row count > 0 for the above queries. What is the procedure that i need to adopt?
Personally, I'd look to see if I could something like this using PowerShell. While the queries are straight-forward as you've already laid them out, the only way to do this within T-SQL would probably be dynamic code within cursors.
My answer was, first of all encrypted data is formatted in hex. This eliminates a whole raft of coding problems and makes it very easy to identify and manage the encrypted data. Our data was encrypted by column using different modes depending on the column functionality. The most secure data was 3-DES encrypted with SHA-256 hashing on the passkeys. The benefit is that any encrypted data would show up as a HEX string which was easy to find since all characters fell into [0-9a-f] and the number of characters was always even.
First I agree with Grant. You are better of doing this outside of SQL. PowerShell is a fine option, so is Python. Essentially, you need to store the list of columns that is returned by your first query, and then iterate over it to dynamically generate your other queries and then execute them concatenating the rows into your report. I confronted a similar problem when I had to search for a specific value accross numerous tables regardless of what column it was in. I wrote up my solutions for
SQLServerCentral.com and while it is not what you want you could likely modify it easily to do what you want. It is at