question

jhowe avatar image
jhowe asked

SQL credit card search refactor

hi all i have the following script which i have created. It uses a cursor to loop around tables/columns and matches fields against a regex pattern. The problem is it is slow and clunky. I'm looking for a better solution so i thought i'd ask the experts! Any ideas would be greatly appreciated. -- Configure the server to permit the OLE Automation Regex sp_configure 'show advanced options', 1; GO RECONFIGURE; GO sp_configure 'Ole Automation Procedures', 1; GO RECONFIGURE; GO -- Create the Regex Function IF OBJECT_ID(N'dbo.OARegexMatch') IS NOT NULL DROP FUNCTION dbo.OARegexMatch GO CREATE FUNCTION dbo.OARegexMatch /* very simple Function Wrapper around the call */ ( @objRegexExp INT , @matchstring VARCHAR(MAX) ) RETURNS INT AS BEGIN DECLARE @objErrorObject INT , @hr INT , @match BIT EXEC @hr= sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring IF @hr != 0 BEGIN RETURN NULL END RETURN @match END GO DECLARE csrTables CURSOR FOR SELECT name , object_id FROM sys.tables ORDER BY name OPEN csrTables; DECLARE @TableName NVARCHAR(100) , @ObjectId INT FETCH NEXT FROM csrTables INTO @TableName, @ObjectId WHILE @@FETCH_STATUS = 0 BEGIN -- Do Stuff. PRINT @TableName BEGIN --====== BEGIN INNER FIELD NAME CURSOR DECLARE csrFields CURSOR FOR SELECT name FROM sys.columns WHERE object_id = @ObjectId AND precision = 0 AND max_length >= 16 OPEN csrFields; DECLARE @FieldName AS NVARCHAR(50) FETCH NEXT FROM csrFields INTO @FieldName WHILE @@FETCH_STATUS = 0 BEGIN -- Do Stuff. PRINT '-- ' + @FieldName DECLARE @sql NVARCHAR(MAX) SET @sql = ' declare @objRegexExp int , @objErrorObject int , @strErrorMessage varchar(255) , @hr int , @match bit , @field nvarchar(50)' + ' select @strErrorMessage = ' + '''creating a regex object''' + ' exec @hr= sp_OACreate ''VBScript.RegExp'', @objRegexExp out ' + ' if @hr = 0 exec @hr= sp_OASetProperty @objRegexExp, ''pattern'', ''^[3|4|5|6]([0-9]{15}$|[0-9]{12}$|[0-9]{13}$|[0-9]{14}$)''' + ' if @hr = 0 exec @hr= sp_OASetProperty @objRegexExp, ''IgnoreCase'', 1 --Doing a Test' + ' if @hr = 0 select * from CRM.dbo.' + QUOTENAME(@TableName) + ' where dbo.OARegexMatch(@objRegexExp, ' + QUOTENAME(@FieldName) + ') = 1' --PRINT @sql EXEC sp_executesql @sql FETCH NEXT FROM csrFields INTO @FieldName END CLOSE csrFields DEALLOCATE csrFields --====== END INNER FIELD NAME CURSOR END FETCH NEXT FROM csrTables INTO @TableName, @ObjectId END CLOSE csrTables DEALLOCATE csrTables go sp_configure 'show advanced options', 1; GO RECONFIGURE; GO sp_configure 'Ole Automation Procedures', 0; GO RECONFIGURE; GO *** UPDATE EQUIVALENT CODE USING LIKE BELOW TO COMPARE BOTH FOR PERFORMANCE *** DECLARE csrTables CURSOR FOR SELECT name , object_id FROM sys.tables ORDER BY name OPEN csrTables; DECLARE @TableName NVARCHAR(100) , @ObjectId INT FETCH NEXT FROM csrTables INTO @TableName, @ObjectId WHILE @@FETCH_STATUS = 0 BEGIN -- Do Stuff. PRINT @TableName BEGIN --====== BEGIN INNER FIELD NAME CURSOR DECLARE csrFields CURSOR FOR SELECT name FROM sys.columns WHERE object_id = @ObjectId AND precision = 0 AND max_length >= 16 OPEN csrFields; DECLARE @FieldName AS NVARCHAR(50) FETCH NEXT FROM csrFields INTO @FieldName WHILE @@FETCH_STATUS = 0 DECLARE @SQL NVARCHAR(max) BEGIN -- Do Stuff. PRINT '-- ' + @FieldName SET @SQL = 'SELECT * FROM ' + QUOTENAME(@TableName) + 'WHERE ' + QUOTENAME(@FieldName) + ' + LIKE ''%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'' OR ' + QUOTENAME(@FieldName) + ' + LIKE ''%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'' OR ' + QUOTENAME(@FieldName) + ' + LIKE ''%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'' OR ' + QUOTENAME(@FieldName) + ' + LIKE ''%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'' OR ' + QUOTENAME(@FieldName) + ' + LIKE ''%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'' OR ' PRINT @SQL EXEC sys.sp_executesql @SQL FETCH NEXT FROM csrFields INTO @FieldName END CLOSE csrFields DEALLOCATE csrFields --====== END INNER FIELD NAME CURSOR END FETCH NEXT FROM csrTables INTO @TableName, @ObjectId END CLOSE csrTables DEALLOCATE csrTables go
sql-server-2008tsqlclrregex
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
If you want to run a query against every column in every table... you're kind of stuck. There's no magic method I'm aware of that will do that for you. What is this thing accomplishing? Why would you need to run it against everything?
0 Likes 0 ·
jhowe avatar image jhowe commented ·
we have to go through a security audit and i want to find card numbers in places where they shouldn't be...
0 Likes 0 ·
jhowe avatar image jhowe commented ·
what about breaking it up somehow... to lessen the impact maybe only look at first digit, stick those into a temporary table, then analyse from there? do you think that would improve performance and would it be possible?
0 Likes 0 ·
Grant Fritchey avatar image
Grant Fritchey answered
I don't know, I think you're kind of stuck. A couple of ideas come to mind, but I don't think either is workable. 1. Export every column into a new database, a table that consists of three columns, tablename, columnname, value. Then you can just run the search on the value. You could even optimize this. But, you have to run the export and I've no idea if that will be faster than just running on everything. 2. Make the database readonly and put a columnstore index on every column. Then run the query. Again, not sure how viable that is, and you have to have SQL Server 2012. One suggestion that will help a tiny bit, for regex style code, CLR generally works better within SQL Server than most other methods. So you could improve that, but that's probably not the bottleneck in this process.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

DavidBetteridge avatar image
DavidBetteridge answered
Could you use LIKE instead of the regex. You would need to do 4 different tests, one for 15 digits, one for 12 digits etc. if '3123456789012345' LIKE '[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' PRINT 'Match'
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

jhowe avatar image jhowe commented ·
I have refactored my regex expression to [3|4|5|6]([0-9]{12,16}$) this will look at any number starting with 3 or 4,5,6 return 0-9 digits between the length of 12-16 digits. Any confirmation that i am correct would be greatly appreciated! I think this is the best performing option without going down the route Grant Fritchey has specified. Not sure how to get the regex to display properly!
0 Likes 0 ·
DavidBetteridge avatar image DavidBetteridge commented ·
My suggestion was not to use regular expressions at all, but just stick with sql server's like statement. If however you want to try the regular expression / clr route then there used to be an example on MSDN.
0 Likes 0 ·
jhowe avatar image jhowe commented ·
hi i just wanted someone to check my regex expression is correct. It seems to be working, i don't think there is really a fast solution, i can just let the script run overnight on each server.
0 Likes 0 ·
jhowe avatar image jhowe commented ·
@DavidBetteridge what about variable length? card numbers are 12-16 digits do i have to run separate statements or can i do something like {12,16} i.e. between 12-16 digits like i can in regex?
0 Likes 0 ·
DavidBetteridge avatar image
DavidBetteridge answered
You would have to do seperate statements.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

jhowe avatar image jhowe commented ·
@DavidBetteridge so far this performs much faster than my script but what if the numbers are surrounded by text do i have to add wildcard i.e. % to either side of the like? i.e. LIKE '%[3|4|5|6][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'
0 Likes 0 ·
DavidBetteridge avatar image
DavidBetteridge answered
correct, you will need to add %% around them. If that is what you want to do, then you only need to check for strings with 12 digits in. (A string with 16 digits is also a string with 12 digits). Does it matter if it matches a string with 17 digits?
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

jhowe avatar image jhowe commented ·
@DavidBetteridge, @Grant Fritchey i only want to filter out 12-16 digit numbers, the numbers may be surrounded by text however if they were entered into a notes field for example. I have a feeling using LIKE isn't going to do what i want... I will post my refactored code using the like statements below my original answer. Would greatly appreciate feedback. The code using 'LIKE' doesn't seem to be working in a cursor at the moment... is there a mistake somewhere?
0 Likes 0 ·
DavidBetteridge avatar image
DavidBetteridge answered
create table TestCases (StringToTest varchar(100) not null) insert into TestCases select '12345' insert into TestCases select 'a12345' insert into TestCases select '12345b' insert into TestCases select '123456' insert into TestCases select 'a123456' insert into TestCases select '123456b' SELECT * FROM TestCases T WHERE T.stringToTest like '%[^0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%' /* 5 digits only in the string */ or T.stringToTest like '[0-9][0-9][0-9][0-9][0-9][^0-9]%' /* 5 digits only at the start of the string */ or T.stringToTest like '%[^0-9][0-9][0-9][0-9][0-9][0-9]' /* 5 digits only at the end of the string */ or T.stringToTest like '[0-9][0-9][0-9][0-9][0-9]' /* Entire string is 5 digits */ DROP TABLE TestCases
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.