question

red68 avatar image
red68 asked

Fastest way to find alphanumeric data for each column

If I run the following query via a cursor, it runs for a while to produce the results on a table with 1 million records and 178 columns. The table name is passed in and spaces were causing an issue so I am using a function to replace those as well. I just want a count for each column that have alphanum only characters. (i.e., 123a or 1ab34, a12, etc.) NOT 123-a or any with special symbols. Thanks! --Alpha_Numeric (Select count(*) from ' + @TableName + ' as Alpha_Numeric WHERE dbo.udfreplacemultiplespaces(replace(' + @ColumnName + ','' '','''')) not like ''%[^0-9a-z]%'' and dbo.udfreplacemultiplespaces(replace(' + @ColumnName + ','' '','''')) like ''%[a-z]%'' and dbo.udfreplacemultiplespaces(replace(' + @ColumnName + ','' '','''')) like ''%[0-9]%'')
sql-server-2012
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

·
Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
This little script is probably quicker than a cursor. Though there are a few things which will make it not quite as quick anyway: - The LIKE comparison with %something% will cause a scan operation. But since you are actually looking at all columns on all rows you'll most likely cause a table scan anyway, with or without indexes. - The UDF might run three times per column/row, meaning it might be called 178*1000000*3 times in this query. That is because scalar UDFs are black boxes with a zero cost as far as the optimizer is concerned. Best case it will run 178*1000000 times only, but don't be surprised if you find it to be applied three times per column. Anyway: I use a dynamic SQL approach to create a CTE with a case expression for each column - giving it a value of 0 or 1 depending on output of the LIKE-comparison, and then I call the CTE and sums up the columns from the CTE. You shall be REALLY careful about how you set the variable @tablename. If you don't check it carefully you might open up your solution to SQL injection. The script: declare @tablename sysname=N'dbo.mytable'; declare @s nvarchar(max)='SELECT '; SELECT @s = @s + ' CASE WHEN dbo.udfreplacemultiplespaces(replace(' + quotename(c.name) + ','' '','''')) NOT LIKE ''%[^0-9a-z]%'' AND dbo.udfreplacemultiplespaces(replace(' + quotename(c.name) + ','' '','''')) LIKE ''%[0-9]%'' AND dbo.udfreplacemultiplespaces(replace(' + quotename(c.name) + ','' '','''')) LIKE ''%[a-z]%'' THEN 1 ELSE 0 END AS ' + quotename(c.name) + ',' FROM sys.columns c WHERE c.object_id=object_id(@tablename) SET @s=LEFT(@s,LEN(@s)-1); SET @s=@s + ' FROM ' + @tablename; SET @s = 'WITH CTE AS( ' + @s + ') SELECT '; SELECT @s = @s + ' SUM(' + quotename(c.name) + ') as ' + quotename(c.name) + ',' FROM sys.columns as c where c.object_id=object_id(@tablename); SET @s=LEFT(@s,LEN(@s)-1); SET @s=@s + ' FROM CTE;' exec sys.sp_executesql @statement=@s; You probably want to alter the WHERE-clauses which selects from sys.columns so that they only selects columns with data types you can use the LIKE '%[a-z]%' etc on. The LIKE comparison might or might not work for certain language letters, depending on the collation of the columns. Eg the Swedish characters å, ä and å sort alphabetically last in the Swedish alphabet, but with a Latin1 collation they are sorted together with A and O respectively. So: You're not quite home yet :)
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

This works but still runs for a while on 1MM rows. Still researching fastest method. Thanks!
0 Likes 0 ·
Such search won't ever be fast. Whichever method will cause massive scans and TempDB usage. Even with indexes on each column, I'm pretty sure a table scan will be the fastest method. And TempDB is just my assumption, but since any method you use will have to gather, for each column on every row, if it meets the condition and then aggregate that. A free lunch would be to rewrite your scalar user defined function to an inline table valued function which is cross applied, thus revealing to the optimizer what the function does so it can be optimized together with the Query instead of being called once (or three times) per row and column.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.