question

jhowe avatar image
jhowe asked

remove unwanted characters from column

Hi all, I have a account number field that gets supplied to us from one of our partners. The problem is they've dumped (what i think) is carriage returns in this field as REPLACE(field, ' ', '') doesn't seem to be picking them up. How do i identify and remove carraige returns, tabs, line feeds, whitespace, any garbage that isn't text or numbers?
sql-server-2008sql-server-2008-r2replacecharacters
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered
Carriage returns (char(13)) line feeds (char(10)) and tabs (char(9)) can be easily replaced replace(replace(replace(field, char(9),''), char(10),''), char(13),'') but its the 'other garbage' that can be difficult to eradicate - you either have to know what it is up-front and deal with it, like above, or have a list of valid characters and try and trap for those... for example if only numbers where allowed where patindex(^[0-9], field) = 0
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
The replace you quoted in your question should remove blank spaces
1 Like 1 ·
jhowe avatar image jhowe commented ·
thanks kev and what about one or more blank spaces?
0 Likes 0 ·
jhowe avatar image jhowe commented ·
this has worked perfectly.
0 Likes 0 ·
Usman Butt avatar image
Usman Butt answered
Just for fun, I thought of handling it by using the cteTally which would return a string with valid characters only i.e. [0-9] and [A-Z] ([a-z] are handled by using the UPPER function, but can be handled the same way as [A-Z]) DECLARE @pstring VARCHAR(200) SELECT @pstring = 'ABC 456 # & 8 ' ; WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , cteValidCharacters(CharValue) AS (--==== This provides the list of the valid Characters i.e [0-9] AND [A-Z] SELECT CHAR(47+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1 UNION ALL SELECT TOP 26 CHAR(64+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E2 ) , cteAllCharacters(Chars) AS ( SELECT SUBSTRING(@pString ,N,1) FROM cteTally WHERE UPPER(SUBSTRING(@pString ,N,1)) IN (SELECT [CharValue] FROM cteValidCharacters) ) SELECT (SELECT [Chars] [*] FROM cteAllCharacters FOR XML PATH('')) It can be turned into an inline function to scale up.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Pavel Pawlowski avatar image
Pavel Pawlowski answered
For whatever replacements and matching in case you can use CLR, the most siple thing is to use a CLR RegEx implementation. You can take a look on [SQL Server Regular Expressions – Replacements][1] for sample implementation. For your concrete needs you can use the `"(?:[^a-zA-Z0-9]|\s)"` regular expression which returns all chars, which are not alphanumeric or are white space. In the sample implementation you can simple replace by empty string and you are done. [1]: http://www.pawlowski.cz/2011/05/sql-server-regular-expressions-replacements/
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

dvroman avatar image
dvroman answered
We get what appears to be "unvalidated" data from some of our vendors. Our database is based on the requirements of MS-Dynamics which is much more restrictive. Sometimes we get strange characters in and I use the following script to identify the data. By the way, I wrote this for SQL-2K so there are probably better ways of doing this now.
DECLARE @TestString VARCHAR(30), @Bin VARBINARY(255) SELECT @TestString = RTRIM(SomeCharColumn) FROM DB..Table (NOLOCK) WHERE LEN(SomeCharColumn) = 4 AND SomeCharColumn LIKE '%-DT' -- Sometimes this won't print (normally happens when there is null data) PRINT @TestString SET @Bin = CONVERT(VARBINARY(255), @TestString) PRINT @Bin
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.