x

remove unwanted characters from column

Hi all,

I have a account number field that gets supplied to us from one of our partners. The problem is they've dumped (what i think) is carriage returns in this field as REPLACE(field, ' ', '') doesn't seem to be picking them up. How do i identify and remove carraige returns, tabs, line feeds, whitespace, any garbage that isn't text or numbers?
more ▼

asked Jul 09, 2012 at 02:59 PM in Default

jhowe gravatar image

jhowe
1.1k 54 58 61

(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

Carriage returns (char(13)) line feeds (char(10)) and tabs (char(9)) can be easily replaced

replace(replace(replace(field, char(9),''), char(10),''), char(13),'')

but its the 'other garbage' that can be difficult to eradicate - you either have to know what it is up-front and deal with it, like above, or have a list of valid characters and try and trap for those...

for example if only numbers where allowed

where patindex(^[0-9], field) = 0
more ▼

answered Jul 09, 2012 at 03:08 PM

Kev Riley gravatar image

Kev Riley ♦♦
54k 47 49 76

thanks kev and what about one or more blank spaces?
Jul 09, 2012 at 03:13 PM jhowe
The replace you quoted in your question should remove blank spaces
Jul 09, 2012 at 03:29 PM Kev Riley ♦♦
this has worked perfectly.
Aug 08, 2012 at 12:39 PM jhowe
(comments are locked)
10|1200 characters needed characters left

For whatever replacements and matching in case you can use CLR, the most siple thing is to use a CLR RegEx implementation. You can take a look on SQL Server Regular Expressions – Replacements for sample implementation.

For your concrete needs you can use the "(?:[^a-zA-Z0-9]|\s)" regular expression which returns all chars, which are not alphanumeric or are white space. In the sample implementation you can simple replace by empty string and you are done.

more ▼

answered Jul 10, 2012 at 05:56 AM

Pavel Pawlowski gravatar image

Pavel Pawlowski
22.3k 9 11 21

(comments are locked)
10|1200 characters needed characters left

Just for fun, I thought of handling it by using the cteTally which would return a string with valid characters only i.e. [0-9] and [A-Z] ([a-z] are handled by using the UPPER function, but can be handled the same way as [A-Z])

DECLARE @pstring VARCHAR(200)

SELECT @pstring = 'ABC 456 # & 8
'

;  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                )
                ,
cteValidCharacters(CharValue) AS
 (--==== This provides the list of the valid Characters i.e [0-9] AND [A-Z]
 SELECT CHAR(47+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1       
 UNION ALL
 SELECT TOP 26 CHAR(64+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E2       
 )
 ,
cteAllCharacters(Chars) AS (
 SELECT 
 SUBSTRING(@pString ,N,1)
 FROM cteTally  
 WHERE UPPER(SUBSTRING(@pString ,N,1)) IN (SELECT [CharValue] FROM cteValidCharacters)
                )


                SELECT (SELECT [Chars] [*]
                FROM cteAllCharacters
                FOR XML PATH(''))
It can be turned into an inline function to scale up.
more ▼

answered Jul 09, 2012 at 03:52 PM

Usman Butt gravatar image

Usman Butt
13.9k 6 8 14

(comments are locked)
10|1200 characters needed characters left

We get what appears to be "unvalidated" data from some of our vendors. Our database is based on the requirements of MS-Dynamics which is much more restrictive. Sometimes we get strange characters in and I use the following script to identify the data. By the way, I wrote this for SQL-2K so there are probably better ways of doing this now.
DECLARE @TestString VARCHAR(30), @Bin VARBINARY(255)

SELECT @TestString = RTRIM(SomeCharColumn)
FROM DB..Table (NOLOCK)
WHERE LEN(SomeCharColumn) = 4 AND SomeCharColumn LIKE '%-DT'

-- Sometimes this won't print (normally happens when there is null data)
PRINT @TestString
SET @Bin = CONVERT(VARBINARY(255), @TestString)
PRINT @Bin
more ▼

answered Jul 12, 2012 at 06:39 PM

dvroman gravatar image

dvroman
1.1k 2 2

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1850
x597
x18
x3

asked: Jul 09, 2012 at 02:59 PM

Seen: 3700 times

Last Updated: Aug 08, 2012 at 12:39 PM