x

remove unwanted characters from column

Hi all,

I have a account number field that gets supplied to us from one of our partners. The problem is they've dumped (what i think) is carriage returns in this field as REPLACE(field, ' ', '') doesn't seem to be picking them up. How do i identify and remove carraige returns, tabs, line feeds, whitespace, any garbage that isn't text or numbers?

more ▼

asked Jul 09, 2012 at 02:59 PM in Default

avatar image

jhowe
1.1k 56 60 66

(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

Carriage returns (char(13)) line feeds (char(10)) and tabs (char(9)) can be easily replaced

 replace(replace(replace(field, char(9),''), char(10),''), char(13),'')

but its the 'other garbage' that can be difficult to eradicate - you either have to know what it is up-front and deal with it, like above, or have a list of valid characters and try and trap for those...

for example if only numbers where allowed

 where patindex(^[0-9], field) = 0
more ▼

answered Jul 09, 2012 at 03:08 PM

avatar image

Kev Riley ♦♦
64k 48 61 81

thanks kev and what about one or more blank spaces?

Jul 09, 2012 at 03:13 PM jhowe

The replace you quoted in your question should remove blank spaces

Jul 09, 2012 at 03:29 PM Kev Riley ♦♦

this has worked perfectly.

Aug 08, 2012 at 12:39 PM jhowe
(comments are locked)
10|1200 characters needed characters left

For whatever replacements and matching in case you can use CLR, the most siple thing is to use a CLR RegEx implementation. You can take a look on SQL Server Regular Expressions – Replacements for sample implementation.

For your concrete needs you can use the "(?:[^a-zA-Z0-9]|\s)" regular expression which returns all chars, which are not alphanumeric or are white space. In the sample implementation you can simple replace by empty string and you are done.

more ▼

answered Jul 10, 2012 at 05:56 AM

avatar image

Pavel Pawlowski
22.7k 10 15 26

(comments are locked)
10|1200 characters needed characters left

Just for fun, I thought of handling it by using the cteTally which would return a string with valid characters only i.e. [0-9] and [A-Z] ([a-z] are handled by using the UPPER function, but can be handled the same way as [A-Z])

 DECLARE @pstring VARCHAR(200)
 
 SELECT @pstring = 'ABC 456 # & 8
 '
 
 ;  WITH E1(N) AS (
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                 ),                          --10E+1 or 10 rows
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
  cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                      -- for both a performance gain and prevention of accidental "overruns"
                  SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                 )
                 ,
 cteValidCharacters(CharValue) AS
  (--==== This provides the list of the valid Characters i.e [0-9] AND [A-Z]
  SELECT CHAR(47+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1       
  UNION ALL
  SELECT TOP 26 CHAR(64+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E2       
  )
  ,
 cteAllCharacters(Chars) AS (
  SELECT 
  SUBSTRING(@pString ,N,1)
  FROM cteTally  
  WHERE UPPER(SUBSTRING(@pString ,N,1)) IN (SELECT [CharValue] FROM cteValidCharacters)
                 )
                 
                 
                 SELECT (SELECT [Chars] [*]
                 FROM cteAllCharacters
                 FOR XML PATH(''))


It can be turned into an inline function to scale up.

more ▼

answered Jul 09, 2012 at 03:52 PM

avatar image

Usman Butt
13.9k 6 13 21

(comments are locked)
10|1200 characters needed characters left

We get what appears to be "unvalidated" data from some of our vendors. Our database is based on the requirements of MS-Dynamics which is much more restrictive. Sometimes we get strange characters in and I use the following script to identify the data. By the way, I wrote this for SQL-2K so there are probably better ways of doing this now.
DECLARE @TestString VARCHAR(30), @Bin VARBINARY(255)

 SELECT @TestString = RTRIM(SomeCharColumn)
 FROM DB..Table (NOLOCK)
 WHERE LEN(SomeCharColumn) = 4 AND SomeCharColumn LIKE '%-DT'
 
 -- Sometimes this won't print (normally happens when there is null data)
 PRINT @TestString
 SET @Bin = CONVERT(VARBINARY(255), @TestString)
 PRINT @Bin
more ▼

answered Jul 12, 2012 at 06:39 PM

avatar image

dvroman
1.1k 1 4 4

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x2079
x741
x30
x7

asked: Jul 09, 2012 at 02:59 PM

Seen: 10719 times

Last Updated: Aug 08, 2012 at 12:39 PM

Copyright 2016 Redgate Software. Privacy Policy