question

yd2711 avatar image
yd2711 asked

Replacing extra spaces in text field

I have a text field with multiple line breaks and spaces that I need to remove. I have seen suggestions for char(10) and tried the following different possibilities: SELECT REPLACE(CAST(MYFIELD AS VARCHAR(8000)), 'char(10)', '') SELECT REPLACE(CAST(MYFIELD AS VARCHAR(8000)), char(10), '') but it isn't working. If I try it without the CAST to varchar - replace(MYFIELD,char(10),'') I receive the error message: > Argument data type text is invalid for argument 1 of replace function. Any ideas would be appreciated
replacetext
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
Have you an example of the values of MYFIELD?
0 Likes 0 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
Which version of SQL Server?
0 Likes 0 ·
Mart avatar image Mart commented ·
Do you mean you have multiple spaces that need to be changed to one space and line breaks that need to be removed? e.g "The cow jumped over the moon" would get 'fixed' to "The cow jumped over the moon"
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
This web site runs off of votes. Can you please indicate all helpful answers below by clicking on the thumbs up next to those answers. If any one answer lead to a solution, please indicate that by clicking on the check mark next to that one answer.
0 Likes 0 ·
Oleg avatar image
Oleg answered
Depending on the source of the text, the line break does not necessarily consists of just char(10) - line feed, but may also include the char(13) - carriage return. Generally speaking the line break is carriage return followed by the line feed, i.e. char(13) + char(10). Additionally, for your exercise you might want to consider removing the tabs - char(9) and replacing the occurrence of multiple spaces - char(32) with just one occurrence. Finally, I would consider to NOT attempting to cast the text as varchar(8000) but opt for varchar(max) instead which you can do if you have SQL Server 2005 or better. Here is the sample which will replace all the characters mentioned above: select replace(replace(replace(replace(cast(MyField as varchar(max)), char(13), ''), char(10), ''), char(9), ''), ' ', ''); Just in case if the column in question has many spaces one after another and you want to reduce ALL of them with just one space, I would highly recommend reading the excellent article by Jeff Moden titled [REPLACE Multiple Spaces with One][1] Borrowing from Jeff Moden's logic, you can restate the query above to handle not only the removal of unwanted line breaks but also the reduction of any number of multiple spaces with just one space if you need to. Because it is possible that you will have the situation when there is a word followed by the line break followed by another word on the next line, replacing the line break with empty string might cause two words (from different lines) to be stuck together, but you would probably want to have them separated by the single space, it would make sense to replace each line feed and tab with single space, carriage return with blank and only then the occurrence of any number of spaces with just one space: select replace(replace(replace(replace(replace(replace(cast(MyField as varchar(max)), char(13), ''), char(9), ' '), char(10), ' '), ' ', ' ' + char(7)), char(7) + ' ', ''), char(7), '') This bunch of replaces made the statement looking very complex, so you can at this time consider saving it off as an ITVF (inline table valued function, not the scalar function for performance reasons) Hope this helps. Oleg [1]: http://www.sqlservercentral.com/articles/T-SQL/68378/
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

yd2711 avatar image yd2711 commented ·
I apologize for not responding sooner, for some reason I stopped receiving notification emails when new responses posted to this thread. Oleg, your response was extremely helpful. I combined it with the awesome article you referenced from Jeff Moden and created the following 2 step solution: > UPDATE t_tabs set Bullets = replace(replace(replace(replace(cast(bullets as varchar(8000)), char(13), ''+CHAR(7)+''), char(10), ''+CHAR(7)+''), char(9), ''+CHAR(7)+''), ' ', ''+CHAR(7)+'') from t_tabs > update t_tabs set Bullets = replace(CAST(bullets as varchar(8000)),''+CHAR(7)+'','') I probably could have wrapped that into 1 command, but I wanted to see the changes as they happened to make sure it was doing what I wanted. FYI, I use varchar(8000) instead of varchar(max) because I know the original source of this field had that max. I have given you the green check :) Thank you very much.
0 Likes 0 ·
SQLGoooRooo avatar image
SQLGoooRooo answered
Possibly something to this effect: REPLACE(myfield, ' ', ''); Or maybe if you have multiple spaces in different places: CREATE FUNCTION RemoveAllSpaces ( @InputStr varchar(8000) ) RETURNS varchar(8000) AS BEGIN declare @ResultStr varchar(8000) set @ResultStr = @InputStr while charindex(' ', @ResultStr) > 0 set @ResultStr = replace(@InputStr, ' ', '') return @ResultStr END this would change 'zz zz z zz zz z' to this 'zzzzzzzzzz'
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

yd2711 avatar image yd2711 commented ·
Assuming this would work on a text field, this would remove all spaces which would be counter-productive. "The cow jumped over the moon" would be rendered = "The cowjumpedoverthemoon"
0 Likes 0 ·
SQLGoooRooo avatar image SQLGoooRooo commented ·
show us maybe an example of beginning text scenario and desired end result so we can try to tailor suit an acceptable answer.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@SQLGoooRooo I am sorry, but it looks like you are giving a pretty bad advice in your answer. There is no reason whatsoever to consider an evil scalar function like this, which can never expand into the execution plan. The WHILE loop is best avoided in T-SQL because most times there is no reason to use a RBAR processing when the same result can be achieved by the set based query. In case with your function, imagine what will happen if this function is called on the column which stores big size text values and the select statement returns many records. The performance of such query will suffer greatly. Here is the sample using the logic borrowed directly from Jeff Moden's article I mentioned in my answer: declare @s varchar(max) = 'This is a text with multiple spaces in it'; select replace(replace(replace(@s, ' ', ' ' + char(7)), char(7) + ' ', ''), char(7), '') -- results This is a text with multiple spaces in it
0 Likes 0 ·
SQLGoooRooo avatar image SQLGoooRooo commented ·
It depends Oleg. For all we know every single row has extra spaces in it. Contrary to common SQL myth scalar functions are NOT ALWAYS BAD and it always depends on what is known about the data. Thus the reason for my follow up question about seeing an example of what the expected result would look like. In general yes its avised to avoid them on large tables with repetitive reuse when there may be rows that don't need touched. The better question is why does the source data put the extra padding in there to begin with and why is this data smoothing needed and how often? Everyday? 5,000 times a minute? The answer sometimes is just this one run to fix 10 row problem on 15 rows. It just depends. ALL NEW SQL code is advisable to be checked against execution plan if its going to be kept and reused to compare actual cost stats before choosing any methodology because my table might differ from his and so on but its wrong to believe that scalar functions are ALWAYS bad or evil. I often use them for one time only use cases when I know for certain that every single row is going to be touched because the performance in those situations often is the same. It depends.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@SQLGoooRooo I am not saying that the scalar functions are always bad, my main point was the usage of the WHILE loop. That is unless you are trying to say that the WHILE loop used to traverse the string from first position of the space until done is somehow better than the 3 simple replaces which guarantee the reduction of any number of spaces with just one space. There is a very nice SQL Spackle by @Jeff Moden titled ["How to Make Scalar UDFs Run Faster"][1] which is discussing the differences between scalar and ITVF functions. The verdict there is "it depends" of course, but still, the ITVF can make the code run faster. [1]: http://www.sqlservercentral.com/articles/T-SQL/91724/
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.