question

anadi11 avatar image
anadi11 asked

String cleanup

I have a column with values like "6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e". The values I need are 6;9 from the above string. This pattern might be repeated as much as 12-15 times in the string but i would like to extract only the values before the |. Any help would be appreciated. Thanks in advance!!
data
7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image ThomasRushton ♦♦ commented ·
What version of SQL Server?
0 Likes 0 ·
anadi11 avatar image anadi11 commented ·
Thanks All! The output I need are any values infront of the |
0 Likes 0 ·
anadi11 avatar image anadi11 commented ·
And they need to be in a single column seperated by ;
0 Likes 0 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
I think the confusion lies about the format of the answer - do you want it as a single-field table with multiple rows, as shown in @kev Riley's & @Pavel's answers? Or do you want it as you put it, in a single field separated by semicolons?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
do you mean single row?
0 Likes 0 ·
Show more comments
Pavel Pawlowski avatar image
Pavel Pawlowski answered
Yeah.. It can be easily done by using CLR RegEx.. Take a look on [SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching][1]. Then using the sample CLR function from the post you can easily achieve this using RegEx: **`'(\\d+?)\\|'`** SELECT value FROM [dbo].fn_RegExMatches('6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e', '(\d+?)\|') WHERE groupId = 1 results: value ------------------- 6 9 (2 row(s) affected) This is example using the CLR RegEx implementation, but there are other possibilities even using pure T-SQL. You can further fine tune the regular expression to meet your exact needs. **== EDIT ==** So after the comment here is updated CLR RegEx function, which handles this slightly. [SqlFunction(IsDeterministic = true)] public static SqlChars RegExReplace(SqlString sourceString, SqlString pattern, SqlString replacement, int matchId) { Match m = null; Regex r = new Regex(pattern.Value, RegexOptions.Compiled); if (matchId == 0) { return new SqlChars(r.Replace(sourceString.Value, replacement.Value)); } if (matchId == 1) { m = r.Match(sourceString.Value); } else if (matchId > 1) { MatchCollection mc = r.Matches(sourceString.Value); m = mc != null && mc.Count > matchId - 1 ? mc[matchId - 1] : null; } return m != null ? new SqlChars(m.Result(replacement.Value)) : SqlChars.Null; } Then a simple select solves this: SELECT [dbo].[fn_RegExReplace]( '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e', '(\d+?)(\|)(.+?)(;|$)', '$1$4', 0) Results: ------------------- 6;9 (1 row(s) affected) **== END EDIT ==** [1]: http://www.pawlowski.cz/2010/09/sql-server-2005-and-sql-server-2008-regular-expressions-and-pattern-matching-2/
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@Pavel Pawlowski Looks like OP needs a scalar CLR function, something like enhanced REPLACE(source\_str, regex\_pattern, replace\_val). Basically, replace all occurences of the pattern with specified replacement, so much like the regular T-SQL REPLACE but the second parameter accepts an enhanced regex pattern. In this particular case, find **pipe** followed by **GUID** and replace it with **empty string**. This will yield the number before pipe then semicolon then another number etc.
1 Like 1 ·
Pavel Pawlowski avatar image Pavel Pawlowski commented ·
@Oleg, I updated the answer, so it handles this easily.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@Pavel Pawlowski This is a handy function, very flexible due to the matchId implementation.
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered
Not sure how exactly you want the output, but this will give you all the single character values before a '|'. DECLARE @string varchar(500) SET @string = '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e' ;with tally as ( select top 10000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) SELECT SUBSTRING(@string, N-1, 1) FROM tally WHERE N < LEN(@string) AND SUBSTRING(@string,N,1) = '|' gives ---- 6 9 (2 row(s) affected) --- **edit**: OP has confirmed output should be one column, one row Again only works for single character values, but gives output as required ;with tally as ( select top 10000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) ,chars AS ( SELECT SUBSTRING(@string, N-1, 1) chr FROM tally WHERE N < LEN(@string) AND SUBSTRING(@string,N,1) = '|' ) select stuff ( ( SELECT ';' + chr FROM chars table1 FOR XML PATH ( '' ) ) , 1 , 1 , '' ) gives -------- 6;9
7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

KenJ avatar image KenJ commented ·
You turn a nice CTE @Kev Riley
1 Like 1 ·
KenJ avatar image KenJ commented ·
@anadi11 - to get to chars table1, start at the top of the query... @Kev Riley creates a Common Table Expression (CTE) that provides a list of consecutive numbers - tally - which he then references from another CTE - chars. The chars CTE uses the tally CTE to retrieve a list of single characters from just prior to every | character. That's the SUBSTRING(@string, N... and N-1 bit. The SELECT portion of the query, where we see chars table1, is where he finally stuffs that list of characters obtained in the chars CTE together into the final ; delimited string. table1 is an alias for chars which is the CTE defined earlier in the query. You can view the MSDN blurb on using CTEs at http://msdn.microsoft.com/en-us/library/ms190766.aspx There is a somewhat friendlier CTE introduction on the 4GuysFromRolla site - http://www.4guysfromrolla.com/webtech/071906-1.shtml
1 Like 1 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
This would be my approach; I would combine it with the XML concatenation hack to format the numbers as 6;9... but yeah, +1.
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Yeah not sure what the OP wants, was going to add the XML, but not sure....ta
0 Likes 0 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
That's often the difficulty, interpreting the exact requirements... ;-)
0 Likes 0 ·
Show more comments
Phil Factor avatar image
Phil Factor answered
If your numbers are single digit, then this will do the trick. Let me know if they can be multi-digit since the TASQL is slightly more complicated but there really isn't a need to wheel out a CLR if you don't want to. DECLARE @DirtyData TABLE (String VARCHAR(200),CleanStuff VARCHAR(200) DEFAULT '') INSERT INTO @DirtyData (string) SELECT '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e' UNION select '9|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-3|4db6-94e5-7246ddba794e' union select 'jk7|91f1ed7a-04df-47232|7-8b4c-0cb8fde3bc0d;9|31dc09be-1ace3|8-4db6-94e5-7246ddba794e' union select '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e' union select '2|91f1ed7a-04df-4723-8b4c-0cb8fde35|4bc0d;9|31dc09be-1ace-4db6-94e5-7246dd2|3ba794e' DECLARE @StuffToDo INT SELECT @StuffToDo=1 WHILE @StuffToDo>0 begin UPDATE @DirtyData set cleanstuff=Cleanstuff+CASE WHEN cleanstuff='' THEN '' ELSE ';' END + SUBSTRING(String,PATINDEX('%[0-9]|%',String),1), String=STUFF(String,PATINDEX('%[0-9]|%',String)+1,1,'') WHERE PATINDEX('%[0-9]|%',String)>0 SET @StuffToDo=@@Rowcount END SELECT string, Cleanstuff FROM @DirtyData Cleanstuff ----------------------------------------------------------------------------- 2;5;9;2 6;9;4;4 6;9 9;9;3 7;2;9;3
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Mister Magoo avatar image
Mister Magoo answered
Borrowed Phil's sample data... This handles varying length before the "|" and multiple pipes before the ";" SELECT '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-4db6-94e5-7246ddba794e' as data into #test UNION select '9|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-3|4db6-94e5-7246ddba794e' union select 'jk7|91f1ed7a-04df-47232|7-8b4c-0cb8fde3bc0d;9|31dc09be-1ace3|8-4db6-94e5-7246ddba794e' union select '6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e;6|91f1ed7a-04df-4723-8b4c-0cb8fde3bc0d;9|31dc09be-1ace-44|5db6-94e5-7246ddb4|a794e' union select '2|91f1ed7a-04df-4723-8b4c-0cb8fde35|4bc0d;9|31dc09be-1ace-4db6-94e5-7246dd2|3ba794e' select data,stuff(stripped,1,1,'') as stripped from #test as T cross apply ( select ';'+ SUBSTRING(data,N,CHARINDEX('|',data+'|',N)-N) from helpers.dbo.Tally where substring(';'+data,N,1)=';' for xml path('') ) X(stripped)
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Mister Magoo avatar image Mister Magoo commented ·
My Tally table lives in helpers db as you can see and starts at 1.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.