I have a table with content for my website and there are website addresses within the content that all end in something similar to:
-c.123.html
-c.134.html
-c.12.html
-c.345.hml
so the full website address would be something like
www.website.co.uk/websitepage-c.123.html
the numbers are different inbetween the -c. and the html
I want to remove all of the -c.***.html
REPLACE doesn't do that. What you might be better off doing is a bit of string slicing - take the left part of the string up to the -c bit, and then adding back the last few characters.
Either that, or you could find / write a CLR routine to do the job. See, for example, this set of posts: https://www.sqlservercentral.com/blogs/regex-in-sql-server-for-replacing-text
Answer by Jeff Moden ·
This should do it.
SELECT testdata.WebSite ,Cleaned = CASE WHEN testdata.WebSite LIKE ('%[-]c.[0-9]%[0-9].h%') THEN SUBSTRING(testdata.WebSite,1,PATINDEX('%[-]c.[0-9]%[0-9].h%',testdata.WebSite)-1) ELSE testdata.WebSite END FROM (VALUES --Simulating the readily consumable test data you forgot to post ('www.website.co.uk/websitepage-c.123.html') ,('www.website.co.uk/websitepage-c.134.html') ,('www.website.co.uk/websitepage-c.12.html') ,('www.website.co.uk/websitepage-c.345.hml') ,('www.website.co.uk/websitepage-calls.345.hml') )testdata(WebSite) ;
Answer by Magnus Ahlkvist ·
To me, this would be one of the rare cases for using CLR-functions (as Thomas Rushton suggests). In .NET you have Regex-methods which you can expose in a CLR Function.
Using substring as srutzky suggests is not a safe way to solve this problem, as it can't handle multiple matches, it'll use the leftmost match och '-c.'. It would return incorrect results for this one for example: https://www.section-c.com/c-123.html But with RegEx functions, you can create more complex pattern-matching for replace.
Answer by srutzky ·
If I am understanding the request correctly, then all you need is a simple SUBSTRING since there is a common, static piece of text that you can match on — "-c." — so you can ignore the part with the numbers that you want to get rid of. For example:
DECLARE @Values TABLE ( [Value] NVARCHAR(100) NOT NULL ); INSERT INTO @Values ([Value]) VALUES (N'https://www.website.co.uk/websitepage-c.123.html'); INSERT INTO @Values ([Value]) VALUES (N'https://www.section-c.com/c-123.html'); ;WITH cte AS ( SELECT [Value], (CHARINDEX(N'-c.', [Value]) - 1) AS [Pos] FROM @Values ) SELECT SUBSTRING([Value], 1, IIF(cte.[Pos] > 0, cte.[Pos], LEN([Value]))) FROM cte ;WITH cte AS ( SELECT [Value], (PATINDEX(N'%-c.[0-9]%', [Value]) - 1) AS [Pos] FROM @Values ) SELECT SUBSTRING([Value], 1, IIF(cte.[Pos] > 0, cte.[Pos], LEN([Value]))) FROM cte
That returns:
https://www.website.co.uk/websitepage https://www.section
and then:
https://www.website.co.uk/websitepage https://www.section-c.com/c-123.html
If for some reason have more complex requirements that can't easily be done using T-SQL, then you can use RegEx if SQLCLR is an option (and it definitely is not if you are using either Azure SQL DB single DB or SQL Server 2017+ on AWS RDS). Several pre-compiled RegEx functions are available in the free version of SQL# (SQLsharp), a SQLCLR library containing many functions and stored procedures (that I wrote).