question

David Bick avatar image
David Bick asked

How do I do a regex style replace in T-SQL?

I've got some weblogs in a table with URLs of the form:

http://foo.com/bar/12345/baz?param=something789

I'd like to strip out the digits from the path for reporting purposes. They're not constant length and digits may appear elsewhere in the URL, so regex feels like the natural way to remove them.

How can I do this in T-SQL?

t-sqlregex
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

My preferred route would be to wrap the .net Regex engine in a CLR object. There's an MSDN magazine article on that here

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

thatismatt avatar image
thatismatt answered
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Phil Factor avatar image
Phil Factor answered

Ahem.

CLR Assembly RegEx Functions for SQL Server by Example

but in the case you show, you'll find you may get better performance using the built-in PatIndex() function. At the moment I can't see why you'd need anything else if it is just one block of numbers you need to extract.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Miller avatar image
Matt Miller answered

As to performance: While there clearly can be some variation depending on what you put into them, I don't think it's correct to say that CLR (compiled) functions will perform as poorly as UDF's in this case. As a matter of fact - there was some testing done a couple of years back doing speed tests on complicated pattern matching, and CLR regex ended up beating some of the better tricks Jeff Moden put together, and blew the doors off of anything involving user-defined TransactSQL functions.

In the cases where you're building complex calculations not involving data access from within the function, I think you will find the CLR functions will consistently beat the UDF (interpreted) equivalents.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

TimothyAWiseman avatar image
TimothyAWiseman answered

As mentioned by Matt Whitfield, CLR is probably going to be the fastest performance wise, and it comes the closest to permitting the use of regext inside of SQL Server at the moment. If this will be something you do frequently and performance is an issue, this is almost certainly the best answer.

My personal approach would be to use Python with pyodbc to handle the processing. Developing a simple script like that in Python is probably much faster than developing the CLR, linking it in to Sql, testing it, and recompiling if you discover any bugs. If you only need to do this as a one time thing or infrequently then this is probably the better overall solution.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered
You can install SQL Master data services and you'll get clr procedures for regex. http://thefirstsql.com/2011/02/07/regular-expressions-advanced-string-matching-and-new-split-function-sql-server-2008-r2/
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.