x

How do I do a regex style replace in T-SQL?

I've got some weblogs in a table with URLs of the form:

http://foo.com/bar/12345/baz?param=something789

I'd like to strip out the digits from the path for reporting purposes. They're not constant length and digits may appear elsewhere in the URL, so regex feels like the natural way to remove them.

How can I do this in T-SQL?

more ▼

asked Oct 09 '09 at 10:56 AM in Default

David Bick gravatar image

David Bick
144 1 1 2

(comments are locked)
10|1200 characters needed characters left

9 answers: sort voted first

My preferred route would be to wrap the .net Regex engine in a CLR object. There's an MSDN magazine article on that here

more ▼

answered Oct 09 '09 at 12:23 PM

Matt Whitfield gravatar image

Matt Whitfield ♦♦
29.4k 61 65 87

Nice article -- thanks :)
Oct 09 '09 at 01:32 PM David Bick
What's the performance of that like? Or are there ways to keep the performance high? We've had issues in the past with CLR objects running quite slow.
Oct 09 '09 at 01:43 PM Melvyn Harbour 1 ♦♦
It performs like any other CLR entity really - it's never going to be be more optimal than a UDF given the nature of what goes on 'under the hood' - but I've found that, particularly with .NET's Regex engine, there can be wild differences in performance between different expressions - so it pays to follow the old routine of test, test and test again :)
Oct 09 '09 at 01:59 PM Matt Whitfield ♦♦
You can probably do this via the CLR in SQL Server (never used it myself) but one of the worst performance hits you can take with Regex's is compiling the regex each time you need it - if you can make it static and compile it once you will find performance improves massively.
Oct 28 '09 at 05:55 PM James Moore
(comments are locked)
10|1200 characters needed characters left

Ahem.

CLR Assembly RegEx Functions for SQL Server by Example

but in the case you show, you'll find you may get better performance using the built-in PatIndex() function. At the moment I can't see why you'd need anything else if it is just one block of numbers you need to extract.

more ▼

answered Oct 09 '09 at 02:09 PM

Phil Factor gravatar image

Phil Factor
3.8k 8 9 16

(comments are locked)
10|1200 characters needed characters left

As to performance: While there clearly can be some variation depending on what you put into them, I don't think it's correct to say that CLR (compiled) functions will perform as poorly as UDF's in this case. As a matter of fact - there was some testing done a couple of years back doing speed tests on complicated pattern matching, and CLR regex ended up beating some of the better tricks Jeff Moden put together, and blew the doors off of anything involving user-defined TransactSQL functions.

In the cases where you're building complex calculations not involving data access from within the function, I think you will find the CLR functions will consistently beat the UDF (interpreted) equivalents.

more ▼

answered Oct 11 '09 at 01:59 PM

Matt Miller gravatar image

Matt Miller
11

Good bit of info there - I did some testing with a simple UDF to add two numbers together and it's CLR equivalent, and the CLR equivalent did perform faster - vote up! Do note though, I wasn't saying that T-SQL regex functionality would be equivalent speed, just that there is calling overhead with functions of any kind - I hadn't realised the speed difference, however.
Oct 11 '09 at 05:01 PM Matt Whitfield ♦♦
(comments are locked)
10|1200 characters needed characters left
more ▼

answered Oct 09 '09 at 11:12 AM

thatismatt gravatar image

thatismatt ♦♦
210 7 8 9

(comments are locked)
10|1200 characters needed characters left

As mentioned by Matt Whitfield, CLR is probably going to be the fastest performance wise, and it comes the closest to permitting the use of regext inside of SQL Server at the moment. If this will be something you do frequently and performance is an issue, this is almost certainly the best answer.

My personal approach would be to use Python with pyodbc to handle the processing. Developing a simple script like that in Python is probably much faster than developing the CLR, linking it in to Sql, testing it, and recompiling if you discover any bugs. If you only need to do this as a one time thing or infrequently then this is probably the better overall solution.

more ▼

answered Oct 13 '09 at 06:44 PM

TimothyAWiseman gravatar image

TimothyAWiseman
15.5k 19 22 32

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x977
x7

asked: Oct 09 '09 at 10:56 AM

Seen: 17141 times

Last Updated: Oct 09 '09 at 10:56 AM