x

How do I do a regex style replace in T-SQL?

I've got some weblogs in a table with URLs of the form:

http://foo.com/bar/12345/baz?param=something789

I'd like to strip out the digits from the path for reporting purposes. They're not constant length and digits may appear elsewhere in the URL, so regex feels like the natural way to remove them.

How can I do this in T-SQL?

more ▼

asked Oct 09, 2009 at 10:56 AM in Default

avatar image

David Bick
144 1 1 4

(comments are locked)
10|1200 characters needed characters left

9 answers: sort voted first

My preferred route would be to wrap the .net Regex engine in a CLR object. There's an MSDN magazine article on that here

more ▼

answered Oct 09, 2009 at 12:23 PM

avatar image

Matt Whitfield ♦♦
29.5k 62 66 88

(comments are locked)
10|1200 characters needed characters left

Ahem.

CLR Assembly RegEx Functions for SQL Server by Example

but in the case you show, you'll find you may get better performance using the built-in PatIndex() function. At the moment I can't see why you'd need anything else if it is just one block of numbers you need to extract.

more ▼

answered Oct 09, 2009 at 02:09 PM

avatar image

Phil Factor
4.2k 8 22 20

(comments are locked)
10|1200 characters needed characters left

As to performance: While there clearly can be some variation depending on what you put into them, I don't think it's correct to say that CLR (compiled) functions will perform as poorly as UDF's in this case. As a matter of fact - there was some testing done a couple of years back doing speed tests on complicated pattern matching, and CLR regex ended up beating some of the better tricks Jeff Moden put together, and blew the doors off of anything involving user-defined TransactSQL functions.

In the cases where you're building complex calculations not involving data access from within the function, I think you will find the CLR functions will consistently beat the UDF (interpreted) equivalents.

more ▼

answered Oct 11, 2009 at 01:59 PM

avatar image

Matt Miller
11

(comments are locked)
10|1200 characters needed characters left
more ▼

answered Oct 09, 2009 at 11:12 AM

avatar image

thatismatt ♦♦
210 8 12 12

(comments are locked)
10|1200 characters needed characters left

As mentioned by Matt Whitfield, CLR is probably going to be the fastest performance wise, and it comes the closest to permitting the use of regext inside of SQL Server at the moment. If this will be something you do frequently and performance is an issue, this is almost certainly the best answer.

My personal approach would be to use Python with pyodbc to handle the processing. Developing a simple script like that in Python is probably much faster than developing the CLR, linking it in to Sql, testing it, and recompiling if you discover any bugs. If you only need to do this as a one time thing or infrequently then this is probably the better overall solution.

more ▼

answered Oct 13, 2009 at 06:44 PM

avatar image

TimothyAWiseman
15.6k 22 49 38

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1066
x9

asked: Oct 09, 2009 at 10:56 AM

Seen: 21644 times

Last Updated: Oct 09, 2009 at 10:56 AM

Copyright 2016 Redgate Software. Privacy Policy