|
I've got some weblogs in a table with URLs of the form: http://foo.com/bar/12345/baz?param=something789 I'd like to strip out the digits from the path for reporting purposes. They're not constant length and digits may appear elsewhere in the URL, so regex feels like the natural way to remove them. How can I do this in T-SQL?
(comments are locked)
|
|
My preferred route would be to wrap the .net Regex engine in a CLR object. There's an MSDN magazine article on that here Nice article -- thanks :)
Oct 09 '09 at 01:32 PM
David Bick
What's the performance of that like? Or are there ways to keep the performance high? We've had issues in the past with CLR objects running quite slow.
Oct 09 '09 at 01:43 PM
Melvyn Harbour 1 ♦♦
It performs like any other CLR entity really - it's never going to be be more optimal than a UDF given the nature of what goes on 'under the hood' - but I've found that, particularly with .NET's Regex engine, there can be wild differences in performance between different expressions - so it pays to follow the old routine of test, test and test again :)
Oct 09 '09 at 01:59 PM
Matt Whitfield ♦♦
You can probably do this via the CLR in SQL Server (never used it myself) but one of the worst performance hits you can take with Regex's is compiling the regex each time you need it - if you can make it static and compile it once you will find performance improves massively.
Oct 28 '09 at 05:55 PM
James Moore
(comments are locked)
|
|
Ahem. CLR Assembly RegEx Functions for SQL Server by Example but in the case you show, you'll find you may get better performance using the built-in PatIndex() function. At the moment I can't see why you'd need anything else if it is just one block of numbers you need to extract.
(comments are locked)
|
|
As to performance: While there clearly can be some variation depending on what you put into them, I don't think it's correct to say that CLR (compiled) functions will perform as poorly as UDF's in this case. As a matter of fact - there was some testing done a couple of years back doing speed tests on complicated pattern matching, and CLR regex ended up beating some of the better tricks Jeff Moden put together, and blew the doors off of anything involving user-defined TransactSQL functions. In the cases where you're building complex calculations not involving data access from within the function, I think you will find the CLR functions will consistently beat the UDF (interpreted) equivalents. Good bit of info there - I did some testing with a simple UDF to add two numbers together and it's CLR equivalent, and the CLR equivalent did perform faster - vote up! Do note though, I wasn't saying that T-SQL regex functionality would be equivalent speed, just that there is calling overhead with functions of any kind - I hadn't realised the speed difference, however.
Oct 11 '09 at 05:01 PM
Matt Whitfield ♦♦
(comments are locked)
|
|
You can install SQL Master data services and you'll get clr procedures for regex.
(comments are locked)
|
|
As mentioned by Matt Whitfield, CLR is probably going to be the fastest performance wise, and it comes the closest to permitting the use of regext inside of SQL Server at the moment. If this will be something you do frequently and performance is an issue, this is almost certainly the best answer. My personal approach would be to use Python with pyodbc to handle the processing. Developing a simple script like that in Python is probably much faster than developing the CLR, linking it in to Sql, testing it, and recompiling if you discover any bugs. If you only need to do this as a one time thing or infrequently then this is probably the better overall solution.
(comments are locked)
|
1 2 next page »

