x

Extracting certain pattern of string from rows

 Create table MessageDetail
 (
 Msg varchar(1000)
 )
 
 End space,?,/>, ", end of line
 
 Insert into MessageDetail Values
 ('abc1 just'),
 ('This abc5? is abc2 just testing abc3 infomation'),
 ('This is just xyz1 testing infomation abc234/> test'),
 ('This is just xyz34 testing abc231?3 infomation'),
 ('This is just testing xyz67/> infomation abc32? test'),
 ('This xyz34 is xyz5g55 just xyz5 infomation'),
 ('This is just testing infomation xyz'),
 ('This is xyz56f3 just testing abc')
 
 select * from MessageDetail
 
 Where PATINDEX('%abc%', Msg)> 0 
 Or PATINDEX('%xyz%', Msg)> 0

 


on above table, I need to extract only string from those rows of pattern which start from abc or xyz Those patter may be 1 or more time. If more then one then have to retrive rest of those pattern too. Start with same pattern as in where clause but end of that links are some time ", some time space, some time /> and some time end of line.

more ▼

asked Jun 20, 2012 at 12:54 PM in Default

avatar image

munabhai
0 4 4 4

Aghh...Cant format it correctly :(

Jun 20, 2012 at 01:03 PM Usman Butt

@munabhai Can you please post the desired output as well for the test data? This would help others to better understand your requirement. Thanks.

Jun 20, 2012 at 01:06 PM Usman Butt
(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

If you are running SQL Server 2005 and above, then the easies way will be CLR Implementation of RegEx.

You can take a look on my post SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching.

There is also a sample solution available for download and this is easily solvable by RegEx expression.

more ▼

answered Jun 20, 2012 at 01:37 PM

avatar image

Pavel Pawlowski
22.7k 10 15 26

+1. Cannot agree more. Best solution in such scenarios is to use a CLR function.

Jun 20, 2012 at 01:44 PM Usman Butt

to use CLR I need to know VB and C# too. Or I can just use ms-sql?

Jun 20, 2012 at 02:01 PM munabhai

@munabhai Yes, you need to know VB and/or C#. However, in this case, because @Pavel's solution (in the post linked in his answer above) is complete and includes very detailed instructions on how to deploy the CLR solution to the SQL Server, all you really need is just a basic understanding of the syntax. Read through the article, follow the code along with instructions and you should be able to deploy his solution in the relatively short period of time. SQL Server installation out of the box includes so called BIDS (Business Intelligence Development Studio), which you can use as an IDE. Open it, start from creating a new SQL-CLR project and go from there. If you already have the the full blown version of Visual Studio then the CLR project template is not going to be in BIDS, but will be under Database Projects folder in VS. If you cannot find BIDS on the box where SQL Server is installed then you can re-run the SQL Server installation from the disk and opt to include the components. This will install BIDS.

Jun 20, 2012 at 03:58 PM Oleg
(comments are locked)
10|1200 characters needed characters left

I agree with Pavel with a caveat. If you need to do this inside the database engine then CLR is by far the best way to go, and his post that he linked is an excellent reference. Phil Factor also has a good post on this topic at: CLR Assembly RegEx Functions for SQL Server by Example

However, I think you would generally be better off doing it outside of the databse, in the application or presentation layers. You can use SQL's limited pattern matching with LIKE to determine what should be sent to the application or presentation layers, and then use RegEx there.

If you truly must do this in T-SQL, you certainly can, but I can't think of a way to do it in one query. I would use an initial select with a LIKE conditional in the where clause to determine which rows needed to be processed and then put them in a table variable or temp table. Then I would use nested loops to process each row and extract each match within the row. It wouldn't be particularly hard, but it would be awkward and ineffecient compared to using RegEx either in CLR or in your application layer.

more ▼

answered Jun 20, 2012 at 04:45 PM

avatar image

TimothyAWiseman
15.6k 22 51 38

(comments are locked)
10|1200 characters needed characters left

Just for fun (and yes just because @TimothyAWiseman said he couldn't think of a way of doing this in one query) here's a single query, pure t-sql solution. Not saying it will be fast though!

 ;with tally as
 (select top 10000
     ROW_NUMBER() over(order by sv1.number) N
 from
     master.dbo.spt_values sv1,
     master.dbo.spt_values sv2
 ),
 
 AlteredMsgs as 
 (select row_number()over(order by msg) as ID, 
    cast(' '+msg+ ' ' as char(1002)) as Msg
 from MessageDetail)
 ,
 
 stringstofind (pattern) as 
 (select 'xyz' union select 'abc')
 
 
 select
  substring(msg,N+1, charindex(' ',msg,N+1)-N-1)
 from AlteredMsgs
 join tally on N> len(msg)
 join stringstofind
   on substring(msg,N+1, charindex(' ',msg,N+1)-N-1) like stringstofind.pattern+'%'
 and substring(msg,N,1) = ' '
 order by ID


[please note the code formatter is currently encoding '>' as > ]

more ▼

answered Jun 21, 2012 at 10:24 AM

avatar image

Kev Riley ♦♦
64.2k 48 62 81

Ok, I'm impressed.

Jun 22, 2012 at 03:48 PM TimothyAWiseman

Ha Ha - Thank you @TimothyAWiseman!

Jun 22, 2012 at 04:00 PM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left

Thanks Pavan, Oleg, TimothyAwiseman for suggestion. I am planning to use loop to check if there is multiple occurance of my patter.

more ▼

answered Jun 20, 2012 at 04:48 PM

avatar image

munabhai
0 4 4 4

(comments are locked)
10|1200 characters needed characters left

Kev you are awesome

more ▼

answered Jun 21, 2012 at 01:53 PM

avatar image

munabhai
0 4 4 4

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x14
x9
x6

asked: Jun 20, 2012 at 12:54 PM

Seen: 3618 times

Last Updated: Jun 22, 2012 at 04:00 PM

Copyright 2016 Redgate Software. Privacy Policy