x

Extracting certain pattern of string from rows

Create table MessageDetail
(
Msg varchar(1000)
)

End space,?,/>, ", end of line

Insert into MessageDetail Values
('abc1 just'),
('This abc5? is abc2 just testing abc3 infomation'),
('This is just xyz1 testing infomation abc234/> test'),
('This is just xyz34 testing abc231?3 infomation'),
('This is just testing xyz67/> infomation abc32? test'),
('This xyz34 is xyz5g55 just xyz5 infomation'),
('This is just testing infomation xyz'),
('This is xyz56f3 just testing abc')

select * from MessageDetail

Where PATINDEX('%abc%', Msg)> 0 
Or PATINDEX('%xyz%', Msg)> 0

on above table, I need to extract only string from those rows of pattern which start from abc or xyz Those patter may be 1 or more time. If more then one then have to retrive rest of those pattern too. Start with same pattern as in where clause but end of that links are some time ", some time space, some time /> and some time end of line.

more ▼

asked Jun 20, 2012 at 12:54 PM in Default

munabhai gravatar image

munabhai
0 4 4 4

Aghh...Cant format it correctly :(
Jun 20, 2012 at 01:03 PM Usman Butt
@munabhai Can you please post the desired output as well for the test data? This would help others to better understand your requirement. Thanks.
Jun 20, 2012 at 01:06 PM Usman Butt
(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

If you are running SQL Server 2005 and above, then the easies way will be CLR Implementation of RegEx.

You can take a look on my post [SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching][1].

There is also a sample solution available for download and this is easily solvable by RegEx expression.

[1]: http://www.pawlowski.cz/2010/09/sql-server-2005-and-sql-server-2008-regular-expressions-and-pattern-matching-2/
more ▼

answered Jun 20, 2012 at 01:37 PM

Pavel Pawlowski gravatar image

Pavel Pawlowski
22.3k 9 11 21

+1. Cannot agree more. Best solution in such scenarios is to use a CLR function.
Jun 20, 2012 at 01:44 PM Usman Butt
to use CLR I need to know VB and C# too. Or I can just use ms-sql?
Jun 20, 2012 at 02:01 PM munabhai
@munabhai Yes, you need to know VB and/or C#. However, in this case, because @Pavel's solution (in the post linked in his answer above) is complete and includes very detailed instructions on how to deploy the CLR solution to the SQL Server, all you really need is just a basic understanding of the syntax. Read through the article, follow the code along with instructions and you should be able to deploy his solution in the relatively short period of time. SQL Server installation out of the box includes so called BIDS (Business Intelligence Development Studio), which you can use as an IDE. Open it, start from creating a new SQL-CLR project and go from there. If you already have the the full blown version of Visual Studio then the CLR project template is not going to be in BIDS, but will be under Database Projects folder in VS. If you cannot find BIDS on the box where SQL Server is installed then you can re-run the SQL Server installation from the disk and opt to include the components. This will install BIDS.
Jun 20, 2012 at 03:58 PM Oleg
(comments are locked)
10|1200 characters needed characters left

I agree with Pavel with a caveat. If you need to do this inside the database engine then CLR is by far the best way to go, and his post that he linked is an excellent reference. Phil Factor also has a good post on this topic at: [CLR Assembly RegEx Functions for SQL Server by Example][2]

However, I think you would generally be better off doing it outside of the databse, in the application or presentation layers. You can use SQL's limited pattern matching with LIKE to determine what should be sent to the application or presentation layers, and then use RegEx there.

If you truly must do this in T-SQL, you certainly can, but I can't think of a way to do it in one query. I would use an initial select with a LIKE conditional in the where clause to determine which rows needed to be processed and then put them in a table variable or temp table. Then I would use nested loops to process each row and extract each match within the row. It wouldn't be particularly hard, but it would be awkward and ineffecient compared to using RegEx either in CLR or in your application layer.

[2]: http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/
more ▼

answered Jun 20, 2012 at 04:45 PM

TimothyAWiseman gravatar image

TimothyAWiseman
15.5k 20 23 32

(comments are locked)
10|1200 characters needed characters left

Just for fun (and yes just because @TimothyAWiseman said he couldn't think of a way of doing this in one query) here's a single query, pure t-sql solution. Not saying it will be fast though!

;with tally as
(select top 10000
    ROW_NUMBER() over(order by sv1.number) N
from
    master.dbo.spt_values sv1,
    master.dbo.spt_values sv2
),

AlteredMsgs as 
(select row_number()over(order by msg) as ID, 
   cast(' '+msg+ ' ' as char(1002)) as Msg
from MessageDetail)
,

stringstofind (pattern) as 
(select 'xyz' union select 'abc')


select
 substring(msg,N+1, charindex(' ',msg,N+1)-N-1)
from AlteredMsgs
join tally on N> len(msg)
join stringstofind
  on substring(msg,N+1, charindex(' ',msg,N+1)-N-1) like stringstofind.pattern+'%'
and substring(msg,N,1) = ' '
order by ID
[please note the code formatter is currently encoding '>' as > ]
more ▼

answered Jun 21, 2012 at 10:24 AM

Kev Riley gravatar image

Kev Riley ♦♦
53.3k 47 49 76

Ok, I'm impressed.
Jun 22, 2012 at 03:48 PM TimothyAWiseman
Ha Ha - Thank you @TimothyAWiseman!
Jun 22, 2012 at 04:00 PM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left
Thanks Pavan, Oleg, TimothyAwiseman for suggestion. I am planning to use loop to check if there is multiple occurance of my patter.
more ▼

answered Jun 20, 2012 at 04:48 PM

munabhai gravatar image

munabhai
0 4 4 4

(comments are locked)
10|1200 characters needed characters left
Kev you are awesome
more ▼

answered Jun 21, 2012 at 01:53 PM

munabhai gravatar image

munabhai
0 4 4 4

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x12
x7
x4

asked: Jun 20, 2012 at 12:54 PM

Seen: 2016 times

Last Updated: Jun 22, 2012 at 04:00 PM