question

munabhai avatar image
munabhai asked

Extracting certain pattern of string from rows

Create table MessageDetail ( Msg varchar(1000) ) End space,?,/>, ", end of line Insert into MessageDetail Values ('abc1 just'), ('This abc5? is abc2 just testing abc3 infomation'), ('This is just xyz1 testing infomation abc234/> test'), ('This is just xyz34 testing abc231?3 infomation'), ('This is just testing xyz67/> infomation abc32? test'), ('This xyz34 is xyz5g55 just xyz5 infomation'), ('This is just testing infomation xyz'), ('This is xyz56f3 just testing abc') select * from MessageDetail Where PATINDEX('%abc%', Msg)> 0 Or PATINDEX('%xyz%', Msg)> 0 on above table, I need to extract only string from those rows of pattern which start from abc or xyz Those patter may be 1 or more time. If more then one then have to retrive rest of those pattern too. Start with same pattern as in where clause but end of that links are some time ", some time space, some time /> and some time end of line.
loopingcharindexpatindex
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Usman Butt avatar image Usman Butt commented ·
Aghh...Cant format it correctly :(
0 Likes 0 ·
Usman Butt avatar image Usman Butt commented ·
@munabhai Can you please post the desired output as well for the test data? This would help others to better understand your requirement. Thanks.
0 Likes 0 ·
Pavel Pawlowski avatar image
Pavel Pawlowski answered
If you are running SQL Server 2005 and above, then the easies way will be CLR Implementation of RegEx. You can take a look on my post [SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching][1]. There is also a sample solution available for download and this is easily solvable by RegEx expression. [1]: http://www.pawlowski.cz/2010/09/sql-server-2005-and-sql-server-2008-regular-expressions-and-pattern-matching-2/
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Usman Butt avatar image Usman Butt commented ·
+1. Cannot agree more. Best solution in such scenarios is to use a CLR function.
0 Likes 0 ·
munabhai avatar image munabhai commented ·
to use CLR I need to know VB and C# too. Or I can just use ms-sql?
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@munabhai Yes, you need to know VB and/or C#. However, in this case, because @Pavel's solution (in the post linked in his answer above) is complete and includes very detailed instructions on how to deploy the CLR solution to the SQL Server, all you really need is just a basic understanding of the syntax. Read through the article, follow the code along with instructions and you should be able to deploy his solution in the relatively short period of time. SQL Server installation out of the box includes so called BIDS (Business Intelligence Development Studio), which you can use as an IDE. Open it, start from creating a new SQL-CLR project and go from there. If you already have the the full blown version of Visual Studio then the CLR project template is not going to be in BIDS, but will be under Database Projects folder in VS. If you cannot find BIDS on the box where SQL Server is installed then you can re-run the SQL Server installation from the disk and opt to include the **components**. This will install BIDS.
0 Likes 0 ·
TimothyAWiseman avatar image
TimothyAWiseman answered
I agree with Pavel with a caveat. If you need to do this ***inside the database*** engine then CLR is by far the best way to go, and [his post][1] that he linked is an excellent reference. Phil Factor also has a good post on this topic at: [CLR Assembly RegEx Functions for SQL Server by Example][2] However, I think you would generally be better off doing it outside of the databse, in the application or presentation layers. You can use SQL's limited pattern matching with `LIKE` to determine what should be sent to the application or presentation layers, and then use RegEx there. If you truly must do this in T-SQL, you certainly can, but I can't think of a way to do it in one query. I would use an initial select with a `LIKE` conditional in the `where` clause to determine which rows needed to be processed and then put them in a table variable or temp table. Then I would use nested loops to process each row and extract each match within the row. It wouldn't be particularly hard, but it would be awkward and ineffecient compared to using RegEx either in CLR or in your application layer. [1]: http://www.pawlowski.cz/2010/09/sql-server-2005-and-sql-server-2008-regular-expressions-and-pattern-matching-2/ [2]: http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

munabhai avatar image
munabhai answered
Thanks Pavan, Oleg, TimothyAwiseman for suggestion. I am planning to use loop to check if there is multiple occurance of my patter.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered
Just for fun (and yes just because @TimothyAWiseman said he couldn't think of a way of doing this in one query) here's a single query, pure t-sql solution. Not saying it will be fast though! ;with tally as (select top 10000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ), AlteredMsgs as (select row_number()over(order by msg) as ID, cast(' '+msg+ ' ' as char(1002)) as Msg from MessageDetail) , stringstofind (pattern) as (select 'xyz' union select 'abc') select substring(msg,N+1, charindex(' ',msg,N+1)-N-1) from AlteredMsgs join tally on N> len(msg) join stringstofind on substring(msg,N+1, charindex(' ',msg,N+1)-N-1) like stringstofind.pattern+'%' and substring(msg,N,1) = ' ' order by ID [please note the code formatter is **currently** encoding '>' as `>` ]
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

TimothyAWiseman avatar image TimothyAWiseman commented ·
Ok, I'm impressed.
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Ha Ha - Thank you @TimothyAWiseman!
0 Likes 0 ·
munabhai avatar image
munabhai answered
Kev you are awesome
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.