question

Fayu avatar image
Fayu asked

RegEx help for finding values (substring)

I am new to this whole regex thing. I have read up on it but I still need some help. I have expressions shown below and I want to get the values. For example, in example 1 I want to search for my property ([[[MyProperty1]]]) and I want to get back 'Value'. In example 2, I want to be able to search for [[[MyProperty2]]] and have value2 as the result.

Cant you help me write the pattern as well as explain to me what is being done in the pattern to acheive this goal.

Thanks in advance.

Example 1: [[[MyProperty1]]] Value

Example 2: [[[MyProperty1]]] Value [[[MyProperty2]]] value2 [[[MyProperty3]]] value 3

regex
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

The regular expressions you want, assuming that 'value' is something that doesn't contain a space, are:

For MyProperty1: \[\[\[MyProperty1\]\]\] ([^ ]+)
For MyProperty2: \[\[\[MyProperty2\]\]\] ([^ ]+)
For MyProperty3: \[\[\[MyProperty3\]\]\] ([^ ]+)

These are in .NET format, so I'm not sure if the VBScript RegExp uses the same syntax - I would hope it would be similar. Those regular expressions will create a numbered capture group, and capture group 1 in each case will contain your 'value'.

I would highly recommend two things:

1) Use a CLR assembly which wraps the .NET Regular Expression libraries instead. That will leave you with faster, more elegant code.

2) Go and download Expresso - which is a free, and hugely useful tool. It will help you with the syntax of regular expressions hugely.

Hope that helps...

Edit -> Here is a sample of a CLR regular expression function:

public partial class RegexFunctions
{
    private struct _match
    {
        public readonly int MatchNumber;
        public readonly int GroupNumber;
        public readonly string CaptureValue;
        public _match(int matchNumber, int groupNumber, string captureValue)
        {
            MatchNumber = matchNumber;
            GroupNumber = groupNumber;
            CaptureValue = captureValue;
        }
    }

    [SqlFunction(FillRowMethodName="FillRow", TableDefinition="MatchNumber int, GroupNumber int, CaptureValue nvarchar(MAX)")]
    public static IEnumerable GetCaptureGroupValues(string input, string pattern)
    {
        List<_match> matchList = new List<_match>();
        int matchIndex = 0;
        foreach (Match m in Regex.Matches(input, pattern))        
        {
            int groupIndex = 0;
            foreach (Group g in m.Groups)
            {
                matchList.Add(new _match(matchIndex, groupIndex++, g.Value));
            }
            matchIndex++;
        }
        return matchList;
    }

    public static void FillRow(Object obj, out int MatchNumber, out int GroupNumber, out SqlString CaptureValue)
    {
        _match match = (_match)obj;
        MatchNumber = match.MatchNumber;
        GroupNumber = match.GroupNumber;
        CaptureValue = match.CaptureValue;
    }
}

And here is how you would use that:

SELECT * 
  FROM [dbo].[GetCaptureGroupValues]
      ('[[[MyProperty1]]] Value [[[MyProperty2]]] value2 [[[MyProperty3]]] value3', 
       '\[\[\[MyProperty[0-9]+\]\]\] ([^ ]+)')

Which returns:

MatchNumber     GroupNumber     CaptureValue
0               0               [[[MyProperty1]]] Value
0               1               Value
1               0               [[[MyProperty2]]] value2
1               1               value2
2               0               [[[MyProperty3]]] value3
2               1               value3
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Oh, and it returns the whole thing because you're looking at capture group 0 - look at capture group 1 (not sure how you do that in the VBScript one though). From the code I've added above hopefully that should be clear.
1 Like 1 ·
TimothyAWiseman avatar image
TimothyAWiseman answered

You should realize that SQL Server does not natively support Regex. You can use RegEx within SQL Server by importing CLRs but that involves numerous steps and doing the actual regex coding in a language (normally C#) other than SQL.

With that said, to search for a string literal in regex that does not involve using any special control characters, the regex pattern is just that string literal.

Now, if what you want is to select all rows that have that value, you can do that in pure T-sql it would be:

select
   [your column list]
from
   [your table name]
where
   MyProperty1 like '%Value%'

In that, % represents a wildcard and this will return all rows where the string Value is anywhere in the MyProperty1 string. If you just want to find string literals in regex, you normally do not need a wildcard because of they way it works, but if you did want one for more complicated patterns the equivalent is the period (.) which will match any number of any characters.

Hopefully this helps, but I am not completly sure I understand your question, so please feel free to ask away or expand your question if this doesn't answer everything.

4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

TimothyAWiseman avatar image TimothyAWiseman commented ·
I use regex extensively, but not in VBScript. I use Python and C#. If I understand your question though, it sounds like the key thing to remember is the regex pattern for a string literal without control characters is that string literal itself (escape the control characters if present.) If you can clarify your question more I am happy to try to help, otherwise you may consider stackoverflow for a VBScript question.
1 Like 1 ·
Fayu avatar image Fayu commented ·
Thanks for your reply. I am using a VBScript.RegExp object to execute my regex expression (EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT). Now that I think about it, this may not be the best place to ask this question. However, if you (or anyone else) can provide me with an answer with an explanation, I would really appreciate it.
0 Likes 0 ·
Fayu avatar image Fayu commented ·
Timothy... I think I may have found the pattern thanks to Matt and yourself. I am facing other issues as well (stated above). If you can assist I would be glad. If not, thats okay. You have been kind in taking your time to assist me. I appreciate your help. THANK YOU!
0 Likes 0 ·
TimothyAWiseman avatar image TimothyAWiseman commented ·
I understood your question more clearly after reading Matt's answer. I fear I had misread it at first. As to your other questions, I am afraid I do not know much about VBScript and they seem to pertain mostly to VBScript.
0 Likes 0 ·
Jeff Moden avatar image
Jeff Moden answered

Heh... Regex and CLR's don't solve the real problem here. Until you normalize the data, you can't have a decent T-SQL solution and you'll always find it to be a pain to look anything up. At the very least, you'll be burning extra clock cycles with some form of LIKE.

Instead of all that, normalize the data, put the normalized into a table, and be done with it. Like this... (see the note further below for how to build the very useful Tally Table)

--===== Create and populate a test table with the given data.
     -- This may be a part of the solution.
 CREATE TABLE #YourTable 
        (
        SomeOwnerID INT IDENTITY(1,1), 
        PropertyString VARCHAR(8000)
        );

 INSERT INTO #YourTable
        (PropertyString)
 SELECT '[[[MyProperty1]]] 1000' UNION ALL
 SELECT '[[[MyProperty1]]] 2000 [[[MyProperty2]]] 3000 [[[MyProperty3]]] 4000';

--===== Split the data and put it into a "normalized" table
WITH 
cteSplit1 AS
(
 SELECT SomeOwnerID,
        SUBSTRING(yt.PropertyString, t.N, CHARINDEX('[[[',yt.PropertyString+'[[[',t.N+3)-(t.N)) AS Split1
   FROM dbo.Tally t
  CROSS JOIN #YourTable yt
  WHERE t.N < LEN(PropertyString)
    AND SUBSTRING(yt.PropertyString,t.N,3) = '[[['
)
 SELECT SomeOwnerID,
        SUBSTRING(Split1,1,CHARINDEX(']]]',Split1)+2) AS PropertyName,
        SUBSTRING(Split1, CHARINDEX(']]]',Split1)+3,8000) AS PropertyValue
   INTO #NormalizedTable
   FROM cteSplit1;

--===== Show the results stored in the new "normalized" table
 SELECT * FROM #NormalizedTable;

If you don't have a Tally Table, now's the time to build one. Every Jedi must build his own light saber.... see the following article for how to build a Tally Table and how it's used to replace some WHILE loops. http://www.sqlservercentral.com/articles/T-SQL/62867/

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.