question

sqlrd222 avatar image
sqlrd222 asked

How do I extract data from a variable length delimited string after a certain 'phrase' appears?

I wish to extract certain pieces of information that appear 5 pieces after a certain bit of data appears in the delimited string. For example D|1|2|3|4|5|6|7|8|9|D|1|2|3|4|5|6|7|8|9.... etc etc (48 D's) I wish to basically write a query which pulls say the 5th value in the delimited string after every D so I need to say every time D is found, go 5 along (maybe 10 if the pipe counts as a value) and extract that information into a new column in a table, I wish for this information to be displayed along in rows rather than down in a column. NOTE: There will always be 48 bits of information to be pulled from each cell, no more or less.
stringvariablesplitdelimited-string
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered
Presume there's not always 9 chars after the 'D', otherwise this would be a simple list of substrings at fixed positions i.e. select substring(@YourString,10,1), select substring(@YourString,20,1), select substring(@YourString,30,1), .... You can use a tally table to split the string into individual characters, find the position of the 'D's and move forward 10 chars from that. Then pivot the results to get your columns. I've limited it here to 4 columns for brevity, but it gives you the idea declare @YourString char(960) set @YourString = 'D|1|2|3|4|X|6|7|8|9|D|1|2|3|4|Y|6|7|8|9|D|1|2|3|4|Z|6|7|8|9|D|1|2|3|4|5|6|7|8|9' ;with tally as ( select top 1000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) select [1],[2],[3],[4] from( select row_number() over (order by N) as rownum, substring(@YourString,N+10,1) as chardata from tally where N< = len(@YourString) and substring(@YourString,N,1) = 'D' ) p pivot (max (chardata) for rownum in ([1],[2],[3],[4]) ) as PVT gives us the result 1 2 3 4 ---- ---- ---- ---- X Y Z 5 ----- Edit: added example using a table as the source declare @YourTable table (ID int, YourString char(960)) insert into @YourTable select 1,'D|1|2|3|4|X|6|7|8|9|D|1|2|3|4|Y|6|7|8|9|D|1|2|3|4|Z|6|7|8|9|D|1|2|3|4|5|6|7|8|9' insert into @YourTable select 2,'D|1|2|3|4|A|6|7|8|9|D|1|2|3|4|B|6|7|8|9|D|1|2|3|4|C|6|7|8|9|D|1|2|3|4|E|6|7|8|9' insert into @YourTable select 3,'D|1|2|3|4|M|6|7|8|9|D|1|2|3|4|N|6|7|8|9|D|1|2|3|4|O|6|7|8|9|D|1|2|3|4|P|6|7|8|9' ;with tally as ( select top 1000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) select ID, [1],[2],[3],[4] from( select ID, row_number() over (partition by ID order by N) as rownum, substring(YourString,N+10,1) as chardata from tally join @YourTable on N<= len(YourString) and substring(YourString,N,1) = 'D' ) p pivot (max (chardata) for rownum in ([1],[2],[3],[4]) ) as PVT order by id ----- Edit: added example using OPs field names ;with tally as ( select top 1000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) select ID, [1],[2],[3],[4] from( select ID, row_number() over (partition by ID order by N) as rownum, substring(RAWCLOB,N+10,1) as chardata from tally join DATAFLOW_CLOB on N>= len(RAWCLOB) and substring(RAWCLOB,N,1) = 'D' ) p pivot (max (chardata) for rownum in ([1],[2],[3],[4]) ) as PVT order by id
15 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

sqlrd222 avatar image sqlrd222 commented ·
I'm not largely experience with SQL but I'm testing your code out and its erroring saying Line 20 'An expression of non-boolean type specified in context where condition is expected, near ';' - is there something I should change in my code?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Argh it's the T-SQL formatter chewing up my code on this site look for the bit that says **`N< =`** and change it to **'N <='**
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
It's saying line 20 which is 'where N< = len(@YourString)' - I can't see any '<' signs here?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
And now it's encoded your reply - I can't tell which way round you meant! change line 20 to `where N less_than_or_equal_to len(@YourString)' replacing 'less_than_or_equal_to' with the correct operator
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
Sorted - thank you so much :)
0 Likes 0 ·
Show more comments
sqlrd222 avatar image
sqlrd222 answered
Thank you, you're so helpful. I now have this code - declare @YourTable table (ID int, YourString char(960)) insert into @YourTable select 'DATAFLOW_CLOB.RAW_CLOB' ;with tally as ( select top 1000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) select ID, [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36],[37],[38],[39],[40],[41],[42],[43],[44],[45],[46],[47],[48] from( select ID, row_number() over (partition by ID order by N) as rownum, substring(RAWCLOB,N+10,1) as chardata from tally join DATAFLOW_CLOB on N<= len(RAWCLOB) and substring(RAWCLOB,N,1) = 'D' ) p pivot (max (chardata) for rownum in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36],[37],[38],[39],[40],[41],[42],[43],[44],[45],[46],[47],[48]) ) as PVT order by id But I get an error saying column name or number does not match table definition on line 3 so I took line 3 out (did't know if I needed it) and then I get an error saying invalid object name (dataflow_clob) on line 6. Not sure what's wrong really.
11 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
No don't use any reference to @YourTable - that was just me trying to demonstrate how to run this against a table. Just use what I posted as the last example... Also you need to identify an identifer on the row (I've called it ID in the script)
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
OK I took the top bit out so the query just starts with ';with tally as...' but I'm still getting an error saying invalid object name 'DATAFLOW_CLOB' but on line 2? I can't see any reference to the table name on line 2. Is this not because I need to reference which database I am taking it from? The ID will be the date for each string of data when this bit gets working. The date is contained in the string, it is the 12th bit of information along.
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
I have solved that by saying 'USE GATEKEEPER' above the code you gave me. But it's now erroring saying that ID is an invalid column name, I've been trying to declare it but for some reason I'm not getting it right?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
No - the ID I'm referring to needs to be something that identifies the row, before you start to strip out the string. Are there any other fields you can use?
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
The first column in the table I'm using is called DC_INDEX_FX and is a type of unique ID for each string. Is that why the ID part of the code won't work for me? Thanks
0 Likes 0 ·
Show more comments
Kev Riley avatar image
Kev Riley answered
OK posted as new answer, as the clarification that the data is not just characters means a slightly different approach. Still using the tally table, but this time to split the string out using the '|' (pipe) character, and cope with 'bits of data' that are not just single characters. declare @YourTable table (ID int, YourString char(960)) insert into @YourTable select 1,'SPH|1.0|2ed|345|4|X|69|7dddess|8.0.98|9re|SPH|1.0|2ed|345|4|XX|69|7dddess|8.0.98|9re|SPH|1.0|2ed|345|4|XXX|69|7dddess|8.0.98|9re|SPH|1.0|2ed|345|4|XXXX|69|7dddess|8.0.98|9re|' ;with tally as ( select top 1000 ROW_NUMBER() over(order by sv1.number) N from master.dbo.spt_values sv1, master.dbo.spt_values sv2 ) , splitoutdata as ( select ID, row_number()over(partition by ID order by N) as rownum, substring('|'+YourString+'|',N+1,charindex('|','|'+YourString+'|',N+1)-N-1) as bitofdata from tally join @YourTable on N< len('|'+YourString+'|') and substring('|'+YourString+'|',N,1) = '|' ) select ID, [1],[2],[3],[4] from( select ID, row_number()over(order by rownum) as rownum, --this line here gets the 5th element, so if in future you want a different one --simply change the integer value (select bitofdata from splitoutdata so2 where so2.rownum = so1.rownum +5) as bitofdata from splitoutdata so1 where bitofdata = 'SPH' ) p pivot (max (bitofdata) for rownum in ([1],[2],[3],[4]) ) as PVT order by id
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

sqlrd222 avatar image sqlrd222 commented ·
Thank you for that, I apologise that I didn't clarify it in the first place. It still doesn't seem to like 'with tally as' as I'm getting an error for that line that says Invalid length parameter passed to the left or substring function - I assume the substring as there is no left function.
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Would it be possible to give me an example of the real data? Please message me (envelope icon at the top of the page) for an email address if you don't want to post here.
0 Likes 0 ·
sqlrd222 avatar image sqlrd222 commented ·
Did you receive my message?
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
Yes, looking at it now.....
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
An update: the OP sent me a copy of the real data, and I arranged a solution around the query given above, substituting the OP's actual object names and data types
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.