question

TSG avatar image
TSG asked

Function for splitting sentence

I am new to MS Sql. I am trying to find a function which will split a long sentence into multiple sentences of fixed length...

For example

'This is a long sentence and I want to split it into multiple sentences of 25 character each with out splitting a word into multiple lines'

needs to be split as

'This is a long sentence' 'and I want to split it' 'into multiple ' 'sentences of 25' 'character each with ' 'out splitting a word ' 'into multiple lines'

Any help on this is greatly appreciated.

t-sqlsplit
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

graz avatar image graz ♦ commented ·
TSG, if this answers your question, please click on the "check mark" to mark it as answered.
0 Likes 0 ·
Nathan Skerl avatar image
Nathan Skerl answered

Heres my first take, though Id like to see this done using a set based approach. Ill keep at it.

Interesting problem!

declare @Txt    varchar(max),            
    	@Limit	int            
            
set @Txt = 'This is a long sentence and I want to split it into multiple sentences of 25 character each with out splitting a word into multiple lines'            
set @Limit = 25            
            
;with cte_DoIt            
as  ( select  substring(@Txt, 0, ((@Limit+1)-charindex(' ', reverse(left(@Txt + ' ',@Limit))))) [Chunk],            
    			stuff(@Txt, 1, ((@Limit+1)-charindex(' ', reverse(left(@Txt + ' ',@Limit)))), '') [Swap]            
    	union all            
    	select	substring(Swap, 0, (@Limit+1) - charindex(' ', reverse(left(Swap + ' ', @Limit)))),            
    			stuff(Swap, 1, ((@Limit+1)-charindex(' ',reverse(left(Swap + ' ',@Limit)))), '')            
    	from	cte_DoIt            
    	where	len(Swap) > 0            
    )            
select Chunk, len(Chunk) [length]             
from cte_DoIt            
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

richard7za avatar image richard7za commented ·
Thanks Nathan for this gem! I modified it slightly for what I needed it for, have posted my version below.
0 Likes 0 ·
WBenhart avatar image
WBenhart answered
I know this is a really old post, but I was looking for the same kind of thing and couldn't find any function to do it. I'm really new to SQL and not really a programmer but I was able to create this function to return what I wanted. CREATE FUNCTION [dbo].[TextSplit] (@Sentence VarChar(MAX), @Pos1 AS INT, @Len AS INT) RETURNS VarChar(MAX) AS BEGIN -- Created by Wayne Benhart on September 9, 2013 -- Benhart Solutions v1.0' -- Wayne Benhart' -- WBenhart@BenhartSolutions.com' -- www.Pentagon2000Forum.com' -- -- USAGE: dbo.TextSplit( , , ) -- EXAMPLE: -- DECLARE @Str AS VARCHAR(MAX) -- SET @Str = 'This is my test of a segmented sentence.' -- SELECT dbo.TextSplit(@Str, 0, 12) AS [Line One], -- dbo.TextSplit(@Str, 13, 12) AS [Line Two], -- dbo.TextSplit(@Str, 25, 12) AS [Line Three] -- -- RESULTS: -- Line One Line Two Line Three -- This is my test of a segmented sentence. -- -- NOTE: This does not return exactly 12 characters, -- it will return the complete last word found -- and then return. -- Line One: Has 15 Characters -- Line Two: Has 14 Characters -- Line Three: Has 9 Characters -- DECLARE @Result AS VARCHAR(MAX) DECLARE @ST AS INT DECLARE @Words VARCHAR(MAX) DECLARE @tmpWord VARCHAR(MAX) DECLARE @t VARCHAR(MAX) DECLARE @I INT SET @Words = '' SET @Result = '' SET @I = 0 SET @ST = 0 SET @t = '' SET @tmpWord = '' -- Clean up sentence and remove any special characters found in it. -- This creates one long sustenance without the return characters. WHILE @I <= Len(@Sentence) BEGIN IF (SubString(@Sentence, @I, 1) in ('a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ' ', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '_', '+', '=', '~', '`', ',', '.', '<', '>', '/', '?', '"', '[', ']', '{', '}', '\', '|', CHAR(13), CHAR(10), CHAR(39))) SET @Words = @Words + (CASE WHEN (SubString(@Sentence, @I, 1) = CHAR(10)) OR (SubString(@Sentence, @I, 1) = CHAR(13)) THEN ' ' ELSE SubString(@Sentence, @I, 1) END) SET @I = @I + 1 END SET @I = 0 -- This breaks down the sustenance into words and adds them -- back together based on the position you passed and the -- Segment length. It will include the punctuation as part of the -- word if there's no space between the word and the punctuation. -- Basically it's space delimited. -- -- PLEASE NOTE: If the last word in the sentence segment -- is greater that the length specified, it -- will include it! WHILE(@I < LEN(@Words)+1) BEGIN SELECT @t = SUBSTRING(@words,@I,1) IF(@t != ' ') BEGIN SET @tmpWord = @tmpWord + @t END ELSE BEGIN IF @tmpWord IS NOT NULL SET @Result = @Result + @tmpWord + ' ' IF @I >= @Pos1 BEGIN SET @Result = (CASE WHEN @ST = 0 THEN '' ELSE @Result END) SET @ST = 1 END IF LEN(@Result) >= @Len AND @I >= @Pos1 SET @I = 999999999 SET @tmpWord = '' END SET @I = @I + 1 SET @t = '' END IF @tmpWord IS NOT NULL AND @I < 999999999 AND @I >= @Pos1 SET @Result = @Result + @tmpWord IF @ST = 0 SET @Result = '' RETURN(@Result) END
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

richard7za avatar image
richard7za answered
I'm reviving this as I found Nathan Skerl's script extremely helpful, thanks Nathan! I modified it into an UDF and made a bit more flexible for what I needed it for: CREATE FUNCTION [dbo].[fnSplitStringIntoChunks] ( @Txt nvarchar(max),@Delimitator char(1), @Limit int, @ChunkToReturn int) RETURNS @SplitChunks table ( txtChunk nvarchar(max) ) AS --------------------fnSplitStringIntoChunks ---------------------------------- --(adapted from CTE code by Nathan Skerl: http://ask.sqlservercentral.com/questions/23224/function-for-splitting-sentence.html) --Splits a string into chunks of the maximum specified length without splitting up words ie. will only split at the specified delimitator. --Usage: fnSplitStringIntoChunks(StringToSplit, Delimitator, MaxChunkLength, ChunkToReturn) --Set ChunkToReturn to = 0 to return all chunks, or use it to specify which chunk to return. --Examples: --Select * from dbo.fnSplitStringIntoChunks('This is a long sentence and I want to split it into multiple sentences of 50 characters each with out splitting a word into multiple lines',' ', 50, 0) --Result: --------------------------------------------------- --This is a long sentence and I want to split it --into multiple sentences of 50 characters each --with out splitting a word into multiple lines --------------------------------------------------- --Select * from dbo.fnSplitStringIntoChunks('This is a long sentence and I want to split it into multiple sentences of 50 characters each with out splitting a word into multiple lines',' ', 50, 2) --Result: --------------------------------------------------- --into multiple sentences of 50 characters each --------------------------------------------------- --Select * from dbo.fnSplitStringIntoChunks('This is a long sentence| and I want to| split it| into multiple| sentences of 50 characters each with| out splitting a word| into multiple lines','|', 50, 0) --Result: --------------------------------------------------- --This is a long sentence| and I want to| split it -- into multiple -- sentences of 50 characters each with -- out splitting a word| into multiple lines --------------------------------------------------- --Example usage within another select: ----- Select LongStreetAddressToSplit, addressLine1.txtChunk, addressLine2.txtChunk, addressLine3.txtChunk from tmpAddresses ----- OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 1) as addressLine1 ----- OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 2) as addressLine2 ----- OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 3) as addressLine3 --(In this example we expect the majority of LongStreetAddressToSplit values to split nicely into 3 chunks (by looking at a sample of the data), -- but it would be prudent to check if any Addresses had to be split into a 4th chunk:) ----- Select LongStreetAddressToSplit, addressLine4.txtChunk from tmpAddresses ----- OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ', 50, 4) as addressLine4 ----- where addressLine1.txtChunk is not null ----------------------------------------------------------------------------- BEGIN DECLARE @chunkTable table (txtChunk nvarchar(max), chunkID int IDENTITY(1,1)) ;with cte_DoIt as ( select substring(@Txt, 0, ((@Limit+1)-charindex(@Delimitator, reverse(left(@Txt + @Delimitator,@Limit))))) [Chunk], stuff(@Txt, 1, ((@Limit+1)-charindex(@Delimitator, reverse(left(@Txt + @Delimitator,@Limit)))), '') [Swap] union all select substring(Swap, 0, (@Limit+1) - charindex(@Delimitator, reverse(left(Swap + @Delimitator, @Limit)))), stuff(Swap, 1, ((@Limit+1)-charindex(@Delimitator,reverse(left(Swap + @Delimitator,@Limit)))), '') from cte_DoIt where len(Swap) > 0 ) Insert @chunkTable select Chunk from cte_DoIt Insert @SplitChunks select txtChunk from @chunkTable where 1=1 and chunkID = Case @ChunkToReturn When 0 Then chunkID Else @ChunkToReturn End RETURN END; GO
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
If you insist on having this as a function, you will get better performance from converting this to an inline table-valued function, rather than having it as a multi-statement table-valued function. create FUNCTION [dbo].[fnSplitStringIntoChunks_ITVF] ( @Txt nvarchar(max),@Delimitator char(1), @Limit int, @ChunkToReturn int) RETURNS table as return ( with cte_DoIt as ( select substring(@Txt, 0, ((@Limit+1)-charindex(@Delimitator, reverse(left(@Txt + @Delimitator,@Limit))))) [Chunk], stuff(@Txt, 1, ((@Limit+1)-charindex(@Delimitator, reverse(left(@Txt + @Delimitator,@Limit)))), '') [Swap], 1 as chunkID union all select substring(Swap, 0, (@Limit+1) - charindex(@Delimitator, reverse(left(Swap + @Delimitator, @Limit)))), stuff(Swap, 1, ((@Limit+1)-charindex(@Delimitator,reverse(left(Swap + @Delimitator,@Limit)))), ''), chunkID+1 from cte_DoIt where len(Swap) > 0 ) select Chunk as txtChunk from cte_DoIt where chunkID = Case @ChunkToReturn When 0 Then chunkID Else @ChunkToReturn End ) GO
0 Likes 0 ·
richard7za avatar image richard7za commented ·
Thanks Kev Riley, it is much faster now! What would a better approach be than making it a function, so that I can use it within another select to grab a specific chunk? Eg: Select LongStreetAddressToSplit, addressLine1.txtChunk, addressLine2.txtChunk, addressLine3.txtChunk from tmpAddresses OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 1) as addressLine1 OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 2) as addressLine2 OUTER APPLY dbo.fnSplitStringIntoChunks(LongStreetAddressToSplit, ' ',50, 3) as addressLine3
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.