# question

## Has anyone tried variations on 8K Splitter?

Hi, I am in a situation where I am using the famous [8K Splitter function][1] developed by the well-known and well-regarded Jeff Moden. It is often referred to on this forum as well, so I believe I should share my findings and seek valuable suggestions/corrections from this great community.

To start with, the original function mentioned in the article has been further optimized and is called `DelimitedSplit8K_T1` (BTW, this is the version we are currently using). It is available in the attachment section of the aforementioned link, but for ready reference I am posting it here:

```sql
CREATE FUNCTION [dbo].[DelimitedSplit8K_T1]
--===== Define I/O parameters
       (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
    -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
     ),                                   --10E+1 or 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
     cteTally(N) AS (
        --==== This provides the "base" CTE and limits the number of rows right up front
            -- for both a performance gain and prevention of accidental "overruns"
        SELECT TOP (ISNULL(DATALENGTH(@pString),0))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
     ),
     cteStart(N1) AS (
        --==== This returns N+1 (starting position of each "element" just once for each delimiter)
        SELECT 1 UNION ALL -- does away with 0 base CTE, and the OR condition in one go!
        SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
     ),
     cteLen(N1,L1) AS (
        --==== Return start and length (for use in substring)
        SELECT s.N1,
               ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
          FROM cteStart s
     )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the
    -- final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
       Item       = SUBSTRING(@pString, l.N1, l.L1)
  FROM cteLen l
;
GO
```

So far I have come up with some modified versions, and two of them are performing better than `DelimitedSplit8K_T1`. They would still need some testing, and there could be more room for improvement. I am posting them here; the comments give the main idea and the differences.

```sql
/*===== USES BINARY COLLATION TO SPEED UP THE STRING COMPARISON ======*/
CREATE FUNCTION [dbo].[DelimitedSplit8K_T1_With_Binary_Collation]
--===== Define I/O parameters
       (@pString VARCHAR(8000),
        @pDelimiter VARCHAR(1) /*=== USE OF VARCHAR TO AVOID IMPLICIT CONVERSION WHILE COMPARING ===*/
       )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
    -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
     ),                                   --10E+1 or 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
     cteTally(N) AS (
        --==== This provides the "base" CTE and limits the number of rows right up front
            -- for both a performance gain and prevention of accidental "overruns"
        SELECT TOP (ISNULL(DATALENGTH(@pString),0))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
     ),
     cteStart(N1) AS (
        --==== This returns N+1 (starting position of each "element" just once for each delimiter)
        SELECT 1 UNION ALL -- does away with 0 base CTE, and the OR condition in one go!
        SELECT t.N+1 FROM cteTally t
         WHERE SUBSTRING(@pString, t.N, 1) COLLATE Latin1_General_BIN
             = @pDelimiter COLLATE Latin1_General_BIN
     ),
     cteLen(N1,L1) AS (
        --==== Return start and length (for use in substring)
        SELECT s.N1,
               ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,
                                       @pString COLLATE Latin1_General_BIN, s.N1),0)-s.N1,8000)
          FROM cteStart s
     )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the
    -- final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
       Item       = SUBSTRING(@pString, l.N1, l.L1)
  FROM cteLen l
;
GO
```

```sql
/*===== SO FAR SEEMS TO BE THE BEST PERFORMER ========================*/
/*===== USES BINARY COLLATION TO SPEED UP THE STRING COMPARISON ======*/
/*===== FIRST DELIMITED ITEM HANDLED SEPARATELY ======================*/
CREATE FUNCTION [dbo].[DelimitedSplit8K_T1_First_Item_Handling_Seperately]
--===== Define I/O parameters
       (@pString VARCHAR(8000),
        @pDelimiter VARCHAR(1) /*=== MADE IT VARCHAR TO AVOID IMPLICIT CONVERSION WHILE COMPARING ===*/
       )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
    -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
     ),                                   --10E+1 or 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
     cteTally(N) AS (
        --==== This provides the "base" CTE and limits the number of rows right up front
            -- for both a performance gain and prevention of accidental "overruns"
        SELECT TOP (ISNULL(DATALENGTH(@pString),0))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
     ),
     cteStart(N1) AS (
        --==== This returns N+1 (starting position of each "element" just once for each delimiter)
        /*==== FIRST ITEM HANDLED AT THE END ====*/
        --SELECT 1 UNION ALL -- does away with 0 base CTE, and the OR condition in one go!
        /*=======================================*/
        SELECT N+1 FROM cteTally
         WHERE SUBSTRING(@pString, N, 1) COLLATE Latin1_General_BIN
             = @pDelimiter COLLATE Latin1_General_BIN
     ),
     cteLen(N1,L1) AS (
        --==== Return start and length (for use in substring)
        SELECT N1,
               ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,
                                       @pString COLLATE Latin1_General_BIN, N1),0) - N1,8000)
          FROM cteStart
     )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the
    -- final element when no delimiter is found.
/*==== FIRST ITEM HANDLING ==============*/
SELECT ItemNumber = 0,
       Item = SUBSTRING(@pString, 1,
              ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,
                                      @pString COLLATE Latin1_General_BIN),0)-1, 8000))
UNION ALL
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY N1),
       Item       = SUBSTRING(@pString, N1, L1)
  FROM cteLen
;
GO
```

Any suggestions, corrections, or optimizations would be highly appreciated. I am using the same performance tests available in the article mentioned above. Thanks in advance.

[1]: http://www.sqlservercentral.com/articles/Tally+Table/72993/
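For anyone reading along who is new to the tally-table approach, the core logic of all three functions is the same: collect every position that starts an element (position 1, plus one past each delimiter), then cut each element out with a start and a length up to the next delimiter. Here is a rough Python sketch of that logic (purely illustrative and my own, not T-SQL; the function name is made up):

```python
def delimited_split_8k(s, delim):
    """Illustrative sketch of the tally-table split logic:
    every element starts at position 0 or one past a delimiter,
    and runs until the next delimiter (or the end of the string)."""
    # cteStart: 0, plus N+1 for every N where the character is the delimiter
    starts = [0] + [i + 1 for i, ch in enumerate(s) if ch == delim]
    items = []
    for item_number, n1 in enumerate(starts, start=1):
        # CHARINDEX equivalent: position of the next delimiter at or after n1
        nxt = s.find(delim, n1)
        # the ISNULL/NULLIF combo: when no delimiter remains, take the rest
        length = (nxt - n1) if nxt != -1 else len(s) - n1
        items.append((item_number, s[n1:n1 + length]))
    return items

print(delimited_split_8k("a,bb,,c", ","))  # [(1, 'a'), (2, 'bb'), (3, ''), (4, 'c')]
```

Note that, like the T-SQL original, adjacent delimiters yield empty elements rather than being skipped.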


·
@Usman There are 5 pages' worth of pretty interesting discussion on the main site [starting from here][1]. Jeff Moden's 8K splitter, my attempt at it with XML, and Pavel Pawlowski's CLR solution are all pretty thoroughly examined there, and Pavel's CLR function is a clear winner. [1]: http://www.sqlservercentral.com/Forums/Topic943562-338-2.aspx
4 Likes
·
Many thanks, Oleg, for pointing to a very informative discussion. And sorry for the delay.
0 Likes
·
@Oleg I forgot to mention that CLR is not permissible for us, so I had to stick with a T-SQL solution.
0 Likes
·
Since my version is faster and I have not found any bugs yet, I am marking my answer as accepted. Last day for the bounty as well ;)
0 Likes

·
We have managed to complete the 10,000 rows / 1,133 elements (i.e. near 8K characters) performance test (described in the article mentioned above) in approximately 19 seconds, which is quite a drastic improvement for us. The major change was converting the tally table column's data type from BIGINT to INT. Another change (which shows a marginally increased performance) was to add a pre-calculated column holding the value of N + 1. The following performance test includes the two versions we have come up with:

```sql
USE [tempdb]
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID('dbo.CsvTest') IS NOT NULL
    DROP TABLE CsvTest
GO
DECLARE @NumberOfElements INT, @NumberOfRows INT
/*======== PARAMETER VALUES ==============================*/
SELECT @NumberOfElements = 1133,
       @NumberOfRows     = 10000
/*========================================================*/
SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
       ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
       (
        SELECT CAST(STUFF( --=== STUFF gets rid of the leading comma
               ( --=== This builds CSV row with a leading comma
                SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
                       ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                  FROM sys.All_Columns t3           --Classic cross join pseudo-cursor
                 CROSS JOIN sys.All_Columns t4      --can produce row sets up to 121 million.
                 WHERE t1.Object_ID <> t3.Object_ID --Without this line, all rows would be the same
                   FOR XML PATH('')
               ),1,1,'') AS VARCHAR(8000))
       ) AS Csv
  INTO CsvTest
  FROM sys.All_Columns t1       --Classic cross join pseudo-cursor
 CROSS JOIN sys.All_Columns t2  --can produce row sets up to 16 million rows
GO
ALTER TABLE CsvTest ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
GO
IF OBJECT_ID('dbo.Tally','U') IS NOT NULL
    DROP TABLE dbo.Tally;
--===== Create and populate the Tally table on the fly.
    -- This ISNULL function makes the column NOT NULL
SELECT TOP 11000
       IDENTITY(INT,1,1) AS N,
       ISNULL(ISNULL(CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT),0) + 1, 0) N1
  INTO dbo.Tally
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2;
GO
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE dbo.Tally
  ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100;
--===== Update Stats
UPDATE STATISTICS dbo.Tally WITH FULLSCAN;
GO
--=====================================================================================
-- Conditionally drop and recreate the Split8KTally_With_Column_N1 splitter function.
--=====================================================================================
IF OBJECT_ID(N'dbo.Split8KTally_With_Column_N1', N'IF') IS NOT NULL
    DROP FUNCTION dbo.Split8KTally_With_Column_N1;
GO
--===== Tally Table (Split8KTally_With_Column_N1 ITVF) ===============================
/*===== THIS FUNCTION USES COLUMN "N1" OF THE TALLY TABLE, WHICH HAS PRE-CALCULATED
        VALUES FOR N+1 =====*/
CREATE FUNCTION dbo.Split8KTally_With_Column_N1
       (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))
RETURNS TABLE AS
RETURN
/*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM
          TO AVOID THE "OR" CONDITION IN THE WHERE CLAUSE =====*/
SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
     , ItemValue  = SUBSTRING(@pString, 1,
                    ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN), 0) - 1, 8000))
UNION ALL -- CONCATENATE THE OUTPUTS
/*======= FIND THE REST OF THE DELIMITED ITEMS USING THE TALLY TABLE =====*/
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY [N])
     , ItemValue  = SUBSTRING(@pString, [N1],
                    ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN, [N1]), 0) - [N1], 8000))
  FROM [dbo].[Tally] WITH (NOLOCK) --NOLOCK HINT IS NOT NECESSARY
 WHERE [N] BETWEEN 1 AND LEN(@pString)
   AND SUBSTRING(@pString, [N], 1) = @pDelimiter
GO
--=====================================================================================
-- Conditionally drop and recreate the Split8KTally_With_N_PLUS_1 splitter function.
--=====================================================================================
IF OBJECT_ID('dbo.Split8KTally_With_N_PLUS_1','IF') IS NOT NULL
    DROP FUNCTION [dbo].Split8KTally_With_N_PLUS_1
GO
--===== Tally Table (Split8KTally_With_N_PLUS_1 ITVF) ================================
/*===== THIS FUNCTION USES THE RUNTIME CALCULATION "N + 1" INSTEAD OF THE
        PRE-CALCULATED COLUMN "N1" OF THE TALLY TABLE =====*/
CREATE FUNCTION dbo.Split8KTally_With_N_PLUS_1
       (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))
RETURNS TABLE AS
RETURN
/*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM
          TO AVOID THE "OR" CONDITION IN THE WHERE CLAUSE =====*/
SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
     , ItemValue  = SUBSTRING(@pString, 1,
                    ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN), 0) - 1, 8000))
UNION ALL -- CONCATENATE THE OUTPUTS
/*======= FIND THE REST OF THE DELIMITED ITEMS USING THE TALLY TABLE =====*/
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY [N])
     , ItemValue  = SUBSTRING(@pString, (N+1),
                    ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN, (N+1)), 0) - (N+1), 8000))
  FROM [dbo].[Tally] WITH (NOLOCK) --NOLOCK HINT IS NOT NECESSARY
 WHERE [N] BETWEEN 1 AND LEN(@pString)
   AND SUBSTRING(@pString, [N], 1) = @pDelimiter
GO
--Tally Test
PRINT '/*====== dbo.Split8KTally_With_N_PLUS_1 =================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT, @Item VARCHAR(8000);
SELECT @ItemNumber = V.ItemNumber, @Item = V.ItemValue
  FROM dbo.CsvTest D
 CROSS APPLY dbo.Split8KTally_With_N_PLUS_1(D.Csv, ',') V
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
PRINT '/*====== dbo.Split8KTally_With_Column_N1 =================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT, @Item VARCHAR(8000);
SELECT @ItemNumber = V.ItemNumber, @Item = V.ItemValue
  FROM dbo.CsvTest D
 CROSS APPLY dbo.Split8KTally_With_Column_N1(D.Csv, ',') V
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
```
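The "first item handled separately" trick used by both functions above (emit ItemNumber 0 from position 1 explicitly, so the tally scan's WHERE clause stays a pure delimiter test with no OR or UNION for position 1) can be mimicked outside T-SQL. A minimal Python sketch of that control flow, purely for illustration (the function name is mine):

```python
def split_first_item_separately(s, delim):
    """Mimics the Split8KTally_* shape: the first element is produced
    explicitly (ItemNumber 0), then the "tally scan" yields one element
    per delimiter position, starting at N+1."""
    # First item: from position 0 up to the first delimiter (or the whole string)
    first_end = s.find(delim)
    results = [(0, s if first_end == -1 else s[:first_end])]
    # "Tally scan": every delimiter position N yields an element starting at N+1
    delim_positions = [i for i, ch in enumerate(s) if ch == delim]
    for item_number, n in enumerate(delim_positions, start=1):
        nxt = s.find(delim, n + 1)  # CHARINDEX from N+1
        results.append((item_number, s[n + 1:nxt if nxt != -1 else len(s)]))
    return results
```

Because the explicit first row carries ItemNumber 0, `ROW_NUMBER()` over the remaining rows needs no `+ 1` adjustment, which is exactly the comment in the T-SQL above.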


·
Let me be the first one to upvote it :)
1 Like
·
Since my version is faster and I have not found any bugs yet, I am marking my answer as accepted ;)
0 Likes
·
@Oleg Thanks. I know you are quite busy these days (I read it on another thread), but I will post the VARCHAR(MAX) testing results as well. Your XML solution and the RCTE solution (surprisingly) worked really well. ;)
0 Likes
As I said, we did some testing on VARCHAR(MAX) versions as well, since the tally table solution does not behave well with wider strings. Two of them performed really well:

1. an inline recursive CTE solution (Split_RCTE, tweaked a little for better performance)
2. the TVF XML solution by Oleg (Split_XML_Solution_By_Oleg)

Surprisingly, the RCTE was the winner on some occasions, and by a good margin. With the same 10,000 rows and 1,133 elements at 125-150 characters of width, the RCTE beat the XML solution marginally; but when the number of elements was decreased to 500, the RCTE was almost twice as fast as XML. Still, I would not count out either solution, as each could fit depending on the requirements and environment.

That said, the RCTE solution is more resource intensive. The Index Spool and Sort operators in its execution plan clearly indicate that it hits memory, tempdb, and the processors hard. This also suggests that with more CPUs, more memory, and a well-planned tempdb, the RCTE solution is likely to perform even better relative to the others. Having said that, I may still tilt towards the XML solution.

To me, the reason the RCTE performs so much better than the tally table solution (both use CHARINDEX and SUBSTRING) is that the tally table solution compares every character with the delimiter, and due to the out-of-row storage of VARCHAR(MAX) this does not scale well, whereas the RCTE solution does the comparison only the required number of times. Moreover, up to VARCHAR(8000) the `SUBSTRING(@pString, N, 1) = @pDelimiter` test is handled as a residual predicate alongside the seek predicate, but for VARCHAR(MAX) this is divided into two steps, a seek predicate followed by a separate Filter operator, which decreases performance quite a bit.
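The difference described above, a per-character tally scan versus the recursive CTE jumping from delimiter to delimiter, can be sketched outside T-SQL. A minimal Python illustration (my own, not from the thread) of the RCTE-style walk:

```python
def split_rcte_style(s, delim):
    """Mimics dbo.Split_RCTE: each 'recursion' jumps straight to the next
    delimiter via a CHARINDEX-style search (str.find), so the work scales
    with the number of elements. A tally-table scan, by contrast, tests
    SUBSTRING(s, N, 1) = delim for every one of the len(s) positions."""
    items, start = [], 0
    while True:
        end = s.find(delim, start)   # the recursive member's CHARINDEX
        if end == -1:                # no delimiter left: emit the final element
            items.append(s[start:])
            return items
        items.append(s[start:end])
        start = end + 1              # StartPosition = EndPosition + 1
```

This is why, with fewer but wider elements (the 500-element test), the delimiter-to-delimiter walk pulls further ahead: the number of searches drops while the per-character scan cost of the tally approach stays tied to the string length.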
Now the performance test for the RCTE and XML solutions:

```sql
USE [tempdb]
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID(N'dbo.iFunction', N'V') IS NOT NULL
    DROP VIEW iFunction
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE VIEW dbo.iFunction AS
/**********************************************************************************************************************
 Purpose:
 This view is callable from UDFs, which allows us to indirectly get a NEWID() within a function
 where we can't do such a thing directly. This view also solves the same problem for GETDATE().

 Usage:
 SELECT MyNewID FROM dbo.iFunction; --Returns a GUID
 SELECT MyDate  FROM dbo.iFunction; --Returns a Date

 Revision History:
 Rev 00 - 06 Jun 2004 - Jeff Moden - Initial creation
 Rev 01 - 06 Mar 2011 - Jeff Moden - Formalize code. No logic changes.
**********************************************************************************************************************/
SELECT MyNewID = NEWID(), MyDate = GETDATE();
GO
IF OBJECT_ID(N'dbo.CsvTest', N'U') IS NOT NULL
    DROP TABLE CsvTest
GO
DECLARE @MaxElementWidth INT, @MinElementWidth INT, @NumberOfElements INT, @NumberOfRows INT
/*======== PARAMETER VALUES ==============================*/
SELECT @MaxElementWidth  = 150,
       @MinElementWidth  = 125,
       @NumberOfElements = 500,
       @NumberOfRows     = 10000
/*========================================================*/
SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
       ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
       CSV = (--==== This creates each CSV
        SELECT CAST(STUFF( --=== STUFF gets rid of the leading comma
               ( --=== This builds CSV row with a leading comma
                SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
                       ',' + LEFT( --==== Builds random-length variable within element width constraints
                             LEFT(REPLICATE('1234567890',CEILING(@MaxElementWidth/10.0)), @MaxElementWidth),
                             ABS(CHECKSUM((SELECT MyNewID FROM dbo.iFunction)))
                             % (@MaxElementWidth - @MinElementWidth + 1) + @MinElementWidth)
                  FROM sys.All_Columns ac3             --Classic cross join pseudo-cursor
                 CROSS JOIN sys.All_Columns ac4        --can produce row sets up to 16 million.
                 WHERE ac3.Object_ID <> ac1.Object_ID  --Without this line, all rows would be the same.
                   FOR XML PATH('')
               ),1,1,'') AS VARCHAR(MAX))
       )
  INTO CsvTest
  FROM sys.All_Columns ac1       --Classic cross join pseudo-cursor
 CROSS JOIN sys.All_Columns ac2  --can produce row sets up to 16 million rows
GO
PRINT '/*====== dbo.CSVTest Population completed ================*/'
PRINT CHAR(10) + CHAR(13)
GO
ALTER TABLE CsvTest ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
GO
IF OBJECT_ID(N'dbo.Split_RCTE', N'IF') IS NOT NULL
    DROP FUNCTION dbo.Split_RCTE
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
--Create Split_RCTE function, VARCHAR(MAX) version
CREATE FUNCTION dbo.Split_RCTE
       (@pString VARCHAR(MAX), @pDelimiter VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH cteSplit AS (
    SELECT StartPosition = 0
         , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN))
    UNION ALL
    SELECT StartPosition = EndPosition + 1
         , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN, EndPosition + 1))
      FROM cteSplit
     WHERE EndPosition > 0
)
SELECT [ItemNumber] = ROW_NUMBER() OVER (ORDER BY StartPosition)
     , SUBSTRING(@pString, StartPosition,
                 CASE EndPosition
                      WHEN 0 THEN CONVERT(INT, LEN(@pString)) + 1
                      ELSE EndPosition - StartPosition
                 END) ItemValue
  FROM cteSplit
GO
IF OBJECT_ID(N'dbo.Split_XML_Solution_By_Oleg', N'TF') IS NOT NULL
    DROP FUNCTION dbo.Split_XML_Solution_By_Oleg
GO
--Create Split_XML function, VARCHAR(MAX) version
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE FUNCTION dbo.Split_XML_Solution_By_Oleg
       (@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS @Result TABLE (ItemNumber INT, ItemValue VARCHAR(MAX))
AS
BEGIN
    DECLARE @XML XML;
    SET @Parameter = (SELECT @Parameter FOR XML PATH(''));
    SELECT @XML = '<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>';

    INSERT INTO @Result (ItemNumber, ItemValue)
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber
         , Item.value('text()[1]', 'VARCHAR(MAX)') AS ItemValue
      FROM @XML.nodes('//r') R (Item);
    RETURN;
END;
GO
PRINT '/*====== dbo.Split_RCTE ==================================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT, @Item VARCHAR(MAX);
SELECT @ItemNumber = V.ItemNumber, @Item = V.ItemValue
  FROM dbo.CsvTest D
 CROSS APPLY dbo.Split_RCTE(D.Csv, ',') V
OPTION (MAXRECURSION 0)
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
PRINT '/*====== dbo.Split_XML ===================================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT, @Item VARCHAR(MAX);
SELECT @ItemNumber = V.ItemNumber, @Item = V.ItemValue
  FROM dbo.CsvTest D
 CROSS APPLY dbo.Split_XML_Solution_By_Oleg(D.Csv, ',') V
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
```

As always, everyone's mileage may differ. Please note that testing was done on SQL Server 2005. For `SQL 2008`, the conversion to **INT** may not be needed in the RCTE solution.
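For readers unfamiliar with the XML trick used in `Split_XML_Solution_By_Oleg`: the delimited string is turned into an XML fragment by replacing each delimiter with a closing/opening tag pair and wrapping the whole thing, after which the individual nodes are shredded back into rows. The same idea rendered in Python (my own sketch; `xml.sax.saxutils.escape` stands in for what `FOR XML PATH('')` does in the T-SQL version):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def split_via_xml(s, delim):
    """Same idea as the T-SQL XML splitter: turn 'a,b,c' into
    '<x><r>a</r><r>b</r><r>c</r></x>' and read back the <r> nodes.
    escape() protects &, <, > so the markup stays well-formed
    (assumes the delimiter itself is not one of those characters)."""
    xml_text = "<x><r>" + escape(s).replace(delim, "</r><r>") + "</r></x>"
    root = ET.fromstring(xml_text)
    # Empty elements come back with text=None, hence the `or ""`
    return [(i, r.text or "") for i, r in enumerate(root.iter("r"), start=1)]
```

The pre-escaping step matters: without it, a stray `&` or `<` in the data would make the generated fragment unparseable, which is exactly why the T-SQL version runs `@Parameter` through `FOR XML PATH('')` first.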


·
@Usman Thank you for bringing my function back to life. @Pavel Pawlowski's CLR solution is probably the best for this specific problem, but if wheeling out a CLR is not an option, it is nice to explore other alternatives. My mileage did vary on the box I use here at work (4 cores, 8 GB of RAM, Windows 7 64-bit). For some reason, the XML solution beat the RCTE; maybe my box is tired :) :
```
/*== dbo.Split_XML ==*/

SQL Server Execution Times:
CPU time = 118857 ms,  elapsed time = 120583 ms.

/*== dbo.Split_RCTE ==*/

SQL Server Execution Times:
CPU time = 134660 ms,  elapsed time = 138365 ms.

```
I also checked my inline TVF XML function, but it really sucks, despite the fact that it is inline:
```sql
create function dbo.udf_SplitXmlInlineTvf
(
    @Parameter varchar(max), @Delimiter char(1)
)
returns table with schemabinding as
return
select cast(row_number() over (order by (select null)) as int) ItemNumber,
       R.Item.value('text()[1]', 'varchar(max)') ItemValue
  from (select cast('<r>' + replace(@Parameter, @Delimiter, '</r><r>') + '</r>' as xml)) X(N)
 cross apply N.nodes('//r') R(Item);
go
```
I guess the cross apply of the cross apply is not a good way to go.
1 Like
·
Nicely done. It would be nice to see what the results from your runs actually were, because I had to stop the rCTE solution after more than 3 minutes. Also, the collation trick didn't help me at all because I use the default collation; it's still a good idea, though. Last but not least, the real key to the performance improvement was switching from the Itzik-style cteTally to a real Tally Table. The DelimitedSplit8K_T1 function runs in the same amount of time as your variations when it's changed to a Tally Table. Don't forget to add the length check in when you do. The CLR is still going to beat it, but we're not doing badly for a T-SQL-only solution.
1 Like
·
@Usman Here is what I used: **OS**: Windows 7 SP1
**SQL Server**: 2008 R2 SP1, 10.50.2500.0
**Processor**: Intel Q9650 3 GHz, 4 cores, 4 logical processors
**RAM**: 8 GB, both memory and maxdop are set to "all it can eat" for SQL Server
1 Like
·
Just noticed a bug in `dbo.Split_XML_Solution_By_Oleg`: the parameter `@Delimiter` is declared but never used. The line `SELECT @XML = '<r>' + REPLACE(@Parameter, ',', '</r><r>') + '</r>' ;` should read `SELECT @XML = '<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>' ;`
1 Like
·
@Oleg I totally agree that CLR is the best option. Moreover, thanks for sharing the test results. As I said earlier, everyone's mileage may differ ;) But what version of SQL Server did you test on? And what MAXDOP setting did you have?
0 Likes