Query performance tremendously slow... 4 hours and still executing
Hello, I have two tables with a highly denormalized design: a master and a child table. The master table holds about 40 million records. For a given line there is a set of valid footnotes in the master, and I need to check whether the child table contains only those footnotes for that line. If a child row references a footnote that is not present in the master, that row needs to appear in the output.

**I tried the approach below.** For a small result set it gives the correct output, but against tables with millions of records the query is still executing after 4 hours. **Can someone please share their expertise, or suggest another way to achieve this?**

Table and data:

```sql
DECLARE @MASTER TABLE
(
    Footnote VARCHAR(50),
    Line     VARCHAR(50)
)

INSERT @MASTER
SELECT 'O031', 'EXHAUSTMUF' UNION ALL
SELECT 'V049', 'CAMSHAFT'   UNION ALL
SELECT 'V049', 'DSHBRDCOVR' UNION ALL
SELECT 'V049', 'EXHAUSTMUF' UNION ALL
SELECT 'O040', 'EXHAUSTMUF' UNION ALL
SELECT 'V133', 'DSHBRDCOVR' UNION ALL
SELECT 'V133', 'EXHAUSTMUF' UNION ALL
SELECT 'E014', 'CAMSHAFT'   UNION ALL
SELECT 'E014', 'EXHAUSTMUF' UNION ALL
SELECT 'O062', 'EXHAUSTMUF' UNION ALL
SELECT 'O194', 'EXHAUSTMUF'

DECLARE @ChildTable TABLE
(
    line          VARCHAR(50),
    footnote_list VARCHAR(50)
)

INSERT @ChildTable
SELECT 'EXHAUSTMUF', 'O031, V049'       UNION ALL
SELECT 'EXHAUSTMUF', 'O040, V133'       UNION ALL
SELECT 'EXHAUSTMUF', 'E014, O062, O194' UNION ALL
SELECT 'DSHBRDCOVR', 'USP2,LTS9'        UNION ALL
SELECT 'DSHBRDCOVR', 'V049'
```

Expected output:

**DSHBRDCOVR USP2,LTS9**

I tried to do something like this.
First I added a split function:

```sql
CREATE FUNCTION [dbo].[fnSplitString]
(
    @string    NVARCHAR(MAX),
    @delimiter CHAR(1),
    @keylist   NVARCHAR(50)
)
RETURNS @output TABLE (splitdata NVARCHAR(MAX), keyName NVARCHAR(50))
BEGIN
    DECLARE @start INT, @end INT
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)

    WHILE @start < LEN(@string) + 1
    BEGIN
        IF @end = 0
            SET @end = LEN(@string) + 1

        INSERT INTO @output (splitdata, keyName)
        VALUES (LTRIM(RTRIM(SUBSTRING(@string, @start, @end - @start))), @keylist)

        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
    END
    RETURN
END
```

Then I loop over the child table, split each footnote list, and look the values up in the master:

```sql
-- storing split values
DECLARE @SplittedTable TABLE
(
    splitted VARCHAR(50),
    keys     VARCHAR(50)
)

-- getting the row count
DECLARE @maxRows INT = (SELECT COUNT(*) FROM @ChildTable)

-- loop over each row
WHILE (@maxRows != 0)
BEGIN
    DECLARE @footNotList NVARCHAR(MAX) =
    (
        SELECT abc.footnote_list
        FROM
        (
            SELECT ct.footnote_list,
                   ROW_NUMBER() OVER (ORDER BY ct.footnote_list DESC) AS RowNumber
            FROM @ChildTable AS ct
        ) abc
        WHERE abc.RowNumber = @maxRows
    )

    -- inserting into the split table
    INSERT INTO @SplittedTable
    SELECT * FROM dbo.fnSplitString(@footNotList, ',', @footNotList)

    SET @maxRows = @maxRows - 1;
END

-- selecting by keys and finding them in the master
SELECT *
FROM @ChildTable ct
WHERE ct.footnote_list IN
(
    SELECT DISTINCT st.keys
    FROM @SplittedTable AS st
    WHERE st.splitted NOT IN (SELECT Footnote FROM @MASTER m)
)
```

Thanks
**EDIT** Further shortened the script and got rid of storing intermediate results in the split table. See the bottom of this post. **END EDIT**

There are two things you can do to improve performance here:

1) Use CROSS APPLY to call the function. That way you don't need to loop over the rows in your table yourself; instead you let SQL Server do the looping internally, applying the function call to each row. That's a major performance improvement.

2) Rewrite the split function to use a Tally (or Numbers) table, and change it to an inline table-valued function (these perform so much better than multi-statement table-valued functions when applied to a data set).

This is what I have done with the function (there are much better implementations of a Tally-table split; search SQL Server Central for Jeff Moden's DelimitedSplit implementations):

```sql
CREATE FUNCTION [dbo].[fnSplitString]
(
    @string    NVARCHAR(MAX),
    @delimiter CHAR(1),
    @keylist   NVARCHAR(50)
)
RETURNS TABLE
AS
RETURN
(
    WITH CTE AS
    (
        SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1      UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
    CTE2 AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY c.n) AS n
        FROM CTE c
        CROSS JOIN CTE c2 CROSS JOIN CTE c3 CROSS JOIN CTE c4 CROSS JOIN CTE c5 CROSS JOIN CTE c6
    )
    SELECT TOP (LEN(@string))
           splitdata = LTRIM(RTRIM(SUBSTRING(@string, n, CHARINDEX(@delimiter, @string + @delimiter, n) - n))),
           keyname   = @keylist
    FROM CTE2
    WHERE SUBSTRING(@delimiter + @string, n, 1) = @delimiter
)
```
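To show how point 1 fits together with the inline function, here is a sketch of what the call site could look like. This is not the poster's exact final script; it just mirrors the question's logic (flag child rows whose list contains any footnote absent from the master), with CROSS APPLY standing in for the WHILE loop:

```sql
-- Sketch: CROSS APPLY invokes the inline splitter once per child row,
-- so SQL Server handles the row-by-row application internally.
-- Assumes the @MASTER and @ChildTable variables from the question.
SELECT DISTINCT ct.line, ct.footnote_list
FROM @ChildTable AS ct
CROSS APPLY dbo.fnSplitString(ct.footnote_list, ',', ct.footnote_list) AS s
WHERE s.splitdata NOT IN (SELECT m.Footnote FROM @MASTER AS m)
```

Because the function is an inline TVF, the optimizer can expand it into the outer query instead of materializing a table variable per call, which is where most of the gain comes from.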
I can't think of a much quicker way to do this with a table design that was never meant to be queried this way. The only excuse for storing multiple values in one column like that is if that's the only way the applications need to see the data, and it avoids constantly stringing child rows together. It forces you to run a splitting function inside a loop, which is about as bad as it gets for performance. (OK, you could throw in some nested views to really kill it!)

Is this a one-off query, or will you have to run it regularly? If it's a one-off, people will just have to wait for it to finish. If that badly affects a production or reporting server, back up the database, restore it to a test server, and run it there. If you have to do this regularly, you should try to get the schema normalised. With that many rows it's never going to be a 2-second query, but you should be able to do much better than 4+ hours.
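As a rough sketch of what the normalised version might look like (table and constraint names here are made up for illustration, and `dbo.MASTER` stands in for a permanent version of the question's master table): one child row per (line, footnote) pair, and a straight anti-join instead of any string splitting:

```sql
-- Hypothetical normalised child table: one row per (line, footnote) pair.
CREATE TABLE dbo.ChildFootnote
(
    line     VARCHAR(50) NOT NULL,
    footnote VARCHAR(50) NOT NULL,
    CONSTRAINT PK_ChildFootnote PRIMARY KEY (line, footnote)
);

-- Child rows whose footnote does not appear in the master at all --
-- the same check the original query performs, minus the splitting.
SELECT cf.line, cf.footnote
FROM dbo.ChildFootnote AS cf
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.MASTER AS m
    WHERE m.Footnote = cf.footnote
);
```

With indexes on both footnote columns, that anti-join is the kind of query SQL Server can do efficiently even at 40 million rows.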