|
Hi, I'm a relative newcomer to SQL so apologies if this is stupid or obvious. I'm trying to write a query but am a bit stumped about the execution performance blowing up at what seems to me a pretty innocuous point. Rather than dump the script code out I'll lay it out in generics; I'm happy to post the exact script but I suspect it's a little long and won't really add anything to my question:
The tenth CTE is doing a LEFT JOIN between a preceding CTE with c. 1800 rows and 2 columns and another preceding CTE with c. 1500 rows and 3 columns. These two CTEs take about 25 secs and 45 secs to calculate respectively, the bulk of which time is inserting into the TVP each time. However when I then come to LEFT JOIN these two performance always blows up. I've cancelled the query each time it got over a couple of minutes before but I've let it run this time whilst I look on the internet. The query is still executing at 54 minutes and counting! So to my question: have you ever heard of this sort of thing? If so what could be causing it? How do I debug? And the million dollar question, how do I fix it? Any replies gratefully received. N.B. I'm running an actual execution plan on the query but can't see anything yet as the query hasn't finished executing yet. I'm SQL Server 2008 if that makes any difference. Cheers, Maccas XML Plan attached: XMLPlan.zip
(comments are locked)
|
|
You can post here at least Estimated execution plan which you can receive even without executing the query as @Kev Riley mentioned The issue with TVP and table variables in general is, that there are no statistics generated for them in SQL server and as a result, query optimizer always evaluates the table variables as if they have single record. So if you have for example a few thousands of records in the table variable and joining it with multi-bilion table, it can be a real performance killer if SQL server chooses wrong query plan because of missing statistics. In this case it could be better use regular temp table for which statistics are generated and use this in join. This is one of possible scenarios. But better would be to see a real plan (at least the estimated one). After looking on the query plan XMLPlan.zip, it's really crazy and better to re-work the query. Eg. at least split it and put intermediate result into some temp table and then continue. As it is written now, the JOINS to big tables are replicated by the subsequent CTEs. What more as I have mentioned the missing statistics on the Table Variable are also big problem in this situation, so I suggest to replace it by regular temp table at least too.
Feb 05 '12 at 08:35 PM
Pavel Pawlowski
(comments are locked)
|
|
First of all there's no such thing as stupid questions! Don't think of the 10 CTEs as serial execution, i.e. cte1 runs, processes into cte2 etc.. the CTEs are merely definitions of queries (or result sets, if you want to think of them that way) that are only invoked at execution time. The optimizer can and will flatten the entire query and process it in the best way it can find. Having 10 does seem a little over the norm, and it could be that you have overcomplicated the query so much that the optimizer does not have the time to find the most optimal query plan. I would prefer to see the code, as while you have done a good job in describing it, nothing beats seeing the code! Plus we would also need DDL for the underlying tables, example data and expected results. You could also get, and post (as an XML file), the estimated execution plan. (Ctrl-L in SSMS)
(comments are locked)
|
|
Oh my GOD. @maccas Seems like while trying to simplify the logic by breaking into pieces (With use of CTEs), you have over-complicated the query. There is redundant work done in the script For e.g.
could be replaced by one CTE i.e. CurrToPrevMAP1 itself As far as the rest of the logic is concerned, I think what you need is that PBLeaseID should be NULL in case the there are mutliple residuals i.e. COUNT(LeaseID) > 1. So your FINAL CTES should be (Since, I do not have the sample data and desired output, this is just the hint to make you optimize your query and omit the redundant work)
I hope this will help your cause. I did say I was a newcomer! Usman your suggestion is not quite right because:
I'm happy to hear other suggestions on how to shorten the query, whilst maintaining the same functionality Regarding Temp Tables this has massively improved the query performance. I put in a temp tables instead of @AOData and also to replace CurrBudgetData, PrevBudgetData, CurrToPrevMAP4 & CurrToPrevMAP. It's a bit hard to say exactly how much as it feels like caching is going on but the query time has come back down to under a minute, which is good enough for my current purposes.
Feb 06 '12 at 08:57 AM
maccas
Glad to hear that. But I still think that there could be more room for improvement. Temp tables are my preference as well but with only few thousand rows, the DECLARED tables should not be the main problem. In future, to get much faster and better help please do add table DDLs, some sample data and desired output in addition to what you have already attached. If this would have been the case in this question, you may have got a more optimized solution from this wonderful community :)
Feb 06 '12 at 09:22 AM
Usman Butt
(comments are locked)
|
|
That is one seriously hairy execution plan. I'm not going to try to pick it all apart, but it looks like you're falling into a classic hole. You're trying to use row by row logic on TSQL, which is a set-based processing system. Sometimes you do need to break data down into steps as you've done, but most of the time, this is an indication that you have some flaws in the data structure which need fixing or that you haven't tried enough using the set-based methods within TSQL. My suggestion, take a step back and reassess what data you need to get out of this query and then start exploring using a single statement to eliminate it. For example, just that first step after you load the table valued function, deleting data. That means you removed information that you just loaded. You made the assumption that joining to a table would be slower. It might be, but, with propery indexes and good code, joins are probably going to be substantially faster, not slower. Explore the rest of the code thinking along those lines.
(comments are locked)
|


Wow thanks for replying so quickly on a Sunday night.
I do get that CTEs are evaluated & collapsed at run-time. I think I described them that way as the way I'm building up this query I'm using each one in a chain to move my manipulation along. Obviously if you know a better way to be doing this I'd be delighted to hear / learn.
The code is attached below. I'm not sure you need much example data as the TVP defines the shape of the data I'm working with. AccountOutput is the monster table with a bit over 2 billion rows. Tips on on reducing the steps involved obvious welcomed but this is above and beyond the sort of help I was expecting.
I use NOLOCKS as this query can hit a production database database during work hours, I don't care about dirty reads and cannot afford to lock up the DB for any length of time.
-- ################################################### -- Declare and set variables -- ###################################################
DECLARE @CurrBudScen INT; DECLARE @PrevBudScen INT; DECLARE @OutputMsg VARCHAR(200); DECLARE @AOData TABLE (AccountID INT NOT NULL ,LeaseID INT NOT NULL ,ScenarioID INT NOT NULL ,PeriodID INT NOT NULL ,Value DECIMAL(12,2) NOT NULL);
SET @CurrBudScen = 6410; SET @PrevBudScen = 4757;
-- ################################################### -- Get Data from AccountOutput -- ###################################################
-- Current Budget -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SET @OutputMsg = CONVERT(VARCHAR(30), GETDATE(), 8) + ' - Get Data from AccountOutput: Current Budget' PRINT @OutputMsg
INSERT INTO @AOData
-- (3431 row(s) affected)
-- Previous Budget -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SET @OutputMsg = CONVERT(VARCHAR(30), GETDATE(), 8) + ' - Get Data from AccountOutput: Previous Budget' PRINT @OutputMsg
INSERT INTO @AOData
-- (3549 row(s) affected)
-- Remove Orphaned Data -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SET @OutputMsg = CONVERT(VARCHAR(30), GETDATE(), 8) + ' - Get Data from AccountOutput: Remove Orphaned Data' PRINT @OutputMsg
DELETE aod
FROM @AOData aod
WHERE lsd.RecordStatus = 316 OR asd.RecordStatus = 316 OR ssd.RecordStatus = 316 OR lsd.RecordStatus IS NULL OR asd.RecordStatus IS NULL OR ssd.RecordStatus IS NULL;
-- (498 row(s) affected)
-- ################################################### -- CTEs -- ###################################################
-- Move new rent on review and previous rent on review -- into columns on th same rwo for current budget -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WITH CurrBudgetData AS
-- Move new rent on review and previous rent on review -- into columns on th same rwo for previous budget -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
,PrevBudgetData AS
-- Create map between current budget and previous budget -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Match rent reviews in same month under same SiteID
,CurrToPrevMAP1 AS
-- Pick from first list those matches with same LeaseID
,CurrToPrevMAP2 AS
-- Pick from first list those matches with different LeaseID -- that also don't have a same leaseID match (this will be demmed -- to over-rule the non-matching LeaseID initial match)
,CurrToPrevMAP3 AS
-- Aggregate two sub-selections above -- i.e. remove non-matching LeaseID initial matches where same -- LeaseID matches also found under the same site
,CurrToPrevMAP4 AS
-- Get list of all insatnces of unique CBLeaseID & PeriodID combos
,CurrToPrevMAP5 AS
-- Use above unique list to NULL out any residual mutliple -- matches on the grounds that these are too difficult -- to programatically match
,CurrToPrevMAP6 AS
-- Get distinct Lease, PeriodID list from current budget
,CurrToPrevMAP7 AS
-- Final Map!!! -- Left Join unique list from current budget to mapped IDs for -- final map
,CurrToPrevMAP AS
-- ################################################### -- Main Query -- ###################################################
-- takes 45 seconds SELECT * FROM CurrToPrevMAP6;
-- takes 25 seconds SELECT * FROM CurrToPrevMAP7;
-- takes over two hours and counting? SELECT * FROM CurrToPrevMAP;
-- Reset default LOCK behaviour SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
The line breaks in that script got all messed up. I just copied and pasted from SSMS, is there something else I should be doing?
I tried attaching the XML actual execution plan from the query, which finally finished running in an epic 2 hours 34 mins, but it exceeds the attachment file size limit. I'll try uploading the xml file elsewhere on the internet and posting a link.
Maccas
Here's the execution plan: http://www.macadie.co.uk/files/ExecutionPlan.xml