x

Removing Costly SORT From PIVOT Query

Hi there,

I have a PIVOT query that executes quite slowly. Looking at the execution plan I can see that the SORT operand is costing 96% of the execution. Is there a way to rewrite the query to remove this sort and/or reduce its cost to speed up the execution?

 SELECT pvt.*
 FROM (SELECT FormID, SubID, CustID, ReportName, ColumnValue
         FROM Survey
 ) AS y
     INNER JOIN VW_CustDetails cd ON cd.CustID  = y.CustID
     INNER JOIN VW_SubDetails sd ON sd.SubID = y.SubID
     PIVOT (MAX(ColumnValue) FOR ReportName IN ([Column1]
                                                 )
 ) pvt;

TIA

@Oleg thanks. Sorry, ReportingName was a mistake, it should be ReportName. The data does return correctly however it is very slow. The query maybe badly written but the SORT is taking up the great majority of the cost of execution.

The Survey table contains the following rows and the views are below:

 -- Survey Table contains ~7,500,000 rows. Columns are:
 ReportID, SubID, FormID, CustID, ClientID, FormName, ColumnName, ColumnLabel, 
 ColumnDatatype, ColumnValue, Valid, ReportName, SupportingText

 CREATE VIEW VW_CustDetails    -- returns additional customer details not present in survey table
 AS    
 SELECT  
     crm.ID ,CR_ID, ClientID, g.CompanyCode  , g.CompanyName Company, 
     UPPER(Asset) Asset, dbo.CapitaliseFirstLetter(cr.CompanyName) CompanyName,   
     dbo.CapitaliseFirstLetter(Address1) Address1, dbo.CapitaliseFirstLetter(Address2) Address2,  
     dbo.CapitaliseFirstLetter(Address3) Address3, dbo.CapitaliseFirstLetter(City) City,    
     UPPER(cr.Postcode) Postcode, Phone, Fax , LOWER(WebAddress) WebAddress,  
     LOWER(EmailAddress) EmailAddress  
     FROM CRM.dbo.RC_CRM crm    
     LEFT JOIN CRM.dbo.CustRecord cr ON crm.CR_ID = cr.ID  
     LEFT JOIN Companies.dbo.Groups g ON crm.CompanyCode = g.CompanyCode;

 CREATE VIEW VW_SubDetails       -- returns additional user details not present in survey table
 AS                
 SELECT DISTINCT 
     SubID ,cr.FormID,OriginalFormID,fs.UserID,Submitted,        
     ISNULL(ud.Title,'') + ' ' + ISNULL(ud.FirstName,'') + ' ' + ISNULL(ud.LastName,'') Submitter     
     FROM Reports cr      
     LEFT JOIN Forms.dbo.FormSubmissions fs ON cr.SubID = fs.ID    
     LEFT JOIN  Core.dbo.UserDetails ud ON fs.UserID = ud.ID;

alt text

4406-sort.jpg (26.9 kB)
more ▼

asked Dec 01 at 02:23 PM in Default

avatar image

David 2 1
1.3k 56 61 69

@David 2 1 Does this query produce expected results? What is ReportingName? ReportName is included in the select list but I am not sure where ReportingName comes from. Also, why do you need joins to the (probably expensive) views when the data in Survey is already there? I have seen and used joins as the source which is then pivoted, or, alternatively, the source which is then pivoted which is then joined with something else, but I have never seen or used anything even remotely resembling the query like the one in your question because it looks very expensive even without knowing any details about the underlying data. Please post a small sample (couple of rows from survey and views), describe how the survey and views are "related", and expected results. Thank you.

Dec 01 at 03:45 PM Oleg

@David 2 1 What columns are needed to be present in the final result? If there is a need to include only one column value to be pivoted (ReportName = Column1 to be exact) then why do you need a pivot when using the where clause will do the trick? Most importantly, please add the list of columns in the final result to your answer (edit existing answer rather than creating a new one). I mean there has to be some other columns required outside of the listed FormID, SubID, and CustID. With these 3 only, you can state

 select
     s.FormID, s.SubID, s.CustID, max(ColumnValue) as Column1
     from Survey
     where ReportName = 'Column1'
     group by s.FormID, s.SubID, s.CustID; -- not sure about it is needed or not

If you need to add some other columns to the list then add the joins to Survey, hopefully not views but the actual tables if possible. The views are somewhat questionable. For example, why RC_CRM is left joined with CustRecord? There should be no way to have CRM record without customer master record, so why LEFT join? Similarly, there should be no CustRecord with invalid Groups ref, so why LEFT join between these two?

Dec 01 at 04:47 PM Oleg

@Oleg, Thanks again. The columns I need which will be reported on through SSRS are:

 FormName, SubID, CustID, ClientID, CompanyCode, Company,
 Asset, CompanyName, Address1, Address1, Address1, City,
 Postcode, Phone, Fax, WebAddress, EmailAddress, FormID,
 OriginalFormID, UserID, Submitted, Submitter,
 Column1 --  this is the column created from the pivot. 
 --There are many more I have only included Column1 
 --here to try to make the example easier.

I'm sure that I have not optimized the query correctly hence the LEFT joins as I've built it up ad-hoc. Do you think it can be rewritten better and this would remove the sort? Thanks. :)

Dec 01 at 05:38 PM David 2 1
(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

Perhaps I should clarify the necessity of pivoting data bound to be used as the source for SSRS report because all reporting tools do a pretty good job pivoting data on their end (placing the column on Columns area usually does the trick). Instead of trying to understand the reasons behind the necessity of pivoting data, I will instead try to restate the query in question in order to make it faster. From what I see, it looks like the reason for slowing it down is due to the fact that too much (too wide to be exact) info is used for pivoting whereas in reality only few columns need to participate, and the fluff may be added later. Here is the script (to the best of my ability to just type without any actual data available on my end):

 select
     pvt.FormName, pvt.SubID, pvt.CustID, cd.ClientID, cd.CompanyCode, cd.Company,
     cd.Asset, cd.CompanyName, cd.Address1, cd.Address2, cd.Address3, cd.City,
     cd.Postcode, cd.Phone, cd.Fax, cd.WebAddress, cd.EmailAddress,
     sd.FormID, sd.OriginalFormID, sd.UserID, sd.Submitted, sd.Submitter,
     pvt.Column1, pvt.Column2, pvt.Column3, pvt.Column4, pvt.Column5, pvt.etc
     from (
         select 
             FormID, FormName, SubID, CustID, ReportName, ColumnValue
             from Survey
     ) src
     pivot (max(ColumnValue) for ReportName in (
         [Column1], [Column2], [Column3], [Column4], [Column5], [etc])
     ) pvt
     inner join VW_CustDetails cd
         on cd.CustID  = pvt.CustID
     inner join VW_SubDetails sd
         on sd.SubID = pvt.SubID;

This way, the pivot involves nothing but the Survey table so the sort operation will be cheaper. Once the pivot is done, the results are then joined with others for details. I still retained the joins to the views instead of unwinding them to the actually needed tables. The optimizer is usually quite capable of expanding the guts of the view definitions into the plan. The only possible problem may be caused by the LEFT join present in the views when it may not have been needed, depending on data quality.

Hope this helps.

Oleg

more ▼

answered Dec 01 at 07:10 PM

avatar image

Oleg
19k 3 7 28

@Oleg, wow many thanks. You are literally a SQL guru. The query is now down from 20 mins to ~1 min. That is amazing. Just so I can understand in layman terms the problem I created was joining the views within the pivot?

6 days ago David 2 1

@David 2 1 Yes, the problem was that the data was first joined and then pivoted. This meant that 3 columns from Survey and ALL COLUMNS from both views were participating in the implicit group by. Before the data is grouped, it is sorted, and with that many columns the sort operation is expensive. All I did was I used just the needed Survey columns to pivot and then joined the results with the views as per your query in question.

The execution time of ~ 1 minute is still bothersome though because with only 8 million rows in the Survey table, the engine should eat the pivot for breakfast. One way to check whether the pivot or the joins are at fault would be to run just the pivot part, i.e.

 select
     pvt.FormName, pvt.SubID, pvt.CustID, 
     pvt.Column1, pvt.Column2, pvt.Column3, pvt.Column4, pvt.Column5, pvt.etc
     from (
         select 
             FormID, FormName, SubID, CustID, ReportName, ColumnValue
             from Survey
     ) src
     pivot (max(ColumnValue) for ReportName in (
         [Column1], [Column2], [Column3], [Column4], [Column5], [etc])
     ) pvt;

If the above is fast then the joins are at fault, pivot otherwise.

6 days ago Oleg

@Oleg, thanks that is good to know for future pivoting as I didn't know that. Anyway I ran the above with just the pivot data at it's still ~1 minute execution so it looks like the pivot is expensive to run?

6 days ago David 2 1

@David 2 1 Yes, but this is actually a good thing because it means that the joins to the views are light. The select from survey does not have any predicates so the engine has no choice but to scan the whole table or clustered index, same thing at the end of the day. The good news is that the pivot is reducing the number of rows in the final result by design. It is impossible to tell without looking at the structure of the table, but it looks like the design is flexible, almost like EAV pattern (you have the column label / column actual name combinations it it). If and only if the table has many more columns in addition to the 5 columns in select from survey statement part then the query might hugely benefit from the covering index or at least from CustID, SubID with other 3 included. Such index will be bound to quickly become highly fragmented which will necessitate the index regular maintenance. So, as usually, it depends on what you need to accomplish. If 1 minute execution time is sufficient then let it be. Otherwise, you may consider adding that index (if you can afford it, it all depends on how many more columns are out there).

6 days ago Oleg

@Oleg, thanks for the insight. I will test a covering index over FormID, FormName, SubID, CustID, ReportName, and ColumnValue but at present the speed improvement is significant enough as I'll be looking to schedule and cache the SSRS reports overnight. :)

5 days ago David 2 1
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x115
x61
x35
x19

asked: Dec 01 at 02:23 PM

Seen: 37 times

Last Updated: 5 days ago

Copyright 2017 Redgate Software. Privacy Policy