question

David 2 1 avatar image
David 2 1 asked

Removing Costly SORT From PIVOT Query

Hi there, I have a PIVOT query that executes quite slowly. Looking at the execution plan I can see that the SORT operand is costing 96% of the execution. Is there a way to rewrite the query to remove this sort and/or reduce its cost to speed up the execution? SELECT pvt.* FROM (SELECT FormID, SubID, CustID, ReportName, ColumnValue FROM Survey ) AS y INNER JOIN VW_CustDetails cd ON cd.CustID = y.CustID INNER JOIN VW_SubDetails sd ON sd.SubID = y.SubID PIVOT (MAX(ColumnValue) FOR ReportName IN ([Column1] ) ) pvt; TIA @Oleg thanks. Sorry, ReportingName was a mistake, it should be ReportName. The data does return correctly however it is very slow. The query maybe badly written but the SORT is taking up the great majority of the cost of execution. The Survey table contains the following rows and the views are below: -- Survey Table contains ~7,500,000 rows. Columns are: ReportID, SubID, FormID, CustID, ClientID, FormName, ColumnName, ColumnLabel, ColumnDatatype, ColumnValue, Valid, ReportName, SupportingText CREATE VIEW VW_CustDetails -- returns additional customer details not present in survey table AS SELECT crm.ID ,CR_ID, ClientID, g.CompanyCode , g.CompanyName Company, UPPER(Asset) Asset, dbo.CapitaliseFirstLetter(cr.CompanyName) CompanyName, dbo.CapitaliseFirstLetter(Address1) Address1, dbo.CapitaliseFirstLetter(Address2) Address2, dbo.CapitaliseFirstLetter(Address3) Address3, dbo.CapitaliseFirstLetter(City) City, UPPER(cr.Postcode) Postcode, Phone, Fax , LOWER(WebAddress) WebAddress, LOWER(EmailAddress) EmailAddress FROM CRM.dbo.RC_CRM crm LEFT JOIN CRM.dbo.CustRecord cr ON crm.CR_ID = cr.ID LEFT JOIN Companies.dbo.Groups g ON crm.CompanyCode = g.CompanyCode; CREATE VIEW VW_SubDetails -- returns additional user details not present in survey table AS SELECT DISTINCT SubID ,cr.FormID,OriginalFormID,fs.UserID,Submitted, ISNULL(ud.Title,'') + ' ' + ISNULL(ud.FirstName,'') + ' ' + ISNULL(ud.LastName,'') Submitter FROM Reports cr LEFT JOIN Forms.dbo.FormSubmissions fs ON cr.SubID = fs.ID LEFT JOIN Core.dbo.UserDetails ud ON fs.UserID = ud.ID; ![alt text][1] [1]: /storage/temp/4407-4406-sort.jpg
pivotsqlserver2012execution-plansorting
4406-sort.jpg (26.3 KiB)
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@David 2 1 Does this query produce expected results? What is ReportingName? ReportName is included in the select list but I am not sure where ReportingName comes from. Also, why do you need joins to the (probably expensive) views when the data in Survey is already there? I have seen and used joins as the source which is then pivoted, or, alternatively, the source which is then pivoted which is then joined with something else, but I have never seen or used anything even remotely resembling the query like the one in your question because it looks very expensive even without knowing any details about the underlying data. Please post a small sample (couple of rows from survey and views), describe how the survey and views are "related", and expected results. Thank you.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@David 2 1 What columns are needed to be present in the final result? If there is a need to include only one column value to be pivoted (ReportName = Column1 to be exact) then why do you need a pivot when using the where clause will do the trick? Most importantly, please add the list of columns in the final result to your answer (edit existing answer rather than creating a new one). I mean there has to be some other columns required outside of the listed **FormID**, **SubID**, and **CustID**. With these 3 only, you can state select s.FormID, s.SubID, s.CustID, max(ColumnValue) as Column1 from Survey where ReportName = 'Column1' group by s.FormID, s.SubID, s.CustID; -- not sure about it is needed or not If you need to add some other columns to the list then add the joins to Survey, hopefully not views but the actual tables if possible. The views are somewhat questionable. For example, why RC_CRM is left joined with CustRecord? There should be no way to have CRM record without customer master record, so why LEFT join? Similarly, there should be no CustRecord with invalid Groups ref, so why LEFT join between these two?
0 Likes 0 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, Thanks again. The columns I need which will be reported on through SSRS are: FormName, SubID, CustID, ClientID, CompanyCode, Company, Asset, CompanyName, Address1, Address1, Address1, City, Postcode, Phone, Fax, WebAddress, EmailAddress, FormID, OriginalFormID, UserID, Submitted, Submitter, Column1 -- this is the column created from the pivot. --There are many more I have only included Column1 --here to try to make the example easier. I'm sure that I have not optimized the query correctly hence the LEFT joins as I've built it up ad-hoc. Do you think it can be rewritten better and this would remove the sort? Thanks. :)
0 Likes 0 ·

1 Answer

·
Oleg avatar image
Oleg answered
Perhaps I should clarify the necessity of pivoting data bound to be used as the source for SSRS report because all reporting tools do a pretty good job pivoting data on their end (placing the column on Columns area usually does the trick). Instead of trying to understand the reasons behind the necessity of pivoting data, I will instead try to restate the query in question in order to make it faster. From what I see, it looks like the reason for slowing it down is due to the fact that too much (too wide to be exact) info is used for pivoting whereas in reality only few columns need to participate, and the fluff may be added later. Here is the script (to the best of my ability to just type without any actual data available on my end): select pvt.FormName, pvt.SubID, pvt.CustID, cd.ClientID, cd.CompanyCode, cd.Company, cd.Asset, cd.CompanyName, cd.Address1, cd.Address2, cd.Address3, cd.City, cd.Postcode, cd.Phone, cd.Fax, cd.WebAddress, cd.EmailAddress, sd.FormID, sd.OriginalFormID, sd.UserID, sd.Submitted, sd.Submitter, pvt.Column1, pvt.Column2, pvt.Column3, pvt.Column4, pvt.Column5, pvt.etc from ( select FormID, FormName, SubID, CustID, ReportName, ColumnValue from Survey ) src pivot (max(ColumnValue) for ReportName in ( [Column1], [Column2], [Column3], [Column4], [Column5], [etc]) ) pvt inner join VW_CustDetails cd on cd.CustID = pvt.CustID inner join VW_SubDetails sd on sd.SubID = pvt.SubID; This way, the pivot involves nothing but the Survey table so the sort operation will be cheaper. Once the pivot is done, the results are then joined with others for details. I still retained the joins to the views instead of unwinding them to the actually needed tables. The optimizer is usually quite capable of expanding the guts of the view definitions into the plan. The only possible problem may be caused by the LEFT join present in the views when it may not have been needed, depending on data quality. Hope this helps. Oleg
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

David 2 1 avatar image David 2 1 commented ·
@Oleg, wow many thanks. You are literally a SQL guru. The query is now down from 20 mins to ~1 min. That is amazing. Just so I can understand in layman terms the problem I created was joining the views within the pivot?
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@David 2 1 Yes, the problem was that the data was first joined and then pivoted. This meant that 3 columns from Survey and ALL COLUMNS from both views were participating in the implicit group by. Before the data is grouped, it is sorted, and with that many columns the sort operation is expensive. All I did was I used just the needed Survey columns to pivot and then joined the results with the views as per your query in question. The execution time of ~ 1 minute is still bothersome though because with only 8 million rows in the Survey table, the engine should eat the pivot for breakfast. One way to check whether the pivot or the joins are at fault would be to run just the pivot part, i.e. select pvt.FormName, pvt.SubID, pvt.CustID, pvt.Column1, pvt.Column2, pvt.Column3, pvt.Column4, pvt.Column5, pvt.etc from ( select FormID, FormName, SubID, CustID, ReportName, ColumnValue from Survey ) src pivot (max(ColumnValue) for ReportName in ( [Column1], [Column2], [Column3], [Column4], [Column5], [etc]) ) pvt; If the above is fast then the joins are at fault, pivot otherwise.
0 Likes 0 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, thanks that is good to know for future pivoting as I didn't know that. Anyway I ran the above with just the pivot data at it's still ~1 minute execution so it looks like the pivot is expensive to run?
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@David 2 1 Yes, but this is actually a good thing because it means that the joins to the views are light. The select from survey does not have any predicates so the engine has no choice but to scan the whole table or clustered index, same thing at the end of the day. The good news is that the pivot is reducing the number of rows in the final result by design. It is impossible to tell without looking at the structure of the table, but it looks like the design is flexible, almost like EAV pattern (you have the **column label** / **column actual name** combinations it it). ***If and only if*** the table has many more ***columns*** in addition to the 5 columns in select from survey statement part then the query might hugely benefit from the covering index or at least from CustID, SubID with other 3 included. Such index will be bound to quickly become highly fragmented which will necessitate the index regular maintenance. So, as usually, it depends on what you need to accomplish. If 1 minute execution time is sufficient then let it be. Otherwise, you may consider adding that index (if you can afford it, it all depends on how many more columns are out there).
0 Likes 0 ·
David 2 1 avatar image David 2 1 commented ·
@Oleg, thanks for the insight. I will test a covering index over FormID, FormName, SubID, CustID, ReportName, and ColumnValue but at present the speed improvement is significant enough as I'll be looking to schedule and cache the SSRS reports overnight. :)
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.