
Extensive group by

Hi,

I was wondering if someone could give me some insight into a particular performance-related issue. The specific case I have is fairly complex, so rather than replicate all of it, I will describe it in general terms.

I have two versions of a report: one 'detailed', the other a 'summary'. The detailed version is implemented as a table-valued function. It is quite wide, with around 100 columns, of which about 10 are numeric. For the summary I built a function that calls the detailed one, summing the numeric columns and applying GROUP BY to the non-numeric ones (about 50 of them). The detailed version works and performs well. The summary, however, is disproportionately slow. In both cases I can pass filtering parameters through to reduce the scope of the query. I would have thought that the summary would simply receive the rows delivered by the detailed function and act on those, but it seems to scan far more extensively.

Is there something manifestly wrong with this pattern? Should I send the detailed results to a temporary table and then process that for the summary?

Any advice would be most appreciated.

Thanks,

Stan


asked Feb 07 at 12:21 PM by brutecat

@brutecat Please share some more details. Otherwise, it is quite difficult to understand the problem.

Feb 08 at 11:48 AM Usman Butt

3 answers

Without seeing the code it is difficult to be sure, but from what you are describing it sounds like the optimizer is deciding to aggregate the data and then filter it in the summary function. If that is the case, why would the optimizer take this approach? Usually one of two things:

  • the TVFs are complex and SQL Server 'gives up' trying to expand all of the code, so it treats the complex objects as black boxes and you don't get to use the statistics on the underlying base tables

  • the optimizer runs out of 'time': it only has a certain budget of time/resources in which to produce an execution plan, and if that budget is exceeded it simply goes with the best plan it has found so far. You will see this in the execution plan as the attribute "Reason For Early Termination" = "Time Out"; again, this can happen when the TVFs are complex (see the sketch below for one way to check)
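
If you want to check for that second case, something along these lines should pull the early-abort reason out of the cached plan XML. This is only a sketch: it assumes the summary query is still in the plan cache, and the LIKE filter is a placeholder for your summary function's name.

 SELECT st.text AS query_text,
        qp.query_plan.value(
            'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
             (//p:StmtSimple/@StatementOptmEarlyAbortReason)[1]',
            'varchar(50)') AS early_abort_reason   -- e.g. "TimeOut"
 FROM sys.dm_exec_cached_plans AS cp
 CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
 CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
 WHERE st.text LIKE '%YourSummaryFunction%';       -- placeholder name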

It would be really useful if you could share the execution plans from both the detailed and summary executions.


answered Feb 12 at 07:31 AM by Kev Riley

Kev,

Thanks. I think the first answer is the one; it seems that the large number of groupings causes the optimiser to go west. I have attached the plans and stats. The detailed query produces 4 lines and takes about a minute. The summary does nothing more than consolidate this into one line, yet it takes about 20 minutes.

It does strike me as odd that the optimiser is confused by this pattern, as it does not seem that exotic. Perhaps it's the large number of groupings? Would you have any suggestions for getting around this? By the way, for various reasons this needs to be written as a function, not a stored procedure. My functions are all inline at the moment - would a multi-statement TVF work any better?

Thanks,

Stan

5 days ago brutecat

First of all, I am not able to find any attachment for the plans, so please attach them. Moreover, if you could post the DDL of the function and the DDL of the tables involved, we could help you more precisely.

As for turning this iTVF into an MTVF: although I have done it a few times myself, I would not recommend it unless you know your access patterns and workload exactly. It could help in one case but hurt in others, so the change would need thorough testing. I would be more inclined to revisit the iTVF code itself, which may need some changes - for just 4 rows of output, one minute is too much for my liking. Maybe you are missing some key indexes? But all of this is guesswork without more detail.
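
For what it's worth, the multi-statement shape would look roughly like this. This is only a sketch with invented names and column types; the real signature and column list are obviously yours.

 CREATE FUNCTION dbo.SummaryMTVF (@Param1 int, @Param2 int)
 RETURNS @result TABLE
 (
     GUID          uniqueidentifier,
     GroupTextCol1 varchar(100),
     SumLine1      decimal(18, 2)
     -- ...plus the remaining grouping and SUM columns
 )
 AS
 BEGIN
     -- the aggregation runs once and its result is spooled into the table variable
     INSERT INTO @result (GUID, GroupTextCol1, SumLine1)
     SELECT d.GUID, d.GroupTextCol1, SUM(d.LineNumberCol1)
     FROM dbo.Detailed(@Param1, @Param2) AS d
     GROUP BY d.GUID, d.GroupTextCol1;

     RETURN;
 END;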

But if you cannot change the code, you could also use a divide-and-conquer approach: put the iTVF result into a temporary table and then perform the aggregations on that temporary table, as sketched below. This approach may not be viable if the result varies drastically in size or is very large, although I am fairly sure it would still take less time than the current solution.
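
A sketch of that approach, with placeholder parameter values and only a few of the columns named. Note that it would have to live in a stored procedure or ad-hoc batch, since a temp table cannot be created inside a function.

 DECLARE @Param1 int = 1, @Param2 int = 2;      -- placeholder filter values

 IF OBJECT_ID('tempdb..#Detail') IS NOT NULL
     DROP TABLE #Detail;

 -- materialise the (already filtered) detail rows once
 SELECT *
 INTO #Detail
 FROM dbo.Detailed(@Param1, @Param2);

 -- aggregate the much smaller temp table
 SELECT GUID, GroupTextCol1, GroupTextCol2,
        SUM(LineNumberCol1) AS SumLine1,
        SUM(LineNumberCol2) AS SumLine2
 FROM #Detail
 GROUP BY GUID, GroupTextCol1, GroupTextCol2;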

Another approach is a trick I learned from MVP Adam Machanic: use a row goal to control the flow. It has worked as a magic bullet for me several times when I was not allowed to change the code or make any major change that would require testing. The following is the skeleton of this approach:

 SELECT SUM(a), SUM(b)            -- ...your numeric columns
 FROM
 (
     SELECT TOP (2147483647) *    -- custom row goal
     FROM yourITVF(...)           -- your inline TVF, with its parameters
 ) AS a
 GROUP BY col1, col2              -- ...your grouping columns
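
Since your summary has to remain a function, that shape can presumably sit straight inside the summary iTVF's single RETURN statement, along these lines (names are placeholders; TOP (2147483647) is simply the maximum int value, so it does not limit the rows returned):

 CREATE FUNCTION dbo.SummaryWithRowGoal (@Param1 int, @Param2 int)
 RETURNS TABLE
 AS
 RETURN
 (
     SELECT d.GUID, d.GroupTextCol1, d.GroupTextCol2,
            SUM(d.LineNumberCol1) AS SumLine1,
            SUM(d.LineNumberCol2) AS SumLine2
     FROM
     (
         SELECT TOP (2147483647) *   -- the row goal wraps the detailed call
         FROM dbo.Detailed(@Param1, @Param2)
     ) AS d
     GROUP BY d.GUID, d.GroupTextCol1, d.GroupTextCol2
 );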

answered Feb 13 at 08:08 AM by Usman Butt

Usman,

Thanks. It seems the file is too big. It's here:

https://app.box.com/s/ymlfroh9favnpm3onok9vxckx6uv59yt

You are right, 4 rows in one minute is slow. However, a 'real' run which returns 100K lines takes 5-6 minutes. This is a report built over a commercial application, so I do have constraints I can't shift, and I also can't publish the code for non-disclosure reasons.

It's been good to have the discussion and I will take your advice on board. I just wanted to know if what I was trying to do was manifestly insane.

I will try this tomorrow and revert back.

Thanks again,

Stan

5 days ago brutecat

Thanks,

I have a table-valued function called 'Detailed' which returns a moderately large number of lines (possibly 100K, or thereabouts). It is quite wide, with about 100 columns. About 10 of the columns hold numeric values that differ on each line. Around 30 other columns are text-based and also vary per line. There is a unique identifier (a GUID) which is the same for variable groups of lines - which could be anywhere from 1 to 50 lines. Along with this unique identifier, the remaining 59 columns are also the same for each line in a group and are repeated. So the output looks like:

GUID, GroupTextCol1, GroupTextCol2, ..., LineTextCol1, LineTextCol2, ..., LineNumberCol1, LineNumberCol2, ...

I want to create a summary function which looks something like this:

 SELECT GUID, GroupTextCol1, GroupTextCol2, ...,
        SUM(LineNumberCol1) AS SumLine1,
        SUM(LineNumberCol2) AS SumLine2, ...
 FROM Detailed(@Param1, @Param2, ...)
 GROUP BY GUID, GroupTextCol1, GroupTextCol2, ...

This works functionally, but the performance is very poor. In the 'Detailed' function I can pass parameters that filter the data set. So, in a test scenario, I produce 4 lines from the detailed function. I would have expected the query optimiser to deliver those 4 lines to the summary function, but it does not seem to work that way; the summary version scans a much wider data set.
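
Purely for illustration (the real code is far more involved, and every table and column name here is invented), the overall shape of 'Detailed' is roughly:

 CREATE FUNCTION dbo.Detailed (@Param1 int, @Param2 int)
 RETURNS TABLE
 AS
 RETURN
 (
     SELECT b.GUID,
            b.GroupTextCol1, b.GroupTextCol2,    -- repeated for each line in a group
            b.LineTextCol1,  b.LineTextCol2,     -- text columns that vary per line
            b.LineNumberCol1, b.LineNumberCol2   -- numeric columns that vary per line
     FROM dbo.SomeBaseTable AS b                 -- stands in for the real tables
     WHERE b.FilterCol1 = @Param1                -- the parameters reduce the scope here
       AND b.FilterCol2 = @Param2
 );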

Does this make it clearer?

Thanks,

Stan


answered Feb 11 at 10:03 PM by brutecat
