question

vishug1971 avatar image
vishug1971 asked

SQL queries

how to retrieve all the top rows from a table, against an aggregate value , currently I am using cursor to achieve the same., but I want to use it in different format... slno ProductCode ProductDesc AvailableQty PurgrnNo PurPrice ----------- ----------- -------------------- ------------ -------- --------- 33564 RAM002 DDR4 4GB LAPTOP RAM 2 000060 1925.37 33566 RAM002 DDR4 4GB LAPTOP RAM 2 000061 1925.37 67784 RAM002 DDR4 4GB LAPTOP RAM 4 000130 2268.25 79226 RAM002 DDR4 4GB LAPTOP RAM 2 000187 2321.00 80032 RAM002 DDR4 4GB LAPTOP RAM 10 000197 2241.90 90101 RAM002 DDR4 4GB LAPTOP RAM 4 000013 2162.75 with the above data the [availableQty] if provided in aggregate value as 10 then the result should show as slno ProductCode ProductDesc AvailableQty PurgrnNo PurPrice ----------- ----------- -------------------- ------------ -------- --------- 33564 RAM002 DDR4 4GB LAPTOP RAM 2 000060 1925.37 33566 RAM002 DDR4 4GB LAPTOP RAM 2 000061 1925.37 67784 RAM002 DDR4 4GB LAPTOP RAM 4 000130 2268.25 79226 RAM002 DDR4 4GB LAPTOP RAM 2 000187 2321.00 though the sum of the column may exceed 10 but the fetching of the row should shop at this., suppose if the aggregate value is 22 then slno ProductCode ProductDesc AvailableQty PurgrnNo PurPrice ----------- ----------- -------------------- ------------ -------- --------- 33564 RAM002 DDR4 4GB LAPTOP RAM 2 000060 1925.37 33566 RAM002 DDR4 4GB LAPTOP RAM 2 000061 1925.37 67784 RAM002 DDR4 4GB LAPTOP RAM 4 000130 2268.25 79226 RAM002 DDR4 4GB LAPTOP RAM 2 000187 2321.00 80032 RAM002 DDR4 4GB LAPTOP RAM 10 000197 2241.90 90101 RAM002 DDR4 4GB LAPTOP RAM 4 000013 2162.75 rows should display, if there is fewer rows against aggregate given AvailableQty then it should display all the rows..
sql server 2008 r2
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@vishug1971 If the available quantity parameter is 22 then why do you need the last row to be included? The cumulative sum of all previous rows before the last one is 20 (still less than 22), but if the last row is included then it is already 24 (more than 22). Also, your sample data includes only one product code, but I guess you actually might need the rows for multiple products as well, partitioning the results so that the rows for each product code include only those until the cumulative sum hits the desired number (per product code). Please clarify. Thank you.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@vishug1971 I updated my answer to include the solutions which will work in SQL Server 2008 R2, but the answer went into moderation, so it is not visible at this time until one of the site moderators approves it. Please let me know if any of the solutions work for you.
0 Likes 0 ·
vishug1971 avatar image vishug1971 commented ·
Well thanks once again for the initiative taken from your side, if we sum of the first 4 rows the cumulative result is 20 in numbers so to meet the the requirement that is 22 the last row is needed thanks
0 Likes 0 ·

1 Answer

·
Oleg avatar image
Oleg answered
The solution in this answer is based on the assumption that there could be multiple records for the same product and that there are multiple products in the table in question. The task is to filter the rows for each product so that the cumulative sum of the quantities is still less than or equal to the desired threshold OR such sum is already flipped through the threshold but the sum of the previous row did not. Unfortunately, I do not have access to the SQL Server 2008 R2, which is no longer supported, but is still on life support for another year or so (extended support end date is July 9 2019). I know that the lead and lag analytical functions are not available in this version, and it is also possible that the frame specifications are not available either but I am not sure about the frame. In SQL Server 2012, if I need to calculate the cumulative sum and also the cumulative sum excluding the current row, I can do it at once using the frame specs. Alternatively, I could use the lag analytical function but the frame specs are just as easy. Here is the script which will work in SQL Server 2012 or higher for sure and might or might not work in SQL Server 2008 R2 (if it does not then the second script should work in the older version): -- SQL Server 2012 or higher (might not work in 2008 R2) ;with records as ( select *, sum(AvailableQty) over ( partition by ProductCode order by slno rows between unbounded preceding and current row ) Cumulative, sum(AvailableQty) over ( partition by ProductCode order by slno rows between unbounded preceding and 1 preceding ) PrevCumulative from YourTable ) select slno, ProductCode, ProductDesc, AvailableQty, PurgrnNo, PurPrice -- , Cumulative, PrevCumulative from records where Cumulative <= 10 or PrevCumulative < 10 If the script does not work in SQL Server 2008 R2 then there is a way to make it work without the frame specifications. The cumulative sum should work, and while there is no way to snoop into the value from the previous row directly, there is a way to do it by joining the table to itself on the off-by-one basis. Here is the script: -- SQL Server 2008 R2, it does not actually work per @Usman Butt, but still works in 2012+ ;with records as ( select *, sum(AvailableQty) over (partition by ProductCode order by slno) Cumulative, row_number() over (partition by ProductCode order by slno) N from YourTable ) select a.slno, a.ProductCode, a.ProductDesc, a.AvailableQty, a.PurgrnNo, a.PurPrice -- , a.Cumulative, b.Cumulative PrevCumulative from records a left join records b on a.ProductCode = b.ProductCode and a.N = b.N + 1 /* off-by-one join */ where a.Cumulative <= 10 or b.Cumulative < 10; Please change the number to what it needs to be (from number 10 given as an example) or better still make it a parameter. Uncomment the commented lines to reveal the values of the cumulative sum and prev. cumulative sum to see how the script works. **Edit to add possible solution(s) which actually work in SQL Server 2008 R2** Since @Usman Butt opened up a can of awesome T-SQL worms in his comment, the problem now has to be elaborated. The running total problem in earlier versions of SQL Server is very well explained and solved in the [Solving the Running Total and Ordinal Rank Problems][1] article by @Jeff Moden. This article is a must read. It provides multiple solutions, including the cursor loop and quirky update. The latter term was originally coined by Robyn Page in one of her workbench articles titled [Robyn Page’s SQL Server Cursor Workbench][2]. The article is referenced in @Jeff Moden article. One of the solutions, included in the article is the triangular join solution. It is provided specifically to demonstrate how inferior is the technique to any other method. The problem with triangular joins is the ample number of records that the engine has to consider when calculating running totals. For example, say, your table has 1 million rows but all of them are for the same product code. In this case, the number of records that the engine needs to access is n * (n + 1) / 2 where n is the record count, which evaluates to ~ 500 Billion, yes, billion with B. If the table has same one million rows but, say, 50,000 product codes with about 20 records per product code then the number of records to access would be ten and half million, which is much better than the number of records if there was only one product code for all records, but still, this is a much higher number than the number of records in the table. The ten and half million is the result of this formula: p * (n * ( n + 1) / 2) where p is the count of product codes, and n is the average number of rows per single product code. Due to the reasons described above, I do not submit to a forceful scripting of the triangular join based solutions. However, just to show how one such solution might look like, here is the script (please note that the join of the results of the select used as the source for CTE with itself based on the off-by-one basis is still needed due to the requirements about the inclusion criteria): -- This solution will work in SQL Server 2008 and even 2005 -- but is ill advised due to the use of the triangular join. ;with records as ( select t.*, row_number() over (partition by ProductCode order by slno) N, ( select sum(AvailableQty) from YoutTableNameGoesHere -- the triangular join condition, do not use the solution, it is shown -- just to demonstrate the dangers of using it (performance issues) where ProductCode = t.ProductCode and slno <= t.slno ) Cumulative from YoutTableNameGoesHere t ) select a.slno, a.ProductCode, a.ProductDesc, a.AvailableQty, a.PurgrnNo, a.PurPrice -- , a.Cumulative, b.Cumulative PrevCumulative from records a left join records b on a.ProductCode = b.ProductCode and a.N = b.N + 1 /* off-by-one join */ where a.Cumulative <= 10 or b.Cumulative < 10; I have been dinged in the past for including the semicolon on the same line, just before my CTE definition, but I have been, and will continue to do it. The reason is that the CTE is required to be either the first statement in the batch or follow the previous statement which has been terminated with semi-colon. Because I have no way of controlling where the CTE will be used, I place the semicolon in front of it as a precaution. Without such semi-colon, it is enough to type something like **print 'Hello'** a couple of lines above the beginning of the script with CTE in order to break the batch with some obscure error about **Incorrect syntax near the keyword 'with'**. The other technique which is well worth considering is the quirky update solution. This will be by far, the fastest possible solution. The only problem is that the data must be suitable for it. The table in question must be clustered by **ProductCode, slno** (in this order), if the quantities in existing records change over time or the new records are inserted into the table then the whole table must be updated to recalculate the cumulative totals. Otherwise, it is not possible to use the quirky update, in which case, the workaround could be to create a temp table with 4 columns: slno, ProductCode, AvailableQty, CumulativeQty (nullable), cluster it by ProductCode, slno, insert data into it from all rows (3 columns) of the table in question. This will result in CumulativeQty = null for all records in temp table. After that the temp table may be quickly updated so that all its records now have CumulativeQty calculated correctly. At this time, join of this temp table with the original table, still using the same condition in the where clause, will work. Here is the script showing how to update all rows in the table in question (assuming that it is already clustered correctly, the new column has been already added to it with NULL for all records initially, and it is OK to actually use such update because the table is static): -- qurky update declare @prevProductCode char(6); declare @cumulativeQty int; update YourTableNameGoesHere with (tablockx) set @cumulativeQty = CumulativeQty = case when ProductCode = @prevProductCode then @cumulativeQty + AvailableQty else AvailableQty end, @prevProductCode = ProductCode option (maxdop 1); Once the table is updated, we still have to use the join of it with itself on the off-by-one, but this is easy now, and will certainly work in SQL Server 2008 R2 and even 2008 and 2005. ;with records as ( select *, row_number() over (partition by ProductCode order by slno) N from #t t ) select a.slno, a.ProductCode, a.ProductDesc, a.AvailableQty, a.PurgrnNo, a.PurPrice --, a.CumulativeQty, b.CumulativeQty PrevCumulative from records a left join records b on a.ProductCode = b.ProductCode and a.N = b.N + 1 /* off-by-one join */ where a.CumulativeQty <= 10 or b.CumulativeQty < 10; Hope this helps. Oleg [1]: http://www.sqlservercentral.com/articles/T-SQL/68467/ [2]: https://www.red-gate.com/simple-talk/sql/learn-sql-server/robyn-pages-sql-server-cursor-workbench/
8 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

vishug1971 avatar image vishug1971 commented ·
Greeting Oleg, the first example which you have provided using SQL server 2012 on wards, which is working fine., thumbs Up, since I am using SQL server 2008 R2 under production environment I need to work with second script, a small question can I use derived table in place of Yourtable place, since the columns which are in the results are associated with respective master tables, since I am new to CTE expression I need some help in understanding the concepts thanks...
0 Likes 0 ·
vishug1971 avatar image vishug1971 commented ·
Hi Oleg, while executing the script , there is an exception which throws for this query select * , sum(AvailableQty) over ( partition by ProductCode order by slno) Cumulative, row_number() over (partition by ProductCode order by slno) N from _Stk As Incorrect syntax near 'order'. is it possible to use sum(AvailableQty) over ( partition by ProductCode order by slno) in this case....
0 Likes 0 ·
Usman Butt avatar image Usman Butt commented ·
@vishug1971 Since @Oleg has already mentioned that he does not have the environment to test it, he left the order by in OVER() clause for SUM. But I believe it would not run on SQL Server 2K8R2. Anyhow, you should have got the idea how to sort out such little discrepancy yourself. Besides, I wonder why you want a different solution to your original solution. AFAIK, till SQL Server 2k8R2, for such cumulative/running totals, you get better performance with either a CURSOR solution or the QUIRKY update solution unless you are working with a small set of data.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@vishug1971 Thank you for the thumbs up (it looks like I have zero upvotes now, which probably means that someone downvoted my answer, which is a bummer, but I guess I can live with this). Regarding the CTE (common table expression) concept: yes, you can use derived table, I just don't know your actual table names and used YourTable as an example, that is all. The CTE is akin to sub-select in some ways, but it is much more than that. In the second script, I use the data from CTE joined with itself on the of-by-one basis. If I used the sub-selects instead then I would have to have 2 sub-selects (copy-paste the same select twice). Any query which must use the results of the same select in more than one place might benefit from CTE, even though with some possible performance penalty. In other words, if performance of CTE is poor then it might make sense to populate the records in the temp table/table variable from that select which was the source of the CTE and then use that temp table/table variable in later queries. Sorry about the query not working in 2008 R2. Now that @Usman Butt pointed this out, I will update my answer shortly with the solution which should work in 2008 R2.
0 Likes 0 ·
Usman Butt avatar image Usman Butt commented ·
@Oleg I don't think the OP upvoted the answer. May be by "thumbs up" he meant "kudos for that". @vishug1971 can you confirm that?
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.