Phil Factor SQL Speed Phreak Competition: No 2
This competition is now over, but the winner, Dave, got an Amazon Voucher for $60, and the privilege of being able to display the 'Phil Factor SQL Speed Phreak' award on their own site
It was quite a struggle, with close competition from many of those who participated. However, Dave came up with a clever solution that produced a FIFO calculation from a million rows in just a few seconds. The third Phil Factor Speed Phreak competition will be held here soon. Watch out!
(Here is the original preamble.)
This competition is to calculate current items in stock and current stock value in FIFO order.
I have seen many different algorithms to do this and most of them involve a cursor.
Can it be done more quickly without, or even with, a cursor? In other words, what is the fastest way in SQL Server (any version) to provide this stock inventory report?
Beware that the sample file includes the two extra columns: CurrentItems (INT) and CurrentValue (MONEY)!
The table is in this form (simplified from the way we'd do it in a real system of course).
You are welcome to change the two nonclustered indexes to suit your solution. You can download the complete sample data here. I have an idea of my own about how to do this, but I don’t know if it is the fastest.
Explanation of FIFO rules (example, abbreviated)
a) First IN add 738 items (each $245.94) to the stock, for a total of $181,503.72
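The arithmetic for that first movement can be checked directly (a minimal illustration of the FIFO bookkeeping, not part of any competition entry):

```python
# First IN: 738 items at $245.94 each enter the empty warehouse.
quantity = 738
unit_price = 245.94
stock_value = round(quantity * unit_price, 2)  # 738 * 245.94 = 181,503.72
print(stock_value)
```

After this movement, CurrentItems is 738 and CurrentValue is $181,503.72, matching the validation columns in the sample data.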
As mentioned before, the CurrentItems and CurrentValue columns in the sample data are only included for you to validate your routines.
Here are some guidelines for your entries:
1) Include a header in your suggestion. Make sure your name and the date are present.
I will allow you to use an existing tally number table (make sure it starts with 0). You can use any kind of object for this competition, except SQLCLR. If you are using a fixed tally number table, it has to be named dbo.TallyNumbers and the column named Number.
The winner will be chosen from among the tied fastest entrants (generally there is a group of these), and it will be the one with the highest number of votes. We'll announce the winner in three weeks' time, on 16th November.
For a starter, here is a common cursor based solution that you will find in production in many places.
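The row-by-row bookkeeping that such cursor solutions perform can be sketched outside SQL (a Python illustration of the FIFO logic, not the T-SQL cursor code itself; the function name and movement encoding are my own):

```python
from collections import deque

def fifo_stock(movements):
    """movements: list of (quantity, unit_price) tuples in date order.
    Positive quantity = stock IN, negative = stock OUT.
    Returns (items_in_stock, stock_value) after applying all movements FIFO."""
    lots = deque()  # each element: [remaining_qty, unit_price], oldest first
    for qty, price in movements:
        if qty > 0:
            lots.append([qty, price])
        else:
            out = -qty
            while out > 0:
                lot = lots[0]          # oldest lot is consumed first (FIFO)
                take = min(lot[0], out)
                lot[0] -= take
                out -= take
                if lot[0] == 0:
                    lots.popleft()
    items = sum(q for q, _ in lots)
    value = round(sum(q * p for q, p in lots), 2)
    return items, value

# 738 in at $245.94, then 700 out: 38 items of the first lot remain.
print(fifo_stock([(738, 245.94), (-700, 0.0)]))
```

A cursor does essentially this, one row at a time, which is why it scales poorly to a million rows.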
The above is already done. Here is the code to produce the final resultset.
Good luck to you all!
Dave Ballantyne - Phil Factor Challenge Entry 3.d
Peso added some index hints and shuffled a calculation in my initial 3.a, so this is 3.a(+) with comments.
Version 3.d does the TotalStock sum in one operation, which shaves off some more time.
“A problem well stated is a problem half solved” - Charles F. Kettering
Let us think about what has been asked within the challenge.
We have a warehouse which starts empty. The data contains the stock movements in (including returns) and out, and the stock movements ALWAYS happen on a first-in, first-out basis. With this we know that if the present stock level (sum(ins) - sum(outs)) is 50, it will consist of the last 50 items that entered the warehouse. The key to this query is to efficiently find the cost of those 50 items.
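Under that observation, the valuation reduces to walking the IN movements backwards until the current stock level is accounted for, with no need to replay the OUTs lot by lot. A Python sketch of the idea (illustrative only; the function name and movement encoding are my own, not Dave's T-SQL):

```python
def fifo_value(movements):
    """movements: list of (quantity, unit_price) in date order.
    Positive quantity = IN, negative = OUT. Because issues are strictly FIFO,
    whatever remains in stock is the most recently arrived stock."""
    current_stock = sum(q for q, _ in movements)  # sum(ins) - sum(outs)
    remaining = current_stock
    value = 0.0
    # Walk the IN movements newest-first until the remaining stock is costed.
    for qty, price in reversed(movements):
        if qty <= 0 or remaining <= 0:
            continue
        take = min(qty, remaining)
        value += take * price
        remaining -= take
    return current_stock, round(value, 2)

# 738 in, 700 out, 100 in: stock is the last 100 plus 38 of the first lot.
print(fifo_value([(738, 245.94), (-700, 0.0), (100, 10.0)]))
```

A set-based equivalent can find the cutoff point with a running total over the IN rows in descending date order, which is what makes a fast non-cursor solution possible.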
RBarryYoung_1b: Naive, Pure Set-Based Solution
OK, just to get things rolling, here is a straight-forward Set-based solution, with no T-SQL tricks (though there is a very important algorithmic trick in cteOverlapValue).
Oops! I got those numbers completely wrong...
In reality, performance is OK, but not great. It takes 9 seconds on 100,000 rows (about 1,500 Articles) and 90 seconds on the full set. Not bad for virtually no optimizations.
1b: OK, I have made the corrections requested by Peso (ORDER BY and fixed NULLs on Article 25001). My current timings for it are 9 sec on 100k rows and 95 sec on 1M rows.
Not finished yet; I think there is some more to come. Could you tell me what the performance is like against the 'test' system? Just for reference, I stopped Barry's running after 5 minutes; this completes in a little over a minute.
RBarryYoung_2d: Pure Set-Based, summary calcs
OK, similar to my first entry, but using a smarter way to calculate the final CurrentValues:
New version, 2c (don't ask about 2b :-)), primarily leveraging a new index....
OK, now we're really cooking with gas! Andriy's version spurred me to rethink certain parts of the query, resulting in a smaller and even faster version that finally uses the indexes the way I always wanted.
First the new index to create before running:
And the new version:
This is twice as fast as v2c (7 sec) on my machine.