Phil Factor SQL Speed Phreak Competition: No 2
This competition is now over, but the winner, Dave, got an Amazon Voucher for $60, and the privilege of being able to display the 'Phil Factor SQL Speed Phreak' award on their own siteIt was quite a struggle with some close competition from many of those who participated in this competition. However, Dave came up with a clever solution that produced an FIFO calculation from a million rows in a just a few seconds. The third Phil Factor Speed Phreak competition will soon be held on here. Watch out! (here is the original preamble.) This competition is to calculate current items in stock and current stock value in FIFO order. I have seen many different algorithms to do this and most of them involve a cursor.
Can it be done more quickly without, or even with, a cursor? In other words, what is the fastest way in SQL Server (any version) to provide this stock inventory report? Beware that the sample file includes the two extra columns; CurrentItems (INT) and CurrentValue (MONEY) ! The table is in this form (simplified from the way we'd do it in a real system of course).
You are welcome to change the two nonclustered indexes to suit your solution. You can download the complete sample data here. I have an idea of my own of the way to do this but I don’t know if it is the fastest. Explanation of FIFO rules (example, abbreviated)
a) First IN add 738 items (each $245.94) to the stock, for a total of $181,503.72 As mentioned before, the CurrentItems and CurrentValue columns in the sample data are only included for you to validate your routines. Here are some guidelines for your entries:1) Include a header in your suggestion. Make sure your name and date is present. I will allow you to use an existing tally number table (make sure it starts with 0). You can use any kind of object for this competition, except SQLCLR. If you are using a fixed tally number table, it has to be named dbo.TallyNumbers and the column named Number. The winner will be amongst the tied fastest entrants (generally there is a group of these) and it will be the one with the highest number of votes. We'll announce the winner in three week's time on 16th November. For a starter, here is a common cursor based solution that you will find in production in many places.
The above is already done. Here is the code to produce the final resultset.
Good luck to you all! Peter Larsson
(comments are locked)
|
|
Dave Ballantyne - Phil Factor Challenge Entry 3.d Peso had adding some index hints and shuffled a calculation to my initial 3.a. So this is 3.a(+) with comments with 3.d doing the TotalStock sum in one operation shaves some more time off “A problem well stated is a problem half solved” - Charles F. Kettering Let us think about what has been asked within the challenge. We have a warehouse which starts empty , the data contains the stock movements in (including returns) and out. The stock movements ALWAYS happen in a first in first out basis. With this we know that if the present stock level (sum(in's) - sum(out's) is 50 that will be the last 50 that have entered the warehouse. The key to this query is to efficiently find the cost of those 50 items.
(comments are locked)
|
|
What do you think? Please comment here what you think. Wow, have I been there, Peso. It's probably Phil's call, but my feeling is NO. I say this knowing that I may be in your position too in the near future (hopefully :-)). The problem is that no matter how fair you are, sooner or later someone is going to feel cheated because you were both a competitor and the Question Asker (or worse, Tester/Judge). What I would recommend is just throwing you solution in at the end, in the final measurements, as an example non-competing entry. I.e., non-eligible to win.
Oct 23 '09 at 05:41 PM
RBarryYoung
I respect that. If this is the consensus, I will post my set-based solution very last before deadline so that no-one can steal my thunder :-)
Oct 23 '09 at 05:51 PM
Peso
Oh boy! Someone already voted this as an unhelpful post grin
Oct 23 '09 at 05:59 PM
Peso
I think the series would be more fun if we disallow the current winner from entering, since it gives other people a chance! Actually, I think we should be allowed to steal Peso's thunder too at some point!
Oct 24 '09 at 05:23 AM
Phil Factor
Well, I'm upvoting it, so they should cancel out, ;-)
Oct 24 '09 at 03:24 PM
RBarryYoung
(comments are locked)
|
|
Here will the preliminary timings be presented. I write preliminary since Phil is (will be) away for a few days, so I will do the preliminary timings on a similar machine as Phil has. Current standingsBarry's timings on his solution varies due to a lot of tempdb space. But it is still a great suggestion.
Oct 26 '09 at 08:51 AM
Peso
What configuration are you testing on, Peso? I would think that my solution #2 would do a lot better than that on a multi-core environment. (Or is MAXDOP set down?)
Oct 26 '09 at 08:58 PM
RBarryYoung
It's probably my tempDB going through the roof. I'll try the suggestions on a better tempdb configuration. Your queries generates a lot of activity on the tempdbs.
Oct 27 '09 at 09:44 AM
Peso
(comments are locked)
|
|
Well, I know this answer will be disqualified, but I thought it might be interesting to see how a CLR Type would help with this. Here is my code, which runs in about 45 seconds on my box. I also think the idea of a quirky update using a CLR type is funny!
The source code for the assembly type is (no comments as yet - I might come back and fix this): It's a great entry! However, have you disabled parallelism manually?
Oct 24 '09 at 05:48 PM
Peso
I did the MAXDOP 1 because it was a typical quirky update safety addition - I did remove it at one point but didn't find it make any difference to the performance overall...
Oct 24 '09 at 05:56 PM
Matt Whitfield ♦♦
It's a great learning experience for me! Thank you very much. Until now, I have only dealt with aggregate SQLCLR. Take a look here http://www.developerworkshop.net/software.html
Oct 24 '09 at 06:17 PM
Peso
(comments are locked)
|
RBarryYoung_1b: Naive, Pure Set-Based SolutionOK, just to get things rolling, here is a straight-forward Set-based solution, with no T-SQL tricks (though there is a very important algorithmic trick in cteOverlapValue).
Oops! I got those numbers completely wrong,... In reality, performance is OK, but not great. it takes 9 seconds on 100,000 rows (about 1500 Articles) and 90 seconds on the full set. Not bad for virtually no optimizations. 1b: OK, I have made the corrections Requested by Peso (ORDER BY and Fixed NULLS on Article 25001). My current timings for it are 9sec on 100k rows and 95sec on 1M rows. Excellent suggestion Barry! Great to see a pure set-based solution. However, there are two things I miss. 1) Articles need to be sorted by ArticleID. 2) Last ArticleID 25001 reports correct Items but wrong Value.
Oct 25 '09 at 04:43 AM
Peso
OK, thanks Peso. Obviously I can fix (1) right now ... (:-)), (2) is going to take some investigation though.
Oct 25 '09 at 09:58 AM
RBarryYoung
The procedure header says 1a, but the code (I think) is 1b?
Oct 26 '09 at 08:42 AM
Peso
Right. I'm not very good with directions (that's why I need computers :-)). Fixed now.
Oct 26 '09 at 11:26 AM
RBarryYoung
Ok, sorry for the really slow comment, but this was the one showing incorrect values for me - first article, value shows as 0.0, should be 15446.39. Sorry for the delay!
Oct 30 '09 at 06:23 PM
Matt Whitfield ♦♦
(comments are locked)
|



Was there any particular reason you disallowed CLR functionality?
On an initial look, this would be the ideal scenario in which to use a CLR type...
Two things had us made this decision: 1) We have tried CLR before and there was no performance gain. 2) To test and verify, poster need to disclode the source code for the suggestion. But if you have a CLR routine that runs on 15 seconds or less, I think Phil will be interested in testing it.
Questions: 1) Is the Stock Transaction List referred to the one that will be used for Final Evaluation? If not, can you describe the differences? 2) What is the configuration that entries will be (finally) evaluated on?
Yes, the downloadable file is the data against all suggestions will be measured. Phil will drop or rename the two last columns so that no suggestion can use them. I guess Phil will be using same test harness as last time with 'Subscription List' competition.
Those filtered indexes will only work on SQL Server 2008, right?