Phil Factor SQL Speed Phreak Competition: No 2
This competition is now over, but the winner, Dave, got an Amazon Voucher for $60, and the privilege of being able to display the 'Phil Factor SQL Speed Phreak' award on their own siteIt was quite a struggle with some close competition from many of those who participated in this competition. However, Dave came up with a clever solution that produced an FIFO calculation from a million rows in a just a few seconds. The third Phil Factor Speed Phreak competition will soon be held on here. Watch out! (here is the original preamble.) This competition is to calculate current items in stock and current stock value in FIFO order. I have seen many different algorithms to do this and most of them involve a cursor.
Can it be done more quickly without, or even with, a cursor? In other words, what is the fastest way in SQL Server (any version) to provide this stock inventory report? Beware that the sample file includes the two extra columns; CurrentItems (INT) and CurrentValue (MONEY) ! The table is in this form (simplified from the way we'd do it in a real system of course).
You are welcome to change the two nonclustered indexes to suit your solution. You can download the complete sample data here. I have an idea of my own of the way to do this but I don’t know if it is the fastest. Explanation of FIFO rules (example, abbreviated)
a) First IN add 738 items (each $245.94) to the stock, for a total of $181,503.72 As mentioned before, the CurrentItems and CurrentValue columns in the sample data are only included for you to validate your routines. Here are some guidelines for your entries:1) Include a header in your suggestion. Make sure your name and date is present. I will allow you to use an existing tally number table (make sure it starts with 0). You can use any kind of object for this competition, except SQLCLR. If you are using a fixed tally number table, it has to be named dbo.TallyNumbers and the column named Number. The winner will be amongst the tied fastest entrants (generally there is a group of these) and it will be the one with the highest number of votes. We'll announce the winner in three week's time on 16th November. For a starter, here is a common cursor based solution that you will find in production in many places.
The above is already done. Here is the code to produce the final resultset.
Good luck to you all! Peter Larsson
(comments are locked)
|
|
Timings are quite an eye-opener. They do vary quite a bit, and some solutions vary more than others. I took the average of six timings. It is a tie between GianLuca and Dave Ballantyne. Because I can only award the competition to one person, I'm going to use as a tie-breaker the fact that the fastest timing in the whole run was by Dave Ballantyne at an amazing 2220 ms. He therefore is awarded the Phil factor Speed Phreak award, and sets the next competition. However, I've asked Red-Gate to award a second $60 Amazon Voucher to Gianluca for his wonderful entry 1a which was so close to Dave's average timings that one couldn't put a cigarette paper between them. Matt Whitfield's late modification to his CLR comes in at 4300 ms, and gets an honourable mention, as does Scot Hauder (4620 ms) Steve(1a) with 4416 ms The original run gave the following figures (I've used Matt's second CLR solution)....
**Secs Ms Entry**
3 2560 Dave Ballantyne - Phil Factor Challenge Entry 3.d
2 2750 Gianluca 1a
3 3050 Gianluca 2a
3 3190 Dave Ballantyne - Phil Factor Challenge Entry 3.b
4 3313 Matt (waiting for fixed version!)
4 4360 Matt Whitfield CLR solution v2
4 4560 Steve 1a
5 5033 Scot Hauder
6 5830 Herman 1d
6 5906 RBarryYoung 2d Pure Set-Based, summary calcs
10 10173 Andriy Z. 1
16 16300 Dave Ballantyne Ver 2b
41 41000 Dave Ballantyne Ver 1a
62 61673 RBarryYoung_1b: Naive, Pure Set-Based Solution
102 101430 Peso CURSOR (not an entry)
111 111160 Herman, 30 Oct 2009
Just to show you the sort of variations i was seeing, here is another sample run. In future, I'm going to judge the winner over at least 20 runs, and record the variance as well as the average time. 4 4236 Dave Ballantyne - Phil Factor Challenge Entry 3.b 4 4546 Gianluca 1a 5 4703 Dave Ballantyne - Phil Factor Challenge Entry 3.d 6 5566 Steve 1a 6 6190 Scot Hauder 7 7216 RBarryYoung_2d: Pure Set-Based, summary calcs 7 7450 Herman 1d 9 8610 Gianluca 2a 10 10296 Andriy Z. 1 18 17813 Dave Ballantyne Ver 2b 28 27500 Matt Whitfield CLR solution (V1) 42 42003 Dave Ballantyne Ver 1a 63 62803 RBarryYoung_1b: Naive, Pure Set-Based Solution 114 113116 Herman, 30 Oct 2009 198 198136 Peso CURSOR (not an entry) Sadly, I left the harness to watch a DVD, and when I got back to do the next lot of runs, windows had decided to upgrade itself and I hadn't saved the latest version, so I'm afraid the umpire's decision is now final Final order over eight runs:
(I had to disqualify Matt as I haven't got an entry that gives the right answer) Phil - do you want any changes made to the harness, or have you not got round to unpacking that one yet? :)
Nov 11 '09 at 06:15 AM
Matt Whitfield ♦♦
Matt, do you have another version of your CLR for Phil to test?
Nov 12 '09 at 02:45 AM
Peso
It would be nice to get a CLR that is in the same region as the entrants that are taking just a handful of seconds! I don't think we'll get a cursor-based entry in the same league.
Nov 12 '09 at 08:21 AM
Phil Factor
Sounds closer than the initial tests have shown
Nov 12 '09 at 09:52 AM
dave ballantyne
For those suggestion with preliminary timings 10 seconds or less, there is slight mismatch betweeen my test machine and Phil's test machine for about +/- 1 second. But how come the CURSOR method went from 12 seconds to 102 seconds?
Nov 12 '09 at 10:26 AM
Peso
(comments are locked)
|
|
I had some help with this one from a friend (thanks Mark!). it's just at a second on my machine. Let me know if I did anything wrong Impressive! I get wrong results from this for some rows, but I don't think it's going to be difficult to fix. Impressive.
Nov 05 '09 at 02:32 PM
Gianluca Sartori
See for example ArticleID 24986.
Nov 05 '09 at 03:43 PM
Peso
And change last SELECT statement to select articleid, finalitems, finaldollars from #fifofinal order by articleid Also change all cleanup to last statement so that I can run this 10 consecutive times. No nees to drop and recreate index.
Nov 05 '09 at 03:44 PM
Peso
I have investigated the query, and the culprit is "end * max(s.price)". The price is not the max price, it is the latest price for an IN transaction. The max price may be the earliest price if price value is falling.
Nov 05 '09 at 04:01 PM
Peso
(comments are locked)
|
|
I have a question regarding the rule for Returns. As follows: You explain that because we can't track which original sales were being returned, therefore we always price returns at the price of the IN immediately preceding the return. However, this does not make sense in the case when there were no OUTS after the IN which immediately precedes the RET. In such a case, because we know that the RET could not have been of the items that just came IN. The rule should be "RET is priced at the price of the IN immediately preceding the OUT immediately preceding the RET." This way RETs can be consistently ignored in the calculations, because only real INs count. And it is consistent with FIFO. please advise It does make sense for the economists. They want their stock to be as accurate values as possible. If we encounter a customer return ("RET"), we want the stock to be updated with the latest known value the the article used in the return. And since we cannot track exact order from which the article was taken out at, we use the latest known price available. For example, if price is increasing, we want the stock to be valued at the higher (latest) price, not the price originally assigned, and if the price is decreasing, we want the same thing, the latest known price for that article.
Nov 06 '09 at 04:31 AM
Peso
Adn regarding your OUT statement. The OUTtake can span several IN transactions. Which price do you suggest we use, if an OUT drains 5,6 or more IN's?
Nov 06 '09 at 04:32 AM
Peso
(comments are locked)
|
|
Even though under FIFO, current value is tracked at the more recent purchase prices, the value of goods already in inventory is not adjusted up or down just because another purchase is made at a different price. The premise of FIFO is not to track the value of current inventory at the latest, or most accurate basis. This is not market based cost tracking, or LCM. It's one of two accounting constructs designed for an environment where a) inventory value is tracked at cost, and b) subtractions from inventory cannot be tracked back to the exact point and price of entry. In such an environment, the arbitrary, yet consistent, assumption that First in First Out provides a method to consistently track value in an inherently inconsistent and unknown environment. The problem with always using the latest IN price for returns is that in cases where no OUTS occurred after the latest IN, or where the number of items returned exceeds the number of OUT items that sold after the latest IN, the practice violates condition a) above - that value is tracked at cost. Since RETs can only happen if there are OUTS, it makes sense to treat RETs as cancellations of the OUTS. And just like OUTS can span several INs (that's a premise of the rule), RETs can span, or cancel, several OUTs. I'm not hung up over it, I can see how treating a RET as another purchase at the most recent price is consistent as well, but that complicates the query:)
(comments are locked)
|
|
Here's my entry. I'm calculating running count in reverse order and grouping INs and RETs based on their rank in the stock. Runs in 1.4 sec on my crappy laptop and requires "just" 53933 scans. Seems promising. I will test it when I have access to a clean test machine.
Nov 06 '09 at 07:48 AM
Peso
I get between 126k to 169k reads... We'll see what Phil get on his test harness
Nov 06 '09 at 02:07 PM
Peso
Uhmm... strange. On my laptop it runs faster than Dave's query. Well, I'm glad to see it's a good entry anyway.
Nov 06 '09 at 02:25 PM
Gianluca Sartori
For last competition, the Subscription List SQL Problem, we saw a huge difference for suggestions run on laptops vs servers, even if they both had same version number for both instances.
Nov 06 '09 at 03:02 PM
Peso
Gianluca, for Phil to accurate test this suggestion, please comment which indexes you use besides IX_Stock_GS1. Are you using the non-clustered indexes in the competition? Or variations of those?
Nov 06 '09 at 04:08 PM
Peso
(comments are locked)
|
« previous 1 2 3 4 5 6 next page »



Was there any particular reason you disallowed CLR functionality?
On an initial look, this would be the ideal scenario in which to use a CLR type...
Two things had us made this decision: 1) We have tried CLR before and there was no performance gain. 2) To test and verify, poster need to disclode the source code for the suggestion. But if you have a CLR routine that runs on 15 seconds or less, I think Phil will be interested in testing it.
Questions: 1) Is the Stock Transaction List referred to the one that will be used for Final Evaluation? If not, can you describe the differences? 2) What is the configuration that entries will be (finally) evaluated on?
Yes, the downloadable file is the data against all suggestions will be measured. Phil will drop or rename the two last columns so that no suggestion can use them. I guess Phil will be using same test harness as last time with 'Subscription List' competition.
Those filtered indexes will only work on SQL Server 2008, right?