Phil Factor SQL Speed Phreak Competition: No 2
This competition is now over, but the winner, Dave, got an Amazon Voucher for $60, and the privilege of being able to display the 'Phil Factor SQL Speed Phreak' award on their own site
It was quite a struggle with some close competition from many of those who participated in this competition. However, Dave came up with a clever solution that produced an FIFO calculation from a million rows in a just a few seconds. The third Phil Factor Speed Phreak competition will soon be held on here. Watch out!
(here is the original preamble.)
This competition is to calculate current items in stock and current stock value in FIFO order.
I'll tell you the business rules for the algorithm and provide a sample cursor-based routine that generates the correct result, but slowly (about 40 minutes).
I have seen many different algorithms to do this and most of them involve a cursor. Can it be done more quickly without, or even with, a cursor? In other words, what is the fastest way in SQL Server (any version) to provide this stock inventory report?
It is a reasonable request. We have a stock transaction list (1,000,001 records, 17.6MB zipped) with 15,002 unique articles and we need to do a stock inventory report that gives the current breakdown of the ArticleID, the number of items in stock and the current stock value according to FIFO rules. The list should be in ArticleID order. The sample stock transaction list will include two running totals so you can check the results of your routine. They are NOT part of the competition, and you are not allowed to use them in your calculation.
Beware that the sample file includes the two extra columns; CurrentItems (INT) and CurrentValue (MONEY) !
If you want a smaller subset to test on, import all records and remove all articles but two or three. These are the samples for which all suggestions will be measured with.
The table is in this form (simplified from the way we'd do it in a real system of course).
In this made-up but realistic problem, you have an existing table and database design which you can’t do anything about. (we can all relate to that!) You have an existing cursor-based solution that is taking several minutes to run. (Yes, we all relate to that). The database is performing badly, and you need to take effective steps to remedy this. The only thing you can do is to come up with a better-performing routine. Redesigning the database isn’t an option very often, in the real world, because this requires team sign-in. This is a competition based on the real, sometimes imperfect, world: not an exposition of database design. The point is that we are faced with designs like this and we have to heal them. The competition is all about making stuff go faster.
CREATE TABLE dbo.Stock
(
StockID INT IDENTITY(1, 1) NOT NULL,
ArticleID SMALLINT NOT NULL,
TranDate DATETIME NOT NULL,
TranCode VARCHAR(3) NOT NULL,
Items INT NOT NULL,
Price MONEY NULL,
CONSTRAINT [PK_Stock] PRIMARY KEY CLUSTERED
(
StockID ASC
)
)
CREATE NONCLUSTERED INDEX IX_Input ON dbo.Stock (TranCode, ArticleID)
--INCLUDE (TranDate, Items, Price) -- Remove comment for SQL Server 2005 and later
--WHERE TranCode IN ('IN', 'RET') -- Remove comment for SQL Server 2008
CREATE NONCLUSTERED INDEX IX_Output ON dbo.Stock (TranCode, ArticleID)
-- INCLUDE (TranDate, Items) -- Remove comment for SQL Server 2005 and later
--WHERE TranCode = 'OUT' -- Remove comment for SQL Server 2008
You are welcome to change the two nonclustered indexes to suit your solution. You can download the complete sample data here. I have an idea of my own of the way to do this but I don’t know if it is the fastest.
Explanation of FIFO rules (example, abbreviated)
StockID ArticleID TranDate TranCode Items Price CurrentItems CurrentValue
4567 10000 10:45:07 IN 738 245.94 738 181,503.72
21628 10000 12:05:25 OUT 600 138 33,939.72
22571 10000 14:39:27 IN 62 199.95 200 46,336.62
30263 10000 16:14:13 OUT 165 35 6,998.25
42090 10000 18:18:58 RET 5 40 7,998.00
58143 10000 20:18:54 IN 500 135.91 540 75,953.00
a) First IN add 738 items (each $245.94) to the stock, for a total of $181,503.72
b) Then we take out 600 items (each 245.94) from the stock, leaving a total of $33,939.72
c) Then we insert 62 items (each 199.95) to the stock, for a total of $46,336.62
d) Then we take out 165 items (138 each 245.94 and 27 each 199.95), leaving a total of $6,998.25
e) Then we return 5 items. We can’t track at which price we took them out; so all returns are priced at the price of the latest ones inserted before the return. Even if there should be items left for the price of 245.94, the returned items are valued for 199.95. After the return, the current stock value is $7,998.00
f) The final insert adds $67,995.00 to the stock value, for a total of $75,953.00
As mentioned before, the CurrentItems and CurrentValue columns in the sample data are only included for you to validate your routines.
Here are some guidelines for your entries:
1) Include a header in your suggestion. Make sure your name and date is present.
2) Include an edition number. First edition is 1. If you later improve your current suggestion post it again as version 2. Example: “Peso 1” and if improved, “Peso 1b”, “Peso 1c” etc.
3) If you are trying a new algorithm, change the edition to “Peso 2”. If you improve this algorithm, change the version to “Peso 2b”, “Peso 2c” etc. This will save Phil hours of work in the test harness!
4) If you create a temp table, make sure you delete it in the script.
5) Keep the order of columns in output as ArticleID, CurrentItems, CurrentValue
I will allow you to use an existing tally number table (make sure it starts with 0). You can use any kind of object for this competition, except SQLCLR. If you are using a fixed tally number table, it has to be named dbo.TallyNumbers and the column named Number.
The time taken for their creation will not be included in the timings. The time measured is the “Main” script/procedure. If you want to call sub-procedures, go ahead.
The winner will be amongst the tied fastest entrants (generally there is a group of these) and it will be the one with the highest number of votes. We'll announce the winner in three week's time on 16th November.
For a starter, here is a common cursor based solution that you will find in production in many places.
CREATE TABLE #Work
(
RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Price MONEY
)
DECLARE @ArticleID INT = 1,
@PrevID INT = 0,
@TranCode VARCHAR(3),
@Items INT,
@Price MONEY,
@Loop INT = 0,
@StockID INT,
@LatestPrice MONEY,
@Total INT = (SELECT COUNT(*) FROM dbo.Stock)
DECLARE curYak CURSOR FORWARD_ONLY FOR
SELECT ArticleID,
TranCode,
Items,
Price,
StockID
FROM dbo.Stock
ORDER BY ArticleID,
TranDate
OPEN curYak
FETCH NEXT
FROM curYak
INTO @ArticleID,
@TranCode,
@Items,
@Price,
@StockID
WHILE @@FETCH_STATUS = 0
BEGIN
IF @ArticleID > @PrevID
BEGIN
TRUNCATE TABLE #Work
SET @LatestPrice = NULL
END
IF @TranCode = 'IN'
BEGIN
INSERT #Work
(
Price
)
SELECT @Price
FROM dbo.TallyNumbers
WHERE Number < @Items
SET @LatestPrice = @Price
END
IF @TranCode = 'RET'
INSERT #Work
(
Price
)
SELECT @LatestPrice
FROM dbo.TallyNumbers
WHERE Number < @Items
IF @TranCode = 'OUT'
DELETE w
FROM (
SELECT TOP(@Items)
RowID
FROM #Work
ORDER BY RowID
) AS w
UPDATE s
SET s.CurrentItems = w.CurrentItems,
s.CurrentValue = COALESCE(w.CurrentValue, 0)
FROM dbo.Stock AS s
INNER JOIN (
SELECT COUNT(*) AS CurrentItems,
SUM(Price) AS CurrentValue
FROM #Work
) AS w ON s.StockID = @StockID
SELECT @PrevID = @ArticleID,
@Loop += 1
IF @Loop % 1000 = 0
RAISERROR('Now updating record %d of %d.', 10, 1, @Loop, @Total) WITH NOWAIT
FETCH NEXT
FROM curYak
INTO @ArticleID,
@TranCode,
@Items,
@Price,
@StockID
END
DROP TABLE #Work
CLOSE curYak
DEALLOCATE curYak
The above is already done. Here is the code to produce the final resultset.
SELECT ArticleID,
CurrentItems,
CurrentValue
FROM (
SELECT ArticleID,
CurrentItems,
CurrentValue,
ROW_NUMBER() OVER (PARTITION BY ArticleID ORDER BY TranDate DESC) AS recID
FROM dbo.Stock
) AS d
WHERE recID = 1
ORDER BY ArticleID
Good luck to you all!
Peter Larsson