Phil Factor SQL Speed Phreak Competition: No 1
This competition is now over, but the winner, Peso, got an Amazon Voucher for $60, and the privilege of being able to display the 'Phil Factor SQL Speed Phreak' award on their own siteIt was quite a struggle with some close competition from many of those who participated in this competition. However, Peso came up with a blindingly fast winner that produced an aggregation from a million rows in a third of a second. Now, we're all scratching our heads trying to come up with the fastest way of solving.... The ‘FIFO Stock Inventory’ SQL Problem...on this site(here is the original preamble.) I really genuinely don't know the answer to this question: What is the fastest way in SQL Server (any version) to provide the following subscription report. It is a reasonable request. We have a subscription list with 10,000 subscribers and we need to do a management report that gives the monthly breakdown of the Date, the number of current subscribers at the end of the month, the number of resignations in the month ('Unsubscribes'), and the number of new subscribers. The list should be in date order, and the date should be just the date of the first day of the month. The table is in this form (simplified from the way we'd do it in a real system of course) CREATE TABLE [dbo].[Registrations]
(
[Registration_ID] [int] IDENTITY(1, 1)
NOT NULL,
[FirstName] [varchar](80) NOT NULL,
[LastName] [varchar](80) NOT NULL,
[DateJoined] [datetime] NOT NULL,
[DateLeft] [datetime] NULL,
CONSTRAINT [PK_Registrations] PRIMARY KEY CLUSTERED
([DateJoined], [LastName], [FirstName])
)
CREATE INDEX idxDateJoined
ON Registrations (DateJoined, DateLeft, Registration_ID)You are welcome to change the two indexes to suit your solution. I'll give you a reasonable amount of data to try stuff out on. 10,000 faked data entries of subscribers is the most that I can reasonably ask you to download, (The list is here) but I shall be taking each solution, putting it in in a test harness consisting of a million subscribers. (I may even make it 1,169,187 to celebrate SQLServerCentral's subscription list) , and find the fastest way of doing it. I have ideas of my own of the way to do this but I suspect they're wrong. Note that in the sample data the subscriptions that extend to sept 2010 are those people who’ve paid for a year’s subscription only rather than those which have ongoing renewals (e.g. direct Debit). ‘Now’ is the end of September 2009. I will allow you to use a number table, but you can assume that every month has new subscribers in it. You can use views, or temporary table, but the time taken for their creation will be included in the timings. You can use a cursor if you don't mind a sharp intake of breath from Jeff. You can use any version of SQL Server that you like. The winner will be amongst the tied fastest entrants (generally there is a group of these) and it will be the one with the highest number of votes. We'll announce the winner in a week's time on 19th October. Who knows, if you are the only person who ever discovers this site and this competition, then the winner will be you! OK End of day 1 and we've had some extremely good results (in milliseconds) from the original 10,000 data table Matt 16 Kev Riley 93 Graham 60 Peso 33 Gianluca 16 AndyM 170 Joe Harris 15533 William brewer 406 I've had to temporarily disqualify Kev, AndyM and Joe Harris as I couldn't get their results to match the correct values. All the other results agreed. The results for Matt, Peso and Gianluca are all very close indeed, too close to get a result from 10,000 rows, so I'll have to up the test table to a million rows. From experience, I know that a lot could change between now and next Monday. I feel sure that Graham's SQL can be tweaked. and I expect to see the slow ones to be back in the running with some fine-tuning. Anyone else? OK End of Day 2, and Peso has streaked ahead with his third version. it is now too fast to measure on 10000 rows so I shall have to move to a larger database for the performance tests. Now I've put result-checking into the test harness Gianluca has had to be temporarily disqualified until he can cure his result, which is slightly out. Matt 16 ms Kev Riley 76 ms * Graham 60 ms Peso 30 ms Gianluca 33 ms * AndyM 170 ms * William brewer 406 ms Peso 2 16 ms Peso 3 0 ms
graham 1 6986ms Peso 1 890ms Peso 2 1173ms Peso 3 1170ms Matt 1 596ms Matt 2 873ms Gianluca 1 4546ms Peso 4B 940ms Andriy Z 1200ms * graham 2 1656ms Day 5 Currently re-creating the test harness! timings on the 1 Million row tables are as follows. remember that these are preliminary timings and Peso and I are trying to scratch our heads to see why we are getting such different results. In the test harness, the results are inserted into a table variable and subsequently checked for validity. Entry Elapsed time in milliseconds graham 1 6983 ms Peso 1 890 ms Peso 2 1186 ms Peso 3 1173 ms Matt 1 576 ms Matt 2 860 ms Gianluca 1 4550 ms Peso 4B 936 ms Peso 4d 830 ms Peso 4e 313 ms Andriy Z 1203 ms Graham 2 1640 ms Brewer 2 406 ms Peso 1d 1076 ms Gustavo 1 580 ms Gianluca 4 2390 ms ..at the moment I write this, William Brewer's entry (incorporating ideas from Graham, Peso and Matt) seems to be in the lead! (Saturday, it is now Peso 4e) However, I'll be holding the competition open until Monday evening GMT to allow for people who didn't hear that the competition was on last Monday. Just if you thought that some potential winners were emerging, look at the results with the original 10,000 row table. Yes, quite different. (We'd have to use a C# or VB test-harness to sort out the speed differences!) One would never be able to sort out the fastest from such close competition! Entry Elapsed time in milliseconds graham 1 80 ms Peso 1 13 Peso 2 16 Peso 3 16 Matt 1 13 Matt 2 33 Gianluca 1 30 Peso 4B 16 Andriy Z 13 Graham 2 16 Brewer 2 30 Peso 1d 33 Gustavo 1 16 Gianluca 4 13 Peso 4d 16 The complete million long test table is now on http://www.simple-talk.com/blogbits/philf/registrations.zip if you would like to fine-tune your winning entry Monday. Here are the final rankings These are the simple 'elapsed time' measurements. I'll follow this up as soon as possible with the details. Matt has very kindly written a special C# test harness for accurate timings that I'm dusting out at the moment. I'm planning to do a full write-up on my SSC blog as there are some important lessons for anyone doing this sort of reporting task. After all, there is a world of difference between the time and CPU loading of the best entrants and the Ho-hums. Even the Ho-hums are a lot better than some of the production code I've seen. These timings are very tight so in all fairness, I have to also award Gustavo, as runner-up, the right to proudly display the 'Phil Factor SQL Speed Phreak' award. After all, what is 30 milliseconds in processing a million rows. Peso has to get the title, not only for the best entry (Peso 4E), but for his energy (he was on paternity leave!) and for the way he helped and advised the other entrants. A true champion. Special Thanks to Matt for all the advice, and the test harness. Peso4E was such a good entry that it even struck terror into Celko-Prizewinner Barry, and decided him not to enter the competition. However William and Matt deserve special commendation for their entries which remain brilliantly fast over a million rows. Peso 4E 313 Gustavo 3 343 Gustavo 4 346 Brewer 2 423 Peso 1e 470 Peso 1 500 Gustavo 1 563 Matt 1 580 Matt 1B 593 Peso 1d 856 GianLuca Final 856 Peso 4d 856 Peso 1D 860 Matt 2 873 Matt3 923 Peso 4B 940 Graham 2 1106 Peso 3 1156 Peso 2 1170 Andriy Z 1233 * Gustavo/peso 2800 Gianluca 1 4500 graham 1 4656 Remember that the winning routines were calculating aggregate reports on a million-row tables in between a third and half a second. These included calculations that you will sometimes read has to be done using a cursor! (* in results means result needs minor tweaking to make it conform.
(comments are locked)
|
Edit the CTE and add this WHERE clause WHERE PeopleJoined > 0 To only get those months where people have joined.
Oct 14 '09 at 10:46 AM
Peso
Peso, it is great to have the explanation in the comments. I loved the nice example of a use for the OUTPUT Clause
Oct 14 '09 at 10:49 AM
Phil Factor
It is giving months in the future with me
Oct 14 '09 at 10:50 AM
Phil Factor
Actually, with the peopleJoined>0 in place it would work because I've already said that there is no month without anyone joining.
Oct 14 '09 at 10:53 AM
Phil Factor
Un-freakin-believable Peso! This proc is incredibly fast.
Oct 14 '09 at 11:24 AM
RBarryYoung
(comments are locked)
|
|
Here's my answer.... Graham, thanks a lot for that. Very neat this one, Very neat....
Oct 12 '09 at 10:54 AM
Phil Factor
Graham has done a second one which, on a million rows, is currently in the fastest group!
Oct 14 '09 at 11:02 AM
Phil Factor
William has now done a new version incorporating some ideas from Matt and Peso. William's version seem the quickest on my harness but we'll have to wait to see!
Oct 16 '09 at 03:26 PM
Phil Factor
(comments are locked)
|
|
Here is my second attempt without a "quirky" update
Here is my third attempt, with a "quirky" update
I don't think a quirky update is necessary here. Every month (every record) is 15 bytes (Date 3 bytes, Join 4 bytes, Leave 4 bytes and Subscribers 4 bytes; all non null). Since a page is the minimum amount of data read by SQL Server anyway, you can have about 44 years (and 9 months) in a single page. The "triangular join" update will be sufficient for those records (537 records for 44.78 years of data). SQLServerCentral started in March 2001, so there can be only 103 records (102 months since March 2001 to September 2009). A triangular join update for 103 records is really nothing. Peso, this goes so fast I can't measure it with 10,000 rows. It just says it took no time at all. I'm going to have to upgrade to a million row table.
Oct 13 '09 at 01:26 PM
Phil Factor
You so need to send me your specs of your test lab so I can replicate it here...
Oct 13 '09 at 03:18 PM
Peso
You have plans to reveal the ideas you wrote about initially?
Oct 13 '09 at 03:58 PM
Peso
I've got my own code in the harness too but it is slower than yours at the moment. It isn't a 'quirky': I didn't think the Quirky Update would work so fast as it has.
Oct 13 '09 at 05:55 PM
Phil Factor
I was able to solve it without the Quirky Update also, Peso (and linear time too), it's just not as fast. I had to spend too much effort handling those stupid months with no Unsubscribed count, grrrr...
Oct 13 '09 at 07:48 PM
RBarryYoung
(comments are locked)
|
|
Here is my attempt... Bearing in mind that I have only had 1 coffee so far today, this may yet get revised:
Here's version 2 (+2 cups of coffee)
Here is version 3 - which is faster still on my machine, although only a slight tweak. Interesting to see the results Phil puts out given that 2 is slower on his rig than 1 is.
Ok this is my final one: Please drink another one and try to shave a few milliseconds off it and you could have a winner, Matt. C'mon!
Oct 13 '09 at 03:16 PM
Phil Factor
I drank another two! New version below in new answer :)
Oct 13 '09 at 06:00 PM
Matt Whitfield ♦♦
Dan,g this thing is good, Matt. Similar to what I put together, only much better. (which is why I didn't post mine... ;-) ).
Oct 13 '09 at 07:46 PM
RBarryYoung
Barry - Thanks - that means a lot to me coming from you. I posted another version below which (at least on my machine) runs about 20% faster... Thanks again
Oct 13 '09 at 08:19 PM
Matt Whitfield ♦♦
Yes, this looks to be very fast. I think Barry will come in with a good solution soon
Oct 14 '09 at 11:06 AM
Phil Factor
(comments are locked)
|
|
I'm using the "Quirky XML Select" for this:
There's still some index needed, but I'm going with this for the moment. I'll be back with indexes maybe. Gianluca, It is an extraordinarily clever solution but unfortunately the result is not quite right. It is strange but it is out by only a small amount, but I'm afraid it is enough to disqualify you until you can do a fix.... Also, while you are about it, could you shave a few milliseconds off it as Peso is a nose ahead now!
Oct 13 '09 at 03:14 PM
Phil Factor
I would be glad to take acklowledgement for this technique, but it's Barry Young that came up with this. I tried to do the same in the past with the SELECT ... UNION ALL, but it doesn't perform the same! I'll take the time to fix my SQL, but I'm not sure that Peso can be beaten this time. He's an incredibly clever SQL developer and he can find unconventional solutions nobody else would have thought of!
Oct 14 '09 at 04:37 AM
Gianluca Sartori
Ok, I found what was wrong and I fixed it. I'm not very happy with this solution, so I'll try to find something faster.
Oct 14 '09 at 05:04 AM
Gianluca Sartori
I get correct result after changing "@now = '20091001'" to "@now = '20091101'"
Oct 21 '09 at 09:48 AM
Peso
(comments are locked)
|



Phil: Do we have to account for those expirations in the future or not? As you said that we could assume that every month has new subscribers, it seems to me that we do not (because those future months have no new subscribers yet)?
Hmm, something odd here, I'm getting results of ~30ms for Peso #3.
I can't really compare my results - because I didn't even run it with 10K rows - i went straight to a million... Are you doing the same Phil?
I especially liked this competition due to the fact there was no template for a solution! And I personally would like to keep these competitions this way. Real world problems that everyone might encounter one day.
The first competition (Celko's Prime number) already had existing algorithms so that competition was just about adapting them to T-SQL. Not so much fun even if it was educational. Congrats again Barry!
The second competition (Celko's Data Warehouse) was more fun because there was no given solution, and was a real world problem. Congrats again Gianluca!
What do you think?
Yes. I've just spoken to Richard of Red-Gate Marketing, who are our sponsors, and he's agreed to keep the prize money going. I'm tied up for the next fortnight, but I'm happy to advise on tests. (I'm taking a week off next week to stare at seal colonies through telescopes), then I'll be at PASS staring at Brent, and various MVPs, through binoculars. The competition will be in very safe hands with Peso!