|
Once again it is time for a Phil Factor Speed Phreak Challenge and the prize is now a $100 Amazon voucher, and the privilege of displaying this rather nice trophy jpg on your blog / website / bedroom wall.
This time your task is simply to produce a summary report of the cost of international phone calls made within an organization on a week by week basis for both Users and Offices. You are allowed to use any method you see fit, you may also add any indexes, table-functions or views that you wish (though not an index view). Creation of these will not count to the overall execution time. If you are unsure if what you want to do might disqualify you, then please post a comment. The table CallLog contains the log of every phone call, including which user called which number, when the call started and ended, and the office the user was in at the time. You will notice that it is not well normalized, since it is actually a view taken from several tables. Users are never in a fixed office, and can move from office to office at any point. Calls with a CallEnd equal to CallStart were not answered and can be safely ignored. To calculate the cost of the call you need to use a lookup within the PhoneTariff table. The calls are prefixed by an area code corresponding to a country. You must note that many of the call areas have the same starting character sequence , so calls made to numbers starting '35191' must be priced using the tariff of '35191' not '351'. There is a rather elaborate charging system according to the length of the call. This means that you need to calculate the cost of the call as the minute by minute cost changes with the length of the call. These must be summed. Looking at the PhoneTariffCharges table
The first 8 minutes of the call will be 0.4792 per minute. 9 to 31 minutes will be 0.18 per minute 32 to 59 minutes will be 0.5702 per minute etc.... All ranges have a UpToXMinutes of 9999 so you dont need to worry about an upper limit. The calllength is rounded upto the nearest whole minute. Here is the DDL to create the tables
and heres the link to the data. Use
to load the data in. Heres the solution provided by our mediocre developer Robert Bar, please note a fix for an issue with the week number being taken from @CallEnd.
Here are some guidelines for your entries: 1) Include a header in your suggestion. Make sure your name and the current date is present. 2) Include an edition number. First edition is 1. If you later improve your current suggestion post it again as version 2. Example: “Peso 1” and if improved, “Peso 1b”, “Peso 1c” etc. 3) If you are trying a new algorithm, change the edition to “Peso 2”. If you improve this algorithm, change the version to “Peso 2b”, “Peso 2c” etc. This will save Phil hours of work in the test harness! 4) The solution must clear up all its mess (temporary tables, indexes, etc.) so it can be re-run without errors. As ever Phil Factor will be final judge. The closing date will be midnight Thursday 17th December London
This question is marked "community wiki".
showing 5 of 15
show all
|
|
Ok, here's my first go (now revised to align with question change - week from startdate): Setup:
Run:
Teardown:
This runs in between 10 and 15 seconds on my box, as compared to 35 minutes 25 secs for the Robert Bar (bless him) solution. If stored procs aren't allowed, let me know and I'll change it to bare definition, it doesn't make a massive difference to the timings anyway... Good Stuff. Results are spot on, just change the ALTER PROC to CREATE PROC. By my calculations , youre in front :)
(Dec 04 '09 at 14:54)
dave ballantyne
Whoops - sorry about that! One thing that confused me - why is the year number taken from the start date and the week number taken from the end date? Only produces differences on a few rows, but quite interesting! :) Also, let me know if you'd like a copy of the test harness I sent to Phil & Peso, should support everything needed for this challenge...
(Dec 04 '09 at 15:38)
Matt Whitfield ♦
That'll be a typo. Ill change it in the question now. Phil has already taken the liberty of forwarding on the harness, thanks.
(Dec 04 '09 at 15:53)
dave ballantyne
I get wrong result for Jose Yu. Me and Phil get two records (week 8 and 9). You get both timings on week 9 (1.654 + 1.7612).
(Dec 05 '09 at 22:39)
Peso
Yeah - if you have a look at the comments above, Dave has changed it so that the week number now comes from the start date and not the end date, like it was originally. I haven't had time to update it yet because I'm not at home...
(Dec 06 '09 at 12:22)
Matt Whitfield ♦
|
|
Initial Timings are pretty close ,
Just for comparison - timings from Matt W's machine:
Note that matt1 and lmu921 didn't run because I changed the collation in my test db and they both failed with collation conflicts. I didn't feel the need to alter my SQL entry! :)
This answer is marked "community wiki".
Nice formatting! However, I miss thousands separator and right aligned number... :-)
(Dec 07 '09 at 17:02)
Peso
My timings agree with this pretty closely. My routine is now in the mid five-seconds, but I need to double-check the results before I post the SQL! Yeah. cute formatting, Dave.
(Dec 07 '09 at 19:19)
Phil Factor
I'll add thousand separators and right alignment to the harness...
(Dec 07 '09 at 19:37)
Matt Whitfield ♦
Cant take any credit for the formatting :)
(Dec 07 '09 at 19:59)
dave ballantyne
There was an issue with my scripts which meant that lmu92's indexes were not created.
(Dec 09 '09 at 09:05)
dave ballantyne
Stonking results from Matt's Clr solution.
(Dec 09 '09 at 09:25)
dave ballantyne
Thanks :) I've been impressed by the results you can get from CLR entries over the last couple of challenges - I only spent 2 hours writing the CLR one, so I think the cost/benefit ratio is pretty good...
(Dec 09 '09 at 09:48)
Matt Whitfield ♦
Good work on keeping the timings fresh sir :)
(Dec 10 '09 at 09:14)
Matt Whitfield ♦
Heh this is getting to be very close!
(Dec 15 '09 at 15:22)
Matt Whitfield ♦
There is a phil2b which is a bit quicker.
(Dec 17 '09 at 23:01)
Phil Factor
Hmm. Trimming those indexes out made it faster on my machine. Not on anyone else's it seems!
(Dec 18 '09 at 12:35)
Phil Factor
showing 5 of 11
show all
|
|
Here's my second try. (lmu92 1b 20091216) Based on my previous version with the following changes: a) I don't create the result tables anymore, just printing the result sets with the SELECT statement (seems common practice throughout the solutions provided so far) b) replaced a UNION with a faster solution c) Added NOLOCK hints d) Building the intermediate table one step later, saving one update prepare base tables (create index)
code block
cleanup index
On my machine the code runs in the range of Pesos version 4a. It's still not as fast as Matts CLR though (at least on my machine)...
This answer is marked "community wiki".
Wow! 3.156 seconds on my machine. this is very impressive
(Dec 16 '09 at 21:55)
Phil Factor
I don't get it. The code takes 11 seconds on my machine...
(Dec 16 '09 at 22:51)
Peso
I suspect that quite a bit of the variation is the size of the buffer cache on the various machines.
(Dec 17 '09 at 08:53)
Phil Factor
|
|
I opted to go no hardwiring at all, because you never know when CallArea changes and suddenly has 6 digits, or more... First, the Setup part
And then the Teardown part
And finally the first version of the query
|
|
Phil Factor 1b (1a got mangled by the website because it has < in the code) This is my first entry, just to show a fairly conventional way of doing it. The call table is redone in a slightly more compute-friendly form. The same is done for the PhoneTariff table. Then the international call prefixes are identified by updating the table progressively, starting with the longest codes first and, at the same time, the PhoneTarrif and the initial connection charge is identified. Then, the calls are costed out, a charge-band at a time until the longest calls have been costed out. Once this has been done, then it is a simple matter of aggregating the reports. I've left my timing harness in place in case you want to tweak the solution, or if you want to see how I generally do it. On my server, I get times in the 7.5 sec range, for the whole operation. Last time I checked the result was the same as Robert Barr's (bless him)
Oops. the first time I pasted the code in, I use but because I had a < in the code, it got mangled. Is there an easy way to past code into this darned software?
(Dec 05 '09 at 22:21)
Phil Factor
Phil, I get Abel Buck included in your code. He is not to be reported since the call was not answered (CallStart equals CallEnd).
(Dec 05 '09 at 22:37)
Peso
|
|
Here's what I came up with (so far...) In terms of performance I'm in a range of less than 5sec (including output of the results which takes about 0.6 sec). Just to compare: I'm getting approx. 8sec for Phils solution on my PC (thank you for leaving the timing harness in there!).
|
|
Ok, here's matt version 2 - and you knew it was coming - CLR version Setup is:
Exec is:
Teardown is:
Hmmm... well i did include the source code, but I ran out of post space. I'll post that in a separate answer. This one runs at between 3 and 4 seconds on my box...
This answer is marked "community wiki".
|
|
Source code to my CLR version (for reference):
This answer is marked "community wiki".
|
|
/Phil Factor 1d Now running about 5.4 secs on my machine.
This answer is marked "community wiki".
Nice! Two things though... 1) Add "WHERE CallStart < CallEnd" to #TempCallLog to make the table smaller and 2) Add a sixth update clause to handle CHAR(6) definition.
(Dec 11 '09 at 19:15)
Peso
Doing 1) you can remove the "UPDATE ... > 0" clause
(Dec 11 '09 at 19:18)
Peso
Phil, how long time does the other suggestions take on your machine?
(Dec 11 '09 at 19:19)
Peso
For comparison, on my laptop this takes 11 seconds.
(Dec 11 '09 at 19:37)
Peso
How odd. I just checked it again. five and a half seconds. I had tried putting in more indexes (even followed SSMS's suggestions) and everything I did added another second to the results. Can you see what is slowing it down? Why do I need to add a sixth update clause to handle CHAR(6) definition when there aren't any six-character codes in the phoneTariff table?
(Dec 11 '09 at 21:58)
Phil Factor
Added the where clause as Peso suggested and took out the unnecessary update
(Dec 11 '09 at 22:06)
Phil Factor
With the 6 character issue, it's because the query is hard-wired to the data, rather than the possible range of data that the schema allows... You could do what I do in my CLR one and find the maximum prefix length, then loop from that down to 1, then that way it wouldn't need to be changed if the schema was updated in the future...
(Dec 11 '09 at 22:24)
Matt Whitfield ♦
Recheck you calculations Phil, they are not matching the control sets
(Dec 14 '09 at 09:56)
dave ballantyne
I think he is missing the connectioncharge amount
(Dec 14 '09 at 10:32)
Peso
I've updated the routine to clean it up. It is running at 5.2 seconds on my server but it still seems slower than Peso's. I keep thinking I've squeezed all the performance out of this algorithm. Sorry about data problems in previous version. i was having difficulty with the formatting and some stuff went missing. Now in place in new version and checked
(Dec 14 '09 at 19:45)
Phil Factor
Phil, on your machine, how long does Peso 4a take?
(Dec 15 '09 at 21:36)
Peso
showing 5 of 11
show all
|
|
Peso 4B - 20091216 Main Code
This answer is marked "community wiki".
Definitely faster, 3.941 seconds on my box - but only returns 53,703 rows in the second result set?
(Dec 16 '09 at 13:53)
Matt Whitfield ♦
...can I ask, too... how long did developing this one take? Just interested from a general cost/benefit perspective...
(Dec 16 '09 at 13:54)
Matt Whitfield ♦
Forgot a CROSS APPLY. This one took about 30 minutes to develop and another 15 minutes looking at the execution plan. However, I did miss a cross apply so debugging the code above took another hour. 2 hour total.
(Dec 16 '09 at 15:10)
Peso
Cool, correct results now, 5.317 seconds... It would have taken me a lot longer to develop the SQL the way you have I think. Probably more in the region of 5-6 hours...
(Dec 16 '09 at 15:31)
Matt Whitfield ♦
This runs in 3 secs on my machine. Mine runs in 5.4.
(Dec 16 '09 at 21:46)
Phil Factor
|

Can we be specific about how timings will be done? Average of x runs with a dropcleanbuffers before each? Average of x runs with a dropcleanbuffers before the first run?
What about Abel Buck? His call was 0.000 seconds and thus no charge for length of call, but is he still going to pay for the connection of 52 cents?
I am tryting to find a 100% correct answer but sometimes i doubt if the answer privided by Robert Bar is 100% Correct. Should we take his answer as the final results or should we try to prove that there could be something wrong with it and propose something new ?
With regard to Abel Buck the call was not answered and therefore no charge. "Calls with a CallEnd equal to CallStart were not answered and can be safely ignored." The Robert Bar solution should be correct. If there are any issues with a calculation please PM me and ill manually double check.
I took a random case, number dialled: 980913853342, according to Robert Bar he used area code 980 and number 91-385-3342 (formated 2,3,4) ["order by len(CallArea) desc"], but you could also get area code 98 and number dialled 091-385-3342 ( formated 3-3-4 ). My point is that phone numbers of 12 digits ( this case ) should have just 2 chars on area code, those with 13 digits, have 3 chars as area code and so on, so the remaning number will still be formated 3-3-4 and not [1..3]-3-4. Meaning that the proposed solution could not be the corret one...
Half the objective with the challenge is to overcome precisely this issue. The correct 'areacode' for that number is 980 as and not area code 98 for the simply reason that there are more matched digits. Our fictional phone company may have a different connection to 980 than 981 thru 989 , hence why that there is a different tariff for 98.
Nice twist, I was on the same page as Bart
Is two resultsets required? What are the column's order in the resultset(s)?
Two results set please , Users then offices. Order is unimportant
Apologies : Two results set please , Offices then Users. Row Order is unimportant
And column order?
I'm not using Matt's test harness yet, just doing timings in TSQL. For this, the only sensible approach to the timings is to insert the two results into a table. What do the other competitors thing? Is this the best way?
I think it's a bit of a fine balance - sometimes putting the rows in a table can take a lot longer than selecting them out, depending on a fair few factors. I tend to go with the client side testing, because any useful data is going to end up in a client at some point...
It seems 3 seconds is the lower limit for this challenge.
Any updates available from Phil's test harness?