I would very much like the experts advice on the subject of Table Variables and Temporary tables. What should be the recommendation on their use?
I wrote an article on Simple-Talk recently called [Temporary Tables in SQL Server], in which I said in the conclusion: "I'm always wary of providing over-generalized advice, but I always prefer my databases to use Table Variables and TVPs wherever possible. They require fewer resources, and you're less likely to hold onto them when you're finished with them. I like to use them to the max, with column and table checks and constraints. You may find times when they run out of steam, especially when table sizes get larger. In cases like this, or where it isn't practical to use table variables because of their restricted scope, then I'll use local temporary tables."
In answer to a comment, I said: *"I usually do the development work on a local temporary table (#) and get a set of timings using a good hunk of data. I then change to table variables (@), being careful to put constraints on the columns I use to join. I then run the tests again to make sure performance is the same. It usually is. If I hit a performance problem, I'll try OPTION (RECOMPILE), on the assumption that the optimizer couldn't generate a good query plan. If I still can't get equivalent performance, then I'll go back to using a local temporary table (#)."*
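For what it's worth, a minimal sketch of that workflow (the table and column names here, such as dbo.CommonWords and Frequency, are purely illustrative):

```sql
-- Stage 1: develop against a local temporary table, with a key for the join
CREATE TABLE #Staging (
    WordID INT NOT NULL PRIMARY KEY,
    Word   VARCHAR(50) NOT NULL
);

-- Stage 2: swap in the table-variable equivalent, keeping the same constraints
DECLARE @Staging TABLE (
    WordID INT NOT NULL PRIMARY KEY,
    Word   VARCHAR(50) NOT NULL
);

INSERT INTO @Staging (WordID, Word)
SELECT WordID, Word FROM dbo.CommonWords WHERE LEN(Word) > 8;

-- If performance degrades, force a recompile so the optimizer sees the
-- actual row count in the table variable at plan-compilation time:
SELECT c.Word, c.Frequency
FROM   @Staging s
JOIN   dbo.CommonWords c ON c.WordID = s.WordID
OPTION (RECOMPILE);
```

Only if that still can't match the temp-table timings do I revert to the # version.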
Kevin Boles, who knows a great deal on the subject, has commented at the end of the article "Sorry Phil, but your advice to use table vars over real temp tables is completely contrary to proper guidance. There are optimization limitations (especially the lack of statistics) that very frequently lead to horrible query plans. A SINGLE ROW table variable can get you a suboptimal plan compared to the same query using a temp table, so the very common "use table vars if you have less than NNN rows" advice is flawed. I also don't understand your statement about resource differences between the two.
There are two cases where table vars are appropriate, and they are VERY infrequently found in the wild: 1) extremely high-call-frequency code where recompilations from temp table activity are a problem, and 2) audit scenarios where you need to keep information after a transaction rollback has occurred.
I will also take exception to the advice to use constraints, checks, indexes, etc. In 15+ years of relational engine consulting, I have eliminated those things (especially indexes) from temp tables to improve performance at least an order of magnitude more frequently than I have added them."
I'm puzzled that I'm getting strong advice, from more than one MVP whose opinion I respect, that one should not use such an important part of SQL Server as table variables and TVPs. I'm aware of their limitations (I covered them in the article), but my feeling is that the compromises made by the Microsoft SQL Server Database Engine team in designing TVs and TVPs are sensible for the general usage cases of table variables. Lack of parallelism, for example, isn't necessarily a disaster for performance with smallish tables, surely? I'd always felt that if the database developer was aware of the risks and issues, then it was reasonable for them to judge when and where in their work it is better to use a table variable rather than a temporary table.
So. What do you think? Obviously, if I've got it wrong, I'll get the article amended, as I'd hate to think of anyone being misled by my advice! http://www.simple-talk.com/sql/t-sql-programming/temporary-tables-in-sql-server/
asked Sep 24 '11 at 02:54 PM in Default
I too have found them to be extremely problematic. The issue comes down to the optimizer, how it treats them, and the results of that treatment. The optimizer always assumes a single row. That's fairly well known, and I think you mentioned it in the article. But the question is: what happens when there are two or more rows? Generally, my advice has always been that if it's fewer than 100 rows (about, give or take, depending on which way the wind is blowing), then you're probably OK to use a TV. But it's that word, "use", that digs the hole. Generally, I use them as temporary placeholders and avoid any actions on them other than INSERT and SELECT if I can at all help it. As soon as you JOIN on them, things break down very quickly. I've recently seen execution plans where 3-5 rows in a TV in a LEFT JOIN produced horror-show plans, and substituting temp tables fixed the issue. It's nuts, but I actively attempt to avoid their use unless I'm forced into it, or unless I can limit it so that their advantages (of which there are a couple) come to the fore without their disadvantages (of which there are many) overwhelming them.
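The single-row estimate is easy to see in a plan. A minimal sketch (dbo.BigTable is a stand-in name for any sizeable permanent table):

```sql
-- Table-variable version: the plan is compiled with no statistics,
-- so the optimizer estimates 1 row from @Ids, whatever it actually holds.
DECLARE @Ids TABLE (ID INT NOT NULL PRIMARY KEY);
INSERT INTO @Ids (ID) VALUES (1), (2), (3), (4), (5);

SELECT b.*
FROM   @Ids i
JOIN   dbo.BigTable b ON b.ID = i.ID;   -- estimated rows from @Ids: 1

-- Temp-table version: #Ids carries statistics, so the estimate
-- tracks the real row count and the join strategy is chosen accordingly.
CREATE TABLE #Ids (ID INT NOT NULL PRIMARY KEY);
INSERT INTO #Ids (ID) VALUES (1), (2), (3), (4), (5);

SELECT b.*
FROM   #Ids i
JOIN   dbo.BigTable b ON b.ID = i.ID;   -- estimated rows from #Ids: 5
```

Compare the estimated row counts on the two scans in the actual execution plans and the difference is immediately visible.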
answered Sep 25 '11 at 03:53 AM
Grant Fritchey ♦♦
Similar to Kevin and Grant, I have also seen very poor performance from table vars, and simply changing to a temp table usually fixes the issue. In fact, it was while investigating the reasons behind this that I hit the question "What is so different between a temp table and a table var to cause this kind of performance issue?", which led me to write my Comparing Table Vars and Temp Tables article and subsequently to build a presentation from it that I now give at a lot of SQLSaturday events.
While I haven't seen an issue at one row like Kevin has, I have seen a query that joined to a table var with only 48 rows (appropriately indexed) take over 3 minutes to return its results... just changing to a temp table returned them in under 10 ms. It's been my experience that as soon as you either join a table var to any other table, or apply a WHERE clause to a column in a table var, performance takes a deep dive. Consequently, if I'm doing a join or a WHERE clause, I now only use temp tables. Since this is 99% of the stuff I do, I essentially don't use table vars except in those places where SQL Server forces me to (table-valued functions or parameters). The only time I would choose to use them now is if I wanted to leverage their immunity to explicit transaction rollbacks.
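That rollback immunity is easy to demonstrate. A minimal sketch:

```sql
-- Table variables are unaffected by ROLLBACK; temp tables behave
-- like ordinary tables and lose their uncommitted rows.
DECLARE @Audit TABLE (Msg VARCHAR(100) NOT NULL);
CREATE TABLE #Audit (Msg VARCHAR(100) NOT NULL);

BEGIN TRANSACTION;
    INSERT INTO @Audit (Msg) VALUES ('something went wrong here');
    INSERT INTO #Audit (Msg) VALUES ('something went wrong here');
ROLLBACK TRANSACTION;

SELECT COUNT(*) AS TableVarRows FROM @Audit;   -- 1: the row survives
SELECT COUNT(*) AS TempTableRows FROM #Audit;  -- 0: rolled back
```

This is what makes table variables useful for Kevin's second scenario: capturing audit or error detail that must outlive a rollback.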
answered Sep 25 '11 at 12:22 PM
Here is a simple test to explore the differences between table variables and temporary tables in handling a simple join. It is a variation of the test that Kevin Boles used to show '6668 IO BAD PLAN'. I compared the four possible combinations of a small table joining to a much larger one. When I ran it (and you might find different results), there was a difference in parallelism but no significant difference in the four query plans; all ran in under 3 ms on SQL Server 2008 R2. The data I used for CommonWords (useful test data) can be [downloaded from here].
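As a sketch of the shape of that test (the full harness and data are linked above; only two of the four small/large combinations are shown, and the join condition is illustrative):

```sql
SET STATISTICS IO, TIME ON;

-- Combination 1: small table variable joined to the large permanent table
DECLARE @Small TABLE (Word VARCHAR(50) NOT NULL PRIMARY KEY);
INSERT INTO @Small (Word)
SELECT TOP (10) Word FROM dbo.CommonWords ORDER BY LEN(Word) DESC, Word;

SELECT c.Word
FROM   @Small s
JOIN   dbo.CommonWords c ON c.Word = s.Word;

-- Combination 2: small temp table joined to the large permanent table
CREATE TABLE #Small (Word VARCHAR(50) NOT NULL PRIMARY KEY);
INSERT INTO #Small (Word)
SELECT TOP (10) Word FROM dbo.CommonWords ORDER BY LEN(Word) DESC, Word;

SELECT c.Word
FROM   #Small s
JOIN   dbo.CommonWords c ON c.Word = s.Word;

-- The remaining two combinations put the large side into a table
-- variable and a temp table respectively, and repeat the same join.
```

Comparing the STATISTICS IO output and the actual plans for each combination is what surfaced the parallelism difference.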
1) Guidance from Microsoft is varied and problematic. All one need do is examine the database structures and code of SharePoint to understand that there are lots of substandard things going on, SQL Server-wise, within Microsoft. Oh, take a gander at the myriad of issues that come along with T-SQL UDF usage too. :-) And how about the hugely outdated 1MB growth-increment default for database files (and, more importantly, the lack of exceptionally stressed guidance NOT to keep that for any database), 5 for cost threshold for parallelism, the one-file-per-CPU-core guidance for tempdb, etc., etc.
2) Voiding parallelism for DML activity involving table vars has essentially nothing to do with the table variable itself. It is that massive six-table, 50M-row SELECT statement populating it: THAT is where the perf problem comes into play.
3) But the simplest thing to keep in mind is that you are GUARANTEED to get SEVERAL bad things happening some or most of the time when you use table vars over temp tables, as I have said and demonstrated. So the guidance should always be: if you don't have one of the two reasons I gave (recompilation perf issues, or the very esoteric auditing need for rolled-back transactions), then don't use them. You just have the guidance backwards, is all.

4) BTW, I am surprised and disappointed that a tech reviewer for the article didn't call you out on some of the issues I brought up.
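To illustrate point 2 (table and column names assumed): any data-modification statement whose target is a table variable gets a fully serial plan, so the expensive SELECT feeding it cannot go parallel, however large the scan.

```sql
-- Table-variable target: the whole INSERT ... SELECT plan is serial.
DECLARE @Results TABLE (CustomerID INT NOT NULL, Total MONEY NOT NULL);

INSERT INTO @Results (CustomerID, Total)
SELECT o.CustomerID, SUM(d.Amount)          -- this aggregate runs serially
FROM   dbo.Orders o
JOIN   dbo.OrderDetails d ON d.OrderID = o.OrderID
GROUP BY o.CustomerID;

-- Temp-table target: the optimizer is free to parallelise the SELECT.
SELECT o.CustomerID, SUM(d.Amount) AS Total
INTO   #Results
FROM   dbo.Orders o
JOIN   dbo.OrderDetails d ON d.OrderID = o.OrderID
GROUP BY o.CustomerID;
```

So the cost is paid at population time; a tiny table variable populated from a tiny source never notices it, but a 50M-row populating query certainly does.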
answered Sep 24 '11 at 07:19 PM
I have come across a few instances where TVs have been employed in code and have cringed when I saw the estimated row count going into them. Usually, rewriting the code to use a #table has brought significant improvements. I can't say every time, as I haven't recorded the occurrences.
I now recommend in the office writing both options and comparing them, but expecting anything with more than a handful of records in a TV to be the slower, worse option.

Christian Bolton was speaking at our user group a few weeks ago, and in his parallelism and CXPACKET-waits session he suggested that the performance gains with TVs come when they hold few records (OK for months of the year, and so on), but that when the record count goes up, performance drops off and the #table is favoured.