
Are Table Variables a bad thing?

I would very much like the experts' advice on the subject of table variables and temporary tables. What should the recommendation be on their use?

I wrote an article on Simple-Talk recently called [Temporary Tables in SQL Server][1], in which I said in the conclusion: "I'm always wary of providing over-generalized advice, but I always prefer my databases to use table variables and TVPs wherever possible. They require fewer resources, and you're less likely to hold onto them when you're finished with them. I like to use them to the max, with column and table checks and constraints. You may find times when they run out of steam, especially when table sizes get larger. In cases like this, or where it isn't practical to use table variables because of their restricted scope, then I'll use local temporary tables."

In answer to a comment, I said: "*I usually do the development work on a local temporary table (#) and get a set of timings using a good hunk of data. I then change to table variables (@), being careful to put constraints on the columns I use to join. I then run the tests again to make sure performance is the same. It usually is. If I hit a performance problem, I'll try OPTION (RECOMPILE), on the assumption that the optimizer can't otherwise generate a good query plan. If I still can't get equivalent performance, then I'll go back to using a local temporary table (#).*"
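
For illustration, here is a minimal sketch of the sort of hint I mean in that last step. The table and column names (@Candidates, dbo.SomeLargeTable, OrderTotal) are invented for the example; the point is just that OPTION (RECOMPILE) lets the optimizer see the table variable's actual row count when the statement is compiled:

    -- Hypothetical example: join a table variable to a (made-up) permanent table.
    -- Without the hint, the plan is compiled with a one-row guess for @Candidates;
    -- OPTION (RECOMPILE) recompiles the statement using the real cardinality.
    DECLARE @Candidates TABLE (CustomerID INT PRIMARY KEY);

    INSERT INTO @Candidates (CustomerID)
    SELECT CustomerID FROM dbo.SomeLargeTable WHERE Region = 'North';

    SELECT slt.CustomerID, slt.OrderTotal
    FROM dbo.SomeLargeTable slt
    INNER JOIN @Candidates c ON c.CustomerID = slt.CustomerID
    OPTION (RECOMPILE);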

Kevin Boles, who knows a great deal about the subject, has commented at the end of the article: "Sorry Phil, but your advice to use table vars over real temp tables is completely contrary to proper guidance. There are optimization limitations (especially the lack of statistics) that very frequently lead to horrible query plans. A SINGLE-ROW table variable can get you a suboptimal plan compared to the same query using a temp table, so the very common 'use table vars if you have less than NNN rows' advice is flawed. I also don't understand your statement about resource differences between the two.

There are 2 reasons where table vars are appropriate, and they are VERY infrequently found in the wild: 1) extremely high-call-frequency code where recompilations from temp table activity are a problem, and 2) audit scenarios where you need to keep information after a transaction rollback has occurred.

I will also take exception to the advice to use constraints, checks, indexes, etc. In 15+ years of relational engine consulting, I have eliminated those things (especially indexes) from temp tables to improve performance at least an order of magnitude more frequently than I have added them."

I'm puzzled that I'm getting strong advice, from more than one MVP whose advice I respect, that one should not use such an important part of SQL Server as table variables and TVPs. I'm aware of their limitations (I covered them in the article), but my feeling is that the compromises made by the Microsoft SQL Server Database Engine team in designing table variables and TVPs are sensible for their general usage cases. The lack of parallelism, for example, isn't necessarily a disaster for performance on smallish tables, surely? I'd always felt that if the database developer was aware of the risks and issues, then it was reasonable for them to judge when and where in their work it is better to use a table variable rather than a temporary table.

So. What do you think? Obviously, if I've got it wrong, I'll get the article amended, as I'd hate to think of anyone being misled by my advice!

[1]: http://www.simple-talk.com/sql/t-sql-programming/temporary-tables-in-sql-server/

asked Sep 24, 2011 at 02:54 PM by Phil Factor

I'm curious about the advice to remove indexes from temp tables. For me, indexes on larger temp tables have saved the day, but usually I write once and read many times from these tables.
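
For what it's worth, the pattern I mean is roughly this (a throwaway sketch; the table dbo.Orders and its columns are invented): populate the temp table once, then add the index before the repeated reads.

    -- Hypothetical write-once / read-many pattern: load in one pass, then index.
    CREATE TABLE #Orders (OrderID INT NOT NULL, CustomerID INT NOT NULL, OrderTotal MONEY NOT NULL);

    INSERT INTO #Orders (OrderID, CustomerID, OrderTotal)
    SELECT OrderID, CustomerID, OrderTotal
    FROM dbo.Orders
    WHERE OrderDate >= '20110101';

    -- Index after the load, so the insert isn't slowed by index maintenance.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON #Orders (CustomerID);

    -- ...followed by many reads that seek on CustomerID...
    SELECT OrderID, OrderTotal FROM #Orders WHERE CustomerID = 42;

    DROP TABLE #Orders;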
Sep 25, 2011 at 07:04 PM Scot Hauder

11 answers (most-voted first):

I too have found them to be extremely problematic. The issue comes down to the optimizer, how it treats them, and the results of that treatment. The optimizer always assumes a single row. That's fairly well known, and I think you mentioned it in the article. But the question is, what happens when there are two or more rows? Generally, my advice has always been: if it's less than 100 rows (about, give or take, depending on which direction the wind is blowing), then you're probably OK to use a TV. But it's that word, "use", that digs the hole. Generally, I use them as temporary placeholders, and I avoid any actions on them other than INSERT and SELECT if I can at all help it. As soon as you JOIN on them... things break down very quickly. I've recently seen execution plans where a TV with 3-5 rows in a LEFT JOIN produced horror-show execution plans, and substituting temp tables fixed the issue.

It's nuts, but I actively attempt to avoid their use unless I'm forced into it, or I can limit it so that their advantages (of which there are a couple) come to the fore without their disadvantages (of which there are many) overwhelming them.
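
To make that single-row assumption visible, here is a minimal sketch. It only touches system views, so it should run anywhere; the exact numbers depend on your data, and this describes the behaviour of the versions current when this was written:

    -- Compare the optimizer's row estimates for a table variable and a temp table.
    -- With "Include Actual Execution Plan" on (or SET STATISTICS PROFILE ON),
    -- the table-variable scan shows an estimate of 1 row regardless of its content,
    -- while the temp table has statistics and gets a realistic estimate.
    DECLARE @tv TABLE (id INT PRIMARY KEY);
    CREATE TABLE #tt (id INT PRIMARY KEY);

    INSERT INTO @tv (id) SELECT object_id FROM sys.all_objects;
    INSERT INTO #tt (id) SELECT object_id FROM sys.all_objects;

    SET STATISTICS PROFILE ON;

    SELECT COUNT(*) FROM @tv v INNER JOIN sys.all_objects o ON o.object_id = v.id;
    SELECT COUNT(*) FROM #tt t INNER JOIN sys.all_objects o ON o.object_id = t.id;

    SET STATISTICS PROFILE OFF;
    DROP TABLE #tt;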

answered Sep 25, 2011 at 03:53 AM by Grant Fritchey ♦♦

Grant, please refer to the original thread on Simple-Talk to see an example where a single row in a table variable gets you a horribly bad plan due to data distribution skew and the lack of statistics available to table variables. This goes beyond just the usual (and misguided) mantra of 'less than NNN rows use table variables'.
Sep 25, 2011 at 05:20 AM KGB
True, but even that mantra, when chanted appropriately, had a long list of caveats. I don't want to throw the baby out with the bath water, despite the fact that the baby has a third eye and flipper hands. Table variables have their uses, but it has to be with careful application and a full understanding of what they do.
Sep 25, 2011 at 06:39 AM Grant Fritchey ♦♦
I suspect that the more we can understand the conditions under which the query optimizer gets locked into a bad strategy, the more explicit we can make the advice to developers. I'm puzzled because I've used TVs a great deal over the years, with only rare problems where I've had to revert to a temp table. I've certainly come across some horror table variables, but usually there are other design problems with the code. I wonder if some basic assumptions about the good normalization of the data are being made by the good people at Microsoft doing the Query Optimiser.
Sep 26, 2011 at 03:01 AM Phil Factor

Similar to Kevin and Grant, I have also seen very poor performance from table vars, and simply changing to a temp table usually fixes the issue. In fact, it was when I was investigating the reasons behind this that I asked myself, "What is so different between a temp table and a table var to cause this kind of performance issue?", which led me to write my Comparing Table Vars and Temp Tables article, and subsequently to create a presentation from it that I now give at a lot of SQLSaturday events.

While I haven't seen an issue at one row like Kevin has, I have seen a query that joined to a table var with only 48 rows (and appropriately indexed) take over 3 minutes to return its results... just changing to a temp table returned the results in under 10 ms.

It's been my experience that as soon as you either join a table var to any other table, or apply a WHERE clause to a column in a table var, performance takes a deep dive. Consequently, if I'm doing a join or a WHERE clause, I now only use temp tables. Since this is 99% of the stuff I do, I essentially don't use table vars except in those places where SQL Server forces me to (table-valued functions or parameters). The only time I would choose to use them now is if I wanted to leverage their ability to be immune to explicit transaction rollbacks.
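
That rollback behaviour is easy to see with a throwaway sketch (nothing here depends on a real schema):

    -- A table variable keeps its rows after a rollback; a temp table does not.
    DECLARE @audit TABLE (msg VARCHAR(100));
    CREATE TABLE #audit (msg VARCHAR(100));

    BEGIN TRANSACTION;
        INSERT INTO @audit (msg) VALUES ('logged before rollback');
        INSERT INTO #audit (msg) VALUES ('logged before rollback');
    ROLLBACK TRANSACTION;

    SELECT COUNT(*) AS TableVariableRows FROM @audit;  -- 1: the row survives the rollback
    SELECT COUNT(*) AS TempTableRows     FROM #audit;  -- 0: the insert was rolled back

    DROP TABLE #audit;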

answered Sep 25, 2011 at 12:22 PM by WayneS

Here is a simple test to explore the differences between table variables and temporary tables in handling a simple join. It is a variation of the test that Kevin Boles used to show '6668 IO BAD PLAN'. I've compared the four possible combinations of a small table joining to a much larger one. When I ran it (and you might find different results), there was a difference in the parallelism but no significant difference in the four query plans. All ran in under 3 ms on SQL Server 2008 R2. The data I used for CommonWords (useful test data) can be [downloaded from here][1].

      dbcc freeproccache
      --SET STATISTICS IO on
      DECLARE @Words TABLE (Word varchar(40) PRIMARY KEY)
      CREATE table #Words (Word varchar(40) PRIMARY KEY)
      SET STATISTICS PROFILE ON
      INSERT INTO @Words (word)  --no parallelism inserted 60387 words
           SELECT CommonWords.string FROM CommonWords
      INSERT INTO #Words (word) --yes parallelism inserted 60387 words
           SELECT CommonWords.string FROM CommonWords
      SET STATISTICS PROFILE off
      /* so now we have a table variable of around 60,000 words and a
      temporary table with 60,000 words*/

      DECLARE @WordsToMatch TABLE (Word varchar(40))
      --create a sample of words to join with the list of 60,000 common words
      --and put them in a table variable
      inSERT INTO @WordsToMatch (word) values ('Temporary'),('tables'),('are'),('used'),('by'),('every'),('DB'),('developer,'),('but'),('they''re'),('not'),('likely'),('to'),('be'),('to'),('adventurous'),('with'),('their'),('use'),('or'),('exploit'),('all'),('their'),('advantages'),('They'),('can'),('improve'),('your'),('code'),('performance'),('and'),('maintainability'),('but'),('can'),('be'),('the'),('source'),('of'),('grief'),('to'),('both'),('developer'),('and'),('DBA'),('if'),('things'),('go'),('wrong'),('and'),('a'),('process'),('grind'),('away'),('inexorably'),('slowly'),('We'),('asked'),('Phil'),('for'),('advice'),('thinking'),('that'),('it'),('would'),('be'),('a'),('simple'),('explanation')
      CREATE table #WordsToMatch (Word varchar(40))
      INSERT INTO #WordsToMatch SELECT word  FROM @wordstomatch
      --and also put them in a temporary table
      /* now we'll run all four possible combinations of temporary table
       and table variable and see what sort of plan is being generated
       in each case */
      SET STATISTICS PROFILE ON
      SELECT COUNT(*) FROM @words commonwords 
        INNER JOIN  @wordsToMatch intro ON  commonWords.word=intro.word
      SELECT COUNT(*) FROM #words commonwords 
        INNER JOIN  @wordsToMatch intro ON  commonWords.word=intro.word
      SELECT COUNT(*) FROM @words commonwords 
        INNER JOIN  #wordsToMatch intro ON  commonWords.word=intro.word
      SELECT COUNT(*) FROM #words commonwords 
        INNER JOIN  #wordsToMatch intro ON  commonWords.word=intro.word
      SET STATISTICS PROFILE OFF

      /* now we'll run a simple set of timings just to see how long each join took (in Milliseconds) */
      dbcc freeproccache

      DECLARE @log TABLE (
        Log_Id INT IDENTITY(1, 1), TheEvent VARCHAR( 2000 ),
        DateAndTime DATETIME DEFAULT GetDate()) ;
       INSERT INTO @log (TheEvent) SELECT 'both table variables';
       SELECT COUNT(*) FROM @words commonwords 
        INNER JOIN  @wordsToMatch intro ON  commonWords.word=intro.word
       INSERT INTO @log (TheEvent) SELECT 'large table variable joined to small temp table';
      SELECT COUNT(*) FROM #words commonwords 
        INNER JOIN  @wordsToMatch intro ON  commonWords.word=intro.word
       INSERT INTO @log (TheEvent) SELECT 'Large temp table small table variable';
      SELECT COUNT(*) FROM @words commonwords 
        INNER JOIN  #wordsToMatch intro ON  commonWords.word=intro.word
       INSERT INTO @log (TheEvent) SELECT 'Both temporary tables';
      SELECT COUNT(*) FROM #words commonwords 
        INNER JOIN  #wordsToMatch intro ON  commonWords.word=intro.word
       INSERT INTO @log (TheEvent) SELECT 'end';

      SELECT
        TheStart.TheEvent + ' took '
        + CAST( DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) AS VARCHAR( 10 ))
        + ' Ms.'
      FROM   @log TheStart
        INNER JOIN @log Theend
          ON Theend.Log_Id = TheStart.Log_Id + 1;

      DROP table #WordsToMatch
      DROP table #Words
[1]: http://www.simple-talk.com/community/blogs/philfactor/attachment/2117.ashx

answered Sep 24, 2011 at 03:25 PM by Phil Factor

1) Guidance from Microsoft is varied and problematic. All one need do is examine the database structures and code of SharePoint to understand that there are lots of substandard things going on, SQL Server-wise, within Microsoft. Oh, take a gander at the myriad of issues that come along with T-SQL UDF usage too. :-) And how about the hugely outdated 1 MB default growth increment for database files (and, more importantly, the lack of exceptionally stressed guidance NOT to keep that for any database), 5 for cost threshold for parallelism, the one-data-file-per-CPU-core guidance for tempdb, etc., etc.

2) Voiding parallelism for DML activity involving table vars has essentially nothing to do with the TVP itself. It is that massive six-table, 50M-row SELECT statement populating it that matters. THAT is where the perf problem comes into play (a sketch follows this list).

3) But the simplest thing to keep in mind is that you are GUARANTEED to get SEVERAL bad things happening some or most of the time when you use table vars over temp tables, as I have said and demonstrated. So the guidance should always be: if you don't have one of the two reasons I gave for using them (recompilation perf issues, or the very esoteric auditing need for rolled-back transactions), then don't use them. You just have the guidance backwards, is all.

4) BTW, I am surprised and disappointed that a tech reviewer for the article didn't call you out on some of the issues I brought up.
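
To picture what point 2 describes, here is a minimal sketch. The source table dbo.SomeBigFactTable and its columns are invented; on the versions being discussed, the first statement runs entirely serially because its target is a table variable, while the SELECT side of the second is free to use a parallel plan:

    -- The INSERT that populates a table variable forces a serial plan for the whole
    -- statement, so a heavy populating SELECT pays the price; the same insert into a
    -- temp table lets the SELECT side go parallel.
    DECLARE @staging TABLE (id INT, total MONEY);
    CREATE TABLE #staging (id INT, total MONEY);

    -- Serial only: parallelism is disallowed when the DML target is a table variable.
    INSERT INTO @staging (id, total)
    SELECT CustomerID, SUM(Amount)
    FROM dbo.SomeBigFactTable
    GROUP BY CustomerID;

    -- The identical populating query can get a parallel plan here.
    INSERT INTO #staging (id, total)
    SELECT CustomerID, SUM(Amount)
    FROM dbo.SomeBigFactTable
    GROUP BY CustomerID;

    DROP TABLE #staging;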

answered Sep 24, 2011 at 07:19 PM by KGB

We'll talk about the tech review at the Summit.
Sep 25, 2011 at 03:54 AM Grant Fritchey ♦♦
1) and the MS CRM db and Nav and GP ... ad finem
Sep 25, 2011 at 10:10 AM Scot Hauder

Your routine is interesting. It is great to start at the point where things have gone wrong and work outwards. If, for example, you change the SELECT * to SELECT COUNT(*), then the plans are identical: good index-seek nested-loops join plans. If you add a bit more variation in the key column by doing ...

  select number/10000, replicate(' ', 500)  

...then both plans are identical. Similarly, by deskewing a bit by doing ...

  insert #BigTest
  select number/10, replicate(' ', 500)
  from tallyNumbers 
  where number between 2 and 1000

... then both plans are identical.

It shows, I guess, what can go wrong occasionally when there aren't distribution statistics for one of the tables and the other one has a skewed distribution; but it also shows that the skew has to be pretty extreme, and normally, if the developer is 'with it', the temporary table will be an extract or aggregation from the table data, and so unlikely to be anything quite as skewed. However, real data does occasionally cause problems. We've all experienced Perfectly Poisonous Table Variables in real live data, and occasionally I haven't been able to puzzle out an explanation.
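
For anyone trying to follow along, the sort of test being discussed is roughly this. It is a reconstruction, not Kevin's actual script: the column names are guessed, the row counts are arbitrary, and tallyNumbers is assumed to be a table of sequential integers.

  -- A sketch of the skew scenario: a big table where almost every row shares one key
  -- value, probed from a one-row table with that key. With no statistics on the table
  -- variable, the optimizer can pick a far worse join strategy than it does for the
  -- temp table, which has statistics on the probe column.
  CREATE TABLE #BigTest (SomeKey INT, Padding CHAR(500));

  INSERT INTO #BigTest
  SELECT number/10000, REPLICATE(' ', 500)   -- nearly every row gets key 0
  FROM tallyNumbers
  WHERE number BETWEEN 2 AND 10000;

  CREATE INDEX IX_BigTest_SomeKey ON #BigTest (SomeKey);

  DECLARE @Probe TABLE (SomeKey INT);
  CREATE TABLE #Probe (SomeKey INT);
  INSERT INTO @Probe (SomeKey) VALUES (0);   -- a single row, hitting the skewed value
  INSERT INTO #Probe (SomeKey) VALUES (0);

  SET STATISTICS IO ON;
  SELECT * FROM #BigTest b INNER JOIN @Probe p ON p.SomeKey = b.SomeKey;  -- no stats on @Probe
  SELECT * FROM #BigTest b INNER JOIN #Probe p ON p.SomeKey = b.SomeKey;  -- stats available
  SET STATISTICS IO OFF;

  DROP TABLE #Probe;
  DROP TABLE #BigTest;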
Sep 26, 2011 at 03:48 AM Phil Factor

I have come across a few instances where TVs have been employed in code, and I have cringed when I've seen the estimated row count going into them. Usually, re-writing the code to use a #table has brought significant improvements. I can't say every time, as I haven't recorded the occurrences.

I now recommend in the office that we write options using both and compare, but expect anything with more than a handful of records in a TV to be the slower or worse option.

Christian Bolton was speaking at our user group a few weeks ago, and in his parallelism and CXPACKET-wait session he suggested that performance gains from TVs come when they are used for a few records (OK for the months of the year and so on), but when the record count goes up, the performance drops off and the #table is favoured.

answered Sep 25, 2011 at 01:39 PM by Fatherjack ♦♦