(...I found some discussions on advantages of selecting from a view VS from a table, but did not find anything on advantages of selecting from a table VS a view...) Here is my story: a daily running SSIS package has the following SQL:
DROP TABLE xxx; SELECT * INTO xxx FROM view_xxx
The result is a multi-million rows table recreated daily. The latter statement executes for several hours and every time causes the TempDB to double its size. So, it's a problem. And it feels rather unusual, in general, to see a table created from a view. However, there is an agrument that in this way they buy advantages of the subsequent querying from the table compare to "executing a view on the fly". I would rather not argue with them. But I am generally-curious here: could there be ANY situations when this approach is performance-rewarding? I understand that much depends on indexes, including those of the view's underlying tables', but, in general...???
asked May 23, 2011 at 01:05 PM in Default
Just to add some more weight experiences to your list of answers. At the organization I work at I have inherited a number of systems that do just what you described in your question. I agree with @KenJ that having staging tables is a huge plus in many situations. Several of the daily tables that are "refreshed" in my organization are pretty well hammered during the day. Part of the process to refresh the data which sometimes is called by views performs various costly calculations on very large data sets. The same queries could be called by end users each time they wanted to access the data but based on some of the business logic from getting min, max, calculated values, max of calculated values etc, it is easier to have that data set already computed and the end users hit that data set. There have been numerous other cases where it has been better to create a few indexes and create a sproc for the users to use rather than views. I have encountered numerous situations where a user was calling a view, that was calling another view, that guess what - called another view. This leads to very sloppy code and expensive processes.
At the end of the day you just have to test and see if you can improve the process. A quote I love is "If you truly want to understand something... Try to change it!"
As with everything SQL, there is usually a time and place for everything, you just have to know when it is right.
answered May 23, 2011 at 01:49 PM
It is common to use staging tables during data loads, and they can help performance in some situations.
If the view, or its underlying tables are very complex or frequently updated during processing, then working off a current copy can improve concurrency for other processes in the database.
When you are doing complex or repetitive transformations on very large data sets, it can be faster to do them in staging tables inside the database rather than in an SSIS package. Especially if you would have to pull your large data set over the wire and try to work with it in memory.
Multi-step loads often make use of staging tables to perform logical units of work before pulling the final results together.
In addition to the raw performance improvements you can realize with staging tables, you can also see improved turnaround times in maintenance and development tasks, as your packages and processes can be reduced to a somewhat larger number of simpler steps, making them easier and more reliable to maintain.
answered May 23, 2011 at 01:32 PM
Yes, there are. If there weren't those situations, then indexed views wouldn't exist. And for the times at which indexed views won't cover the requirement - scheduled materialisation (such as the example you have found) is often used.
An example I have is where data from devices is associated with the location where the device was at the time - and this can be changed (thus the view). However, the indexed view allows it to be indexed by location, which is how the data is always queried. So yes, maintaining the view takes effort - but it is worth it.
I'm not sure 'daily' and 'executes for several hours' are two phrases that I would be keen to hear in connection with this topic, however. In that case it sounds like, maybe, the 'return on investment' in terms of CPU and I/O time isn't that great, if anything at all.
answered May 23, 2011 at 01:31 PM
Matt Whitfield ♦♦
I've used this technique myself, particularly when dealing with a view that contains a CTE. I even blogged about it too... http://thelonedba.wordpress.com/2011/05/10/t-sql-tuesday-018-ctes-theyre-not-always-good-for-you/
answered May 23, 2011 at 02:09 PM
There are a lot of good answers here, and I agree with everything said so far.
I will say it is worth carefully considering whether or not this makes sense. It definitely can under certain circumstances (I have used this technique myself on some occassions), but it can also be overused and use inappropriately. I would for instance consider if an indexed view is more appropriate or depending on how much the data changed if there is a way of doing an incremental update rather than a complete recreation.
One reason to do something like this that no one else has mentioned yet is for reporting purposes you may actually want the data to be fixed in time rather than accurate up to the second. I used to have a procedure that would would copy certain tables at midnight precisely so our analysts could examine the status as it was at midnight.
answered May 23, 2011 at 04:09 PM