I want to develop a process for refreshing some reporting tables. For the moment it’s enough to assume that we have a fact table that gets refreshed every night. The refresh takes about 30 seconds, during which a report run against the table would see zero rows and give the wrong answer. This is clearly unacceptable. To move away from this, I want to know whether there are risks associated with the following strategy (assume my fact table is just called MyFacts for the purposes of this discussion):

1. Do a SELECT...INTO to create a staging table (MyFactsStg) that contains all the data with the correct types and nullability (but no indexes or keys yet).
2. Rename the indexes on MyFacts to *Old.
3. Add the primary key and create indexes on MyFactsStg using the object names that were previously on MyFacts (but which have since been renamed to *Old). MyFactsStg is now, in terms of DDL, an exact copy of MyFacts, but with more current rows.
4. Rename table MyFacts to MyFactsOld.
5. Rename table MyFactsStg to MyFacts.
6. Drop table MyFactsOld.

Taking it at face value, it seems the only downtime for the users will be a few milliseconds between steps 4 and 5. As far as my testing goes, if someone is using MyFacts while I’m trying to rename it, the rename procedure (sp_rename) appears to wait patiently until it has exclusive access to the table and then renames it, so it appears all but seamless to the end user. What worries me is that I didn’t read about this strategy in a book, so it’s probably full of gotchas that I have not thought of yet.
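To make the question concrete, here is a minimal sketch of the six steps above in T-SQL. The source query, the FactID column, and the index names (PK_MyFacts, IX_MyFacts_Date) are my own assumptions, not anything from a real schema:

```sql
-- 1. Build the staging copy (data only, no indexes yet).
SELECT *
INTO   dbo.MyFactsStg
FROM   dbo.SourceOfFacts;   -- hypothetical source of refreshed rows

-- 2. Free up the constraint/index names on the live table.
EXEC sp_rename 'dbo.PK_MyFacts', 'PK_MyFactsOld', 'OBJECT';
EXEC sp_rename 'dbo.MyFacts.IX_MyFacts_Date', 'IX_MyFacts_DateOld', 'INDEX';

-- 3. Recreate the key and indexes on staging under the original names.
ALTER TABLE dbo.MyFactsStg
    ADD CONSTRAINT PK_MyFacts PRIMARY KEY CLUSTERED (FactID);
CREATE NONCLUSTERED INDEX IX_MyFacts_Date ON dbo.MyFactsStg (FactDate);

-- 4-5. The swap; sp_rename needs a schema-modification lock, so it
-- waits for in-flight readers and then completes almost instantly.
EXEC sp_rename 'dbo.MyFacts',    'MyFactsOld';
EXEC sp_rename 'dbo.MyFactsStg', 'MyFacts';

-- 6. Discard yesterday's data.
DROP TABLE dbo.MyFactsOld;
```

Wrapping steps 4–6 in error handling (so a failed swap leaves either the old or the new table answering to the name MyFacts) is the part that needs the most care.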
The only downside I can see with your approach is the complexity. Suddenly adding or removing an index on MyFacts becomes a breaking change and requires changing the script that handles the rename. There is also the issue of recovering from failure to consider: if the ETL has half run and then failed, the next run needs to be able to recognise this and tidy up before starting.

I think your idea is valid, but I would suggest looking at SNAPSHOT ISOLATION as well; it may or may not make things simpler in your situation. You could turn on READ_COMMITTED_SNAPSHOT, and then all transactions running at the read committed isolation level would in fact use snapshot isolation instead. Basically, SQL Server will ensure they see a transactionally consistent snapshot of the data from the point in time that they started. This CAN require significant tempdb space, but we haven't found that to be the case. The only caveat is that you really need the entire delete/refill process on MyFacts to run as a single transaction.

We use this method to update our gazetteer: the updates are staged into staging tables in the destination database, and then a SQL script handles the final ETL from the staging tables into the main table as a single transaction. We couldn't sensibly use the stage/rename technique because we are only changing a fairly small percentage of the data, and creating an entire copy and then re-indexing it just to apply the changes would be pretty wasteful.
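For reference, a sketch of what turning this on looks like. The database name ReportingDB is a placeholder; note that flipping the option needs exclusive access to the database, hence the ROLLBACK IMMEDIATE:

```sql
-- One-time setting: read committed readers now use row versioning.
ALTER DATABASE ReportingDB
    SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- With that on, the nightly refresh can run as one transaction and
-- concurrent reports still see the last committed version of MyFacts
-- instead of blocking or seeing zero rows:
BEGIN TRAN;
    DELETE FROM dbo.MyFacts;
    INSERT INTO dbo.MyFacts
    SELECT * FROM dbo.MyFactsStg;
COMMIT TRAN;
```

The versioned rows live in tempdb until the transaction commits, which is where the tempdb-space caveat comes from.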
This doesn't have to be that complex.

1. CREATE your staging table with the same columns/types as the destination table; don't worry about indexes or anything else. Do not use SELECT...INTO, use a CREATE TABLE statement instead.
2. INSERT the refreshed data into StagingTable.
3. If step 2 succeeds, do something like this:

    BEGIN TRAN
    DELETE FROM ReportTable WITH (HOLDLOCK, TABLOCKX)
    INSERT ReportTable SELECT * FROM StagingTable
    COMMIT TRAN

Holding an exclusive lock for the duration of the transaction will block any reads from reports, and they will simply wait for the transaction to complete.
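The three steps above can be sketched end to end with error handling added, so a half-finished refresh rolls back cleanly rather than leaving the table empty. The column names and source query are assumptions for illustration:

```sql
-- XACT_ABORT makes any runtime error doom and roll back the transaction.
SET XACT_ABORT ON;
BEGIN TRY
    -- Step 2: load the refreshed data into staging (assumed columns).
    TRUNCATE TABLE dbo.StagingTable;
    INSERT INTO dbo.StagingTable (FactID, Amount)
    SELECT FactID, Amount
    FROM   dbo.SourceOfFacts;      -- hypothetical source

    -- Step 3: swap the contents under an exclusive table lock, so
    -- reports queue behind the refresh instead of seeing zero rows.
    BEGIN TRAN;
        DELETE FROM dbo.ReportTable WITH (HOLDLOCK, TABLOCKX);
        INSERT INTO dbo.ReportTable
        SELECT * FROM dbo.StagingTable;
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;   -- surface the failure to the ETL job
END CATCH;
```

Because the DELETE/INSERT pair commits atomically, readers either see the old rows or the new rows, never the empty intermediate state.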