This is an interesting problem. From the application developer's view, we must archive because the database will be quicker with fewer records.
From a business view, they want to see the old data for as long as possible. Archiving would remove those records, and some other means would have to be built to access the archived records (view only).
The DBA wants to satisfy both. Running SQL Server 2005 Standard, I have looked at partitioning, but I recently read it can't be implemented in Standard, only in Enterprise Edition. (Correct me on this.)
I looked at moving the archived records to a second database with reporting built on it for accessing the old data. (This sounds like a lot of work.)
We will upgrade to 2008 Standard before February next year, so that might influence the solution.
What I would like to hear is ideas on how you have solved this dilemma.
Answer by TG ·
Age old story - I want it all and I want it quick!
I think partitioning is on the right track to "having both" with the least effort. Depending on the nature of the data and the queries, a well-selected clustered index can have a similar effect to partitioning and is available on your current version of SQL Server. So table/index/query optimization may get you all the way there.
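As a sketch of that idea (table and column names here are hypothetical, not from the question): clustering on the date column keeps recent rows physically together, so range queries for the latest data stay cheap even as the table grows.

```sql
-- Assumed table dbo.Orders with an OrderDate column.
-- Clustering on the date means "recent data" queries scan a small,
-- contiguous range of pages regardless of total table size.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);
```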
Archiving can be done without forcing the user to look for older data with a separate operation. A UNION ALL between current and old tables can be efficient, again assuming the tables are indexed such that if there is no data in the archive, that portion of the UNION is a virtual no-op. (Edit: I guess the only reason this would be desirable is if there were separate options for searching "recent" data and "all" data.)
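A minimal sketch of that view, assuming hypothetical tables `dbo.Orders` (current) and `dbo.OrdersArchive` (old) with matching columns, both indexed on the date column:

```sql
-- One view over both tables; queries with a date predicate on the
-- current range let the optimizer prune the archive branch cheaply.
CREATE VIEW dbo.OrdersAll
AS
    SELECT OrderID, OrderDate, CustomerID, Amount
    FROM dbo.Orders
    UNION ALL
    SELECT OrderID, OrderDate, CustomerID, Amount
    FROM dbo.OrdersArchive;
```

Readers query `dbo.OrdersAll`; the split between current and archive stays invisible to them.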
Another possibility, similar to your reporting database idea, is to create separate storage table(s) which would include all data (not just old data) but would be optimized for retrieval. This is the concept of a data warehouse. That could fall into your category of "a lot of work", but it's proven effective.
Answer by mrdenny ·
How much data are you dealing with? Archiving may not be needed. For example, my OLTP database has tables with 250+ million records in them with sub-3-second response times (even when joined to other multi-million-row tables).
You may just need to adjust your indexes, and/or add a little RAM to the server.
That said, if you truly do need to archive, something I did which worked well was to create an Archive database on the same server and move the old records to that database. Then set up views that hold all the data (containing the UNION ALL which TG talked about) for reads to hit. It does require a bit of application logic change so that writes go to the correct table, but if all your data access is done via stored procedures, you should be able to handle all the needed changes at the database level without any changes to the client application.
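The move itself can be done in one atomic statement with `DELETE ... OUTPUT INTO`, which avoids a separate insert-then-delete pair. This is a sketch with assumed names (`ArchiveDB`, `dbo.Orders`, a two-year cutoff); the archive target must be a plain table on the same instance without triggers or foreign keys for `OUTPUT INTO` to accept it.

```sql
-- Move rows older than two years from the live database into the
-- archive database; deleted rows are captured and inserted atomically.
DELETE FROM dbo.Orders
OUTPUT DELETED.OrderID, DELETED.OrderDate, DELETED.CustomerID, DELETED.Amount
    INTO ArchiveDB.dbo.Orders (OrderID, OrderDate, CustomerID, Amount)
WHERE OrderDate < DATEADD(YEAR, -2, GETDATE());
```

Run it in batches (a `TOP (n)` on the DELETE inside a loop) if the first pass has years of backlog to move, to keep the transaction log manageable.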
Answer by K. Brian Kelley ·
Yes, partitioning requires Enterprise Edition in SQL Server 2005 (see the features comparison). This is unchanged in SQL Server 2008. Given that you're going with Standard Edition, I would set up an archiving routine to a second database and build views the end users can use which union the current data and the archived data. The reason I would use a second database is that it keeps your operational database relatively small in comparison. That impacts your backup and recovery times, as well as your disk and tape utilization if you're keeping multiple days/copies of backups.
Answer by Benjamin ·
Another option we have in SQL Server 2008 is the use of a filtered index (an index with a WHERE clause). This can keep the index small while providing speedy access to, for example, the most recent 30 days' worth of data. The rest of your data is still available, but retrieval will be slower. You could also use an indexed view to join the archive and live data, should you choose to separate the data that way.
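A sketch of such a filtered index, again on a hypothetical `dbo.Orders` table. One caveat: the filter predicate must compare against a constant (functions like GETDATE() aren't allowed there), so the cutoff date has to be maintained, e.g. by a scheduled job that periodically recreates the index with a newer boundary.

```sql
-- SQL Server 2008+: index only the recent slice of a large table.
-- The literal cutoff date is an assumption and must be rolled forward.
CREATE NONCLUSTERED INDEX IX_Orders_Recent
    ON dbo.Orders (OrderDate, CustomerID)
    WHERE OrderDate >= '20240101';
```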
One final thought on archiving that changes with SQL Server 2008 is compression. You could choose to compress your archived data and keep your active data uncompressed. This way you may be able to keep more data accessible on the same system; you will just chew up CPU cycles to decompress it. All of these ideas depend on the actual data and data consumption patterns.
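For what it's worth, data compression in SQL Server 2008 is an Enterprise Edition feature, so it wouldn't be available on the Standard Edition upgrade mentioned in the question. Where it is available, the archived table can be rebuilt compressed in one statement (names assumed as before):

```sql
-- Enterprise-only in 2008: page-compress the archive table while the
-- active table in the live database stays uncompressed.
ALTER TABLE ArchiveDB.dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```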
Answer by Steve Jones - Editor ·
I tend to agree with Mr. Denny. The number of records doesn't necessarily impact query response time much; it depends on the queries and indexes. With the right index, a 10,000,000-row table can return a single record as fast as a 10,000-row table.
That being said, I tend to agree with archiving for a few reasons. If you don't need the data regularly, or you can get by with reporting in a second db, then you reduce your backup times and restore / recovery times. The archived data is often read-only, and that means you can back it up periodically, and have a copy off-site for recovery.
But be sure that you can report on things properly. If lookup data changes, do you have a process to ensure that it gets to the archived data if it's needed? Sometimes you need to rewrite history for reporting purposes as the business evolves. It's not as simple as just moving part of a large table to another database or server.