question

GPO asked

Data warehousing latency minimization strategies

Hi all,

We have a strategy here of reporting off a replica of the live database. The idea is that when something changes on live, it can usually be reflected in reporting within a few seconds. I'm not sure this is the best strategy, and it will become less appealing as time goes on. A few of our reports are starting to run very slowly because the data are optimized for transactions, not analysis.

I'd like to move a lot of our reporting to cubes, but there's a perception here that there would be too much latency between the cubes and the source transactional data (say, a day to a week). The reality is that for 90% of reporting purposes here, the data doesn't need to be current to the minute, but it's like some sacred cow that they won't relinquish. I'm never going to win that argument. Where I might have some leverage, though, is if I could say that the cubes will be refreshed with minimal overhead every hour or two instead of once a week or overnight.

Where can I get some good doco on how to reduce the lag between the cube and live data? What are the tradeoffs associated with keeping cubes current? What are your experiences?
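For context, the kind of refresh I have in mind is a scheduled SQL Agent job that reprocesses the cube every hour. A rough sketch only (job, server, database and cube names are all placeholders, and a full reprocess is the blunt instrument; incremental ProcessAdd on the fact partitions would be the lower-overhead route but needs more setup):

```sql
-- Rough sketch: hourly SQL Agent job sending an XMLA process command
-- to Analysis Services. All names below are placeholders.
USE msdb;

EXEC dbo.sp_add_job
    @job_name = N'Hourly cube refresh';

EXEC dbo.sp_add_jobstep
    @job_name  = N'Hourly cube refresh',
    @step_name = N'Process SSAS database',
    @subsystem = N'ANALYSISCOMMAND',
    @server    = N'MySsasServer',   -- the Analysis Services instance
    @command   = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>ReportingDW</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>';

EXEC dbo.sp_add_jobschedule
    @job_name             = N'Hourly cube refresh',
    @name                 = N'Every hour',
    @freq_type            = 4,   -- daily
    @freq_interval        = 1,
    @freq_subday_type     = 8,   -- repeat in units of hours
    @freq_subday_interval = 1;   -- every 1 hour

EXEC dbo.sp_add_jobserver
    @job_name = N'Hourly cube refresh';
```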
performance · ssas · data-warehouse · cubes

1 Answer

WilliamD answered
GPO, keeping cubes current is the bane of most DBAs/devs who have to look after a BI implementation. You say that reporting currently runs against the OLTP system and this is causing performance problems. Is there anything that would stop you setting up transactional replication to replicate all required tables to another machine? This would allow you to offload the reporting, freeing up the OLTP system and letting you add indexes on the replication target that support reporting. We use this setup quite successfully, but are looking at the next release of SQL Server for further improvements through readable database mirrors.
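As an illustration, this is the kind of reporting-only index you could add on the subscriber (a minimal sketch; the table and column names here are made up, yours will differ):

```sql
-- Hypothetical reporting index on the replication subscriber.
-- Covers a date-range report without touching the OLTP publisher.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Reporting
ON dbo.Orders (OrderDate)
INCLUDE (CustomerID, TotalDue);
```

The point is that the subscriber can carry indexes the OLTP system would never tolerate, because the extra write cost is paid by the distribution agent rather than by your live transactions.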
2 comments

WilliamD commented:
Writes to the replication target can be slowed down by your read access of the data. Basically, the replication is performing all write operations coming from the source (almost like a replay of the production system). If you have additional indexes, these will change how the replication writes behave. By default, the indexes from the OLTP system will not be present on the replication target, so the replication writes will be running differently from the prod system anyway.

NOLOCK will give dirty reads, but lower blocking. Depending on how heavy the load is on the replication target, this may benefit you. It comes down to how willing you are to work with dirty pages (a business decision, not personal opinion). We do not use NOLOCK for the reason mentioned: we want clean data. That is a business decision; better a little slower but right than quick and maybe wrong.

I suggest looking at the queries you have, the indexes on the replication target, and the missing index DMVs to speed up those queries (see the sketch below). Your indexing strategy will probably be different from the OLTP system's, so don't necessarily take those indexes as a template.
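As a starting point, a query along these lines against the missing index DMVs, run on the replication target (this is the standard DMV join; treat the output as hints rather than a design):

```sql
-- Most impactful missing indexes as estimated by the optimizer.
SELECT TOP (20)
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups      AS mig
  ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details     AS mid
  ON mig.index_handle = mid.index_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```

Remember the DMV contents are cleared on a restart and the suggestions often overlap, so consolidate them before creating anything.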
GPO commented:
@WilliamD, thanks for taking the time to respond. No, I said we're already reporting off replicated live DBs. Because the data is not set up for reporting, the reports are generally a lot slower than they could be (and a lot slower to write for any newbies coming in, too). Incidentally (off topic, I know), we're told to put (NOLOCK) hints on all our queries that run off the replicated data because slow queries are (allegedly) a hindrance to replication. I have no idea whether this is correct, but forcing dirty reads on every query would seem to make a mockery of the conflicting desire to have data accurate up to the minute. Do you use (NOLOCK) hints when reporting off your replicated data? Also incidentally, do large indexes slow down replication? How would I test this?
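One way to test the index question directly, assuming transactional replication, is tracer tokens: post one at the publisher and see how long it takes to reach the subscriber, before and after adding the index. A rough sketch (the publication name is a placeholder):

```sql
-- At the publisher, in the published database.
DECLARE @token_id int;
EXEC sys.sp_posttracertoken
    @publication     = N'MyPublication',
    @tracer_token_id = @token_id OUTPUT;

-- Give the token time to flow through to the subscriber.
WAITFOR DELAY '00:00:30';

-- Shows distributor, subscriber and overall latency for the token.
EXEC sys.sp_helptracertokenhistory
    @publication = N'MyPublication',
    @tracer_id   = @token_id;
```

Comparing latencies with and without the extra indexes would give a measured answer rather than folklore.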
0 Likes 0 ·
