Hi guys. I'm pretty new to SQL Server tuning and maintenance. I hope you can help me with an issue that is causing some trouble in our application.
My company runs a hierarchical SQL Server 2008 R2 database with a TPT (table-per-type) model. We have entities with a structure such as Entity -> Concept -> Operation -> Transaction -> Sale, where all of these tables share only the Id from Entity via a FK.
Our performance problems began when we started to accumulate millions of operations belonging to a small group of "scopes". Every Concept has a scope_Id, and we have 6 or 7 scopes in the database (a scope represents an organization that operates inside the application), but just 3 of them are the Big Stack Bullies:
Scope 1: 20M operations
Scope 2: 30M operations
Scope 3: 25M operations
Others: a few thousand operations each, on average
On the other hand, an Operation has a sendDate. This column has a non-clustered index, ordered descending. The scope_Id column on Concept has a similar index.
The most common queries on transactions have a WHERE clause containing both columns. They tend to be fast when filtering only by sendDate, but when filtering by sendDate and scope_Id, or just by scope_Id, they are very slow and even produce high disk queue depths from sort operations spilling into tempdb.
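To make the problem concrete, the slow queries have roughly this shape (a sketch only: table and column names are simplified guesses based on the description, not the real schema):

```sql
-- Hypothetical shape of the slow query. In the TPT model the child
-- tables share the Entity Id, hence the join on Id.
SELECT o.Id, o.sendDate
FROM dbo.Operation AS o
JOIN dbo.Concept AS c
    ON c.Id = o.Id            -- shared Entity Id FK
WHERE c.scope_Id = @scopeId   -- only 6-7 distinct values, heavily skewed
  AND o.sendDate >= @fromDate
ORDER BY o.sendDate DESC;
```

When the scope_Id predicate is present, the optimizer has to combine a highly selective-looking index (sendDate) with a very non-selective one (scope_Id), which is where the plan goes wrong.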
I'm fairly sure the index on scope_Id is useless, since just 3 values account for about 90% of all records. I'm thinking about creating a composite index on scope_Id and a datetime column similar to sendDate, but I'm not sure about this, and I can't simulate an environment like our production database due to its size and because it is on an Amazon Web Services RDS instance, so I can't get a .bak file or generate a script big enough.
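The composite index I have in mind would look something like this (a sketch under an assumption: it supposes a scope_Id column denormalized onto Operation, since in the current schema scope_Id lives on Concept and a single index cannot span two tables):

```sql
-- Sketch only: assumes Operation carries (or is given) a scope_Id
-- column so both predicates can be satisfied by one index.
CREATE NONCLUSTERED INDEX IX_Operation_Scope_SendDate
    ON dbo.Operation (scope_Id, sendDate DESC);
```

The idea is that with scope_Id as the leading key, each scope's rows are already sorted by sendDate, so the descending sort would come for free instead of spilling to tempdb.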
Any suggestions on how to improve queries in this situation would be much appreciated. Thanks in advance.
asked Mar 23, 2016 at 10:07 AM in Default
Sounds like data skew could be a problem here: for example, a query against Scope 2 is potentially going to read millions of rows, whereas the same query against one of the small scopes reads far fewer. Estimating the number of rows a query will process is a key part of producing an execution plan, and SQL Server can choose to handle larger volumes differently than smaller ones. If these values are being passed as parameters to stored procedures, then you could be getting sub-optimal execution plans due to parameter sniffing. Spilling out to tempdb is a common side-effect of a bad plan.
Is there a difference in query performance when you query for the Big Stack Bullies versus the smaller scopes?
The kinds of things I would be looking at are statement-level recompiles, filtered indexes, filtered statistics, and partitioning; these may or may not help. If you can share an execution plan here for a badly performing query, we can suggest better alternatives.
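To illustrate the first three options (sketches only; object names are assumed, and the scope_Id values are placeholders for your real Big Stack Bully ids):

```sql
-- 1) Statement-level recompile: compiles a fresh plan for the actual
-- parameter values, sidestepping a plan sniffed for a scope of a
-- very different size.
SELECT o.Id, o.sendDate
FROM dbo.Operation AS o
JOIN dbo.Concept AS c ON c.Id = o.Id
WHERE c.scope_Id = @scopeId
  AND o.sendDate >= @fromDate
OPTION (RECOMPILE);

-- 2) Filtered index (SQL Server 2008+): a small, dedicated index for
-- one of the large scopes.
CREATE NONCLUSTERED INDEX IX_Concept_Scope2
    ON dbo.Concept (Id)
    WHERE scope_Id = 2;

-- 3) Filtered statistics: gives the optimizer row estimates that are
-- accurate for one scope instead of an average across all of them.
CREATE STATISTICS ST_Concept_Scope2
    ON dbo.Concept (Id)
    WHERE scope_Id = 2;
```

OPTION (RECOMPILE) adds compilation cost on every execution, so it is best reserved for queries whose cost dwarfs the compile time; filtered indexes and statistics avoid that cost but need one object per skewed value.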
answered Mar 23, 2016 at 11:41 AM
Kev Riley ♦♦