Hello, recently I noticed quite strange behaviour of the execution plan for a query that processes data in batches of roughly 10,000 records. The query joins two very large tables (about 20 million and 160 million rows) using indexes. At the beginning the execution plan was perfect: the engine used only index seeks and the query never took longer than 200 ms. When the batch process reached about 90% of the roughly 20 million records to be processed, the execution plan suddenly changed and the index operator on the bigger table switched to an index scan; needless to say, execution times increased about 1000-fold.

I suspected that statistics or indexes had gone stale, so I did the two obvious things that came to mind: updated statistics and defragmented the index. That helped only for a very short time. What else can I do to get back to the previous, efficient execution plan?

I should add that the database server (SQL Server 2000) is not loaded by other heavy processes, the tables are not changed very often, and the same batch process works fine on SQL Server 2008. Thanks in advance,
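For reference, this is roughly the maintenance I ran (the database, table and index names below are just placeholders, not my real schema):

    -- refresh the statistics on the big table (default sampling)
    UPDATE STATISTICS dbo.BigTable

    -- defragment the index used by the join (SQL Server 2000 syntax)
    DBCC INDEXDEFRAG (MyDatabase, 'dbo.BigTable', 'IX_BigTable_JoinKey')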
@Bartolo - inserting/updating 20 million records can and will change the data access methods SQL Server uses. A plan change is something I would consider normal in the circumstances you describe. This sounds like a bulk-load operation that does not happen all the time. If that is right, I suggest you look at doing planned index and statistics maintenance immediately after the data load, to compensate for the data skew you are experiencing. SQL 2008 handles this much better than 2000, because the underlying storage and query execution systems were redesigned with exactly this kind of scenario in mind.
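For example, on SQL Server 2000 a post-load maintenance step might look something like this (the object names are only illustrative):

    -- rebuild the index the join relies on, then refresh all statistics on the table with a full scan
    DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_JoinKey')
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN

Running this right after the batch load means the optimizer sees the new key values before the next queries compile their plans.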
Such a large amount of data can change the index selectivity and also the statistics. In your case it seems that the selectivity is still good enough, since the optimizer was using the index at first. If you have defragmented the index and updated statistics as you wrote, then it seems that the distribution of values in the statistics changed in a way that makes the optimizer now choose an index scan instead of an index seek. In other words, the optimizer determined from the statistics that an index scan will be less costly than repeated seeks. From the symptoms, this is a statistics problem. If the table holds a very large number of records, the inserts do not trigger an automatic statistics recompute for the table, and the latest values are not covered by the statistics. In that scenario SQL Server also chooses an index scan instead of an index seek, because it does not know about the newly inserted values. I suggest a periodic statistics update.
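A minimal sketch of such a periodic update, assuming the join columns are what matter (names are illustrative):

    -- run after each batch, or on a schedule, so the newest key values appear in the histograms
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN
    UPDATE STATISTICS dbo.SmallTable WITH FULLSCAN

    -- or refresh every table in the database with the default sampling rate
    EXEC sp_updatestats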
Basically what's happening is that as the data changes you get an automatic update of the statistics. It works off a formula: roughly 500 modifications plus 20% of the number of rows in the table. Once that threshold is hit, you get an automatic update of the statistics, which in turn causes a recompile of the stored procedure. That update is a sampled scan, not a full scan. As the data changes, depending on the data in your table, you may or may not have a well-distributed set of data. If it's not well distributed, you may see a bad plan created because the statistics were built from sampled data. If it is well distributed, the plan you get may be accurate for the new statistics, just not the one you want.
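One way to check whether sampling is the issue is to compare "Rows" and "Rows Sampled" in the statistics header, and then rebuild the sampled statistics with a full scan; the object names below are placeholders:

    -- show how many rows the current statistics were built from
    DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_JoinKey')

    -- rebuild those statistics from every row, and optionally keep the automatic
    -- (sampled) update from overwriting them later
    UPDATE STATISTICS dbo.BigTable IX_BigTable_JoinKey WITH FULLSCAN, NORECOMPUTE

Note that NORECOMPUTE also disables the automatic update for that statistic, so you then have to keep it current yourself.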