I’m using row_number() to insert distinct orders to the production table from the staging tables. I receive a lot of duplicate records and I only want the distinct records to be in the final table. Below is the cte I’m using to accomplish this task: with cte_orders as (select *, ROW_NUMBER() over (partition by ordno order by ordno) as rownum from Stage_Orders) select * from cte_orders where rownum=1 This is taking a long time to compile. I looked at the execution plan and the sort cost is 48% and table scan cost is 42%. I don’t have index on the staging table because I update that table a lot. Any suggestion on how I can improve the performance? Thanks
One issue with your table is that you don't have a clustered index as you get a table scan. In my opinion, every table should have at least a clustered index. You can't create a table in SQL Azure without adding a clustered index. A table without a clustered index is an unsorted hash that you need to scan and very often sort to use. To improve performance of over clause you should add an index on POC columns, where P is partitioning column, O is order by columns and C is other columns you use in the Query. P and O should be index columns and C should be included columns.
I agree with Håkan. Depending on how you update the existing rows in the staging table, you might or might not want to just add an identity column primary key on it. If you insert more often than you update, or if you update doesn't change the total number of bytes per row, you can have a clustered identity column without causing fragmentation. I wouldn't be too afraid of adding indexes to the staging table. If you have massive loads from SSIS or the likes, you might want to drop indexes, load staging table and rebuild indexes. With enterprise edition, you could look at partition switching and use a "pre staging" table. If your performance problem today is with selecting from the staging table, I would suggest indexing the staging table and test performance on loading and updating rather than assuming indexing the staging table will hit your performance.
> But wouldn't creating index will slow > down the insert or updates? I know I > can disable the index before the > insert and then re-build it again but > disabling and rebuilding the index > takes long time for a large table. I > usually insert around million records > and the table already has 150 million > records. @liton I realize I'm responding to an older post but let's back the performance truck up for a minute. Are you saying that your 150 million row table does NOT have a clustered index??? The other thing I want to ask is about your original post. You're all worried about some large % of cost numbers in the execution plan but how long does your query, which is patently on the million row staging table, actually take??? On my humble laptop, your query only takes 4 seconds on a million rows.