question

Magnus Ahlkvist avatar image
Magnus Ahlkvist asked

Partitioning existing table

Hi all, I have a table which I load every month with new incoming data. The loading of data is done in an SSIS-package, which disables non clustered indexes (except UNIQUE indexes) on the table, loads the data and then rebuilds non clustered indexes. The table has grown quite a bit and therefore the performance on rebuilding indexes gets worse by the month. Loading data takes 20 minutes and rebuilding indexes takes six hours. I'm looking at solving this by partitioning the table, loading monthly data to a staging table and switching it into the main table. I don't really have any problems with creating the partition function and scheme etc, but I'm kind of stuck on the clustered index. Right now, I have an identity column as PK and clustered index. I also have a unique, non clustered index on date+code. I did have date+code as primary key initially, but that caused too many page splits, because the data source is an text-file which is not in any way sorted by code. When I put the table on to the partition scheme, I understand I'll have to drop the clustered index and rebuild it on the partition scheme. Then I'll need the date in the clustered index. One idea I have is to make the primary clustered key consist of ID+date. That would solve the page splits, and it would also help me in that I don't have to find out which identity seed to use in the loading job. But I'm thinking I'm not the only one ran into this scenario. Any nice suggestions on how to continue? I don't really want to laborate too much with it, as rebuilding the clustered index will take me more or less forever :) Therefore, I'd like to have you input before I decide which way to go.
sql-server-2008partitioning
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Jeff Moden avatar image Jeff Moden commented ·
I'm actually confused a bit by your post, Magnus. Is the ID column the IDENTITY column? If so, you should NOT be getting any page splits on the related Clustered Index because the IDENTITY column is both UNIQUE and ever-increasing. Shifting gears a bit, rebuilding the clustered index on this table will not only take hours, but it initially increase the size of the dataspace used by the table by 120% because it has to make a copy of either the heap or the existing CI and won't release any of the old one until the new one is formed. Before I make a recommendation in this area, I have to ask, will any of the data in previous months ever (and I do mean EVER) change?
0 Likes 0 ·
Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
Hi Jeff. I Don't currently have page splits problems. And I have made and tested my changes to dev-server. I made ID+Date clustering key and date partitioning key. There are revisions to historic data. It's not causing updates to existing rows but rather inserts. We did consider putting all current data in one partition, but we'd still have to rebuild clustered index on the partition scheme, so we made each month its own partition.
0 Likes 0 ·

1 Answer

·
Håkan Winther avatar image
Håkan Winther answered
I would test my partitioning keys in a new (empty) table to find out what's the best key. How large is your file? It doesn't sound that big if it takes 20 minutes to load. The best option would be to load the data into a staging table with the same structure as the target table, except the non-clustered indexes, then add the non-clustered indexes. When this is done, just switch the data in with fast partition switching, it will take fractions of a second as it's a schema change and not a DML operation.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
Thanks Håkan. The file isn't too big - it's some 8 million rows, 30-40 columns. Loading into a staging table is what I'm looking at doing, and to switch the data into the main table. My thinking about non clustered indexes etc is exactly as you are pointing out. But my problems are around the clustered index, as I'm today having an identity column as clustered index, and when I partition the table, I'll have to put the partitioning column into the clustered index. But I've given it some thought over the day, and that's probably not a problem - If I cluster the table over Date+identity, I won't have any problems with fragmentation anyway.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.