Stewy avatar image
Stewy asked

How to reduce fragmentation on a clustered index

I have a database that is used to store data collected from a number of channels. It is real time data so each data point has an associated DateTime value. In its most basic form it it looks like this:

CREATE TABLE [dbo].[TestTable](
    [ChannelId] [int] NOT NULL,
    [DataTimeStamp] [datetime] NOT NULL,
    [Data] [float] NOT NULL

I have a clustered index on the ChannelId and DataTimeStamp to allow retrieval of data over a given time span. This works well for queries but the index fragments, and rebuilding is getting slower as the table grows in size.

I've looked changing the fill factor but I think this would be effective only if data was inserted with a random DataTimeStamp, rather than monotonically increasing.

Is there anything I can do to reduce the fragmentation? The only option I can see would be to create a separate table for each channel.

1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I should add that most of the time the data will be queried for a single channel with a specified date range.
0 Likes 0 ·

1 Answer

Matt Whitfield avatar image
Matt Whitfield answered

Change the clustered index to have DateTimeStamp as it's first key. You will still be able to query a channel for a given date range efficiently, with the added bonus that you will be able to query all channels for a given date range efficiently.

It will have a negative effect on querying for a particular channel with no specified date range, however.

I would still give it a fill factor appropriate to how your data is populated (i.e. if late data can be inserted then a fill factor would alleviate page splits in that scenario).

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Would it also be good to have the datetimestamp as desc?
0 Likes 0 ·
No I don't think so, because then new rows would want to go before existing rows, and pages naturally fill up forwards...
0 Likes 0 ·
This would have the bonus of making queries for all channels over a given date range efficient, but querying for data from a single channel within a date range would then be slower.
0 Likes 0 ·
Any difference would be totally marginal. You would still get a clustered index seek for a date range for a specific channel ID. As I said in the answer, the negative impact would be when searching for a specific channel ID with no date range.
0 Likes 0 ·
My concern was with the performance when retreiving data for a channelID over a large date range. There would certainly be a lot more disk IO as the data for each channel is spread throughout the table.
0 Likes 0 ·
I think you're probably at the point of test, test and test again. I would put together a test database with a copy of the data, index it both ways and see what happens. Let us know!
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.