question

Gustavo asked

Updating an indexed column to -1 or 9999: which is faster?

  • Scenario: a table with 100M records and a nonclustered index over an integer column, sorted ascending (but it could just as well be a clustered index).

  • Action: mark records as inactive by updating the indexed value to either -1 or 9999. For test purposes, let's say 50% of the data will be updated.

I know that a huge update touching 50% of the data will cause index fragmentation.

The table (when clustered) and the nonclustered index store the data "ordered", let's say from 0 to 10.

So, will updating the value to -1 cause all remaining values to be moved within the index, making it slower than updating to 9999 (or the maximum int value)?

Which is better/faster: updating to -1 or to 9999? Or does it not matter? And what if we consider defragmenting the index afterwards as a required step?


Edit: A new question: won't updating to the maximum value (or 9999) affect further work with the table? For example, if 50% of the data ends up on the last pages of the table, any insert/update will land in the middle of the data, so any pages after that point could be affected...


Edit 2: I performed the following tests:

set nocount on
go
-- build a one-column test table with a nonclustered index on id
if object_id('temp') is not null drop table temp
select convert(int, id) as id into temp from BIG_TABLE
create index IDX_id on temp(id)
GO

dbcc dropcleanbuffers
dbcc freeproccache
set statistics io on
set statistics time on
    -- test 1: move the upper half of the key range down to the minimum (-1)
    update temp set id = -1
    where id > 500000 and id < 999999999
    select @@rowcount
set statistics io off
set statistics time off

    -- simulate ongoing activity: 100,000 inserts that land inside the key range
    declare @c int = 0
    while @c < 100000
    begin
    	insert temp(id) values ( @c )
    	select @c = @c + 1
    end
    select @c

set statistics io on
set statistics time on
    ALTER INDEX IDX_id ON temp REORGANIZE 
set statistics io off
set statistics time off
GO

-- rebuild the test table for the second test
if object_id('temp') is not null drop table temp
select convert(int, registration_id) as id into temp from BIG_REGISTRATIONs
create index IDX_id on temp(id)
GO

dbcc dropcleanbuffers
dbcc freeproccache
set statistics io on
set statistics time on
    -- test 2: move the lower half of the key range up to the maximum (999999999)
    update temp set id = 999999999
    where id > -1 and id <= 500000
    select @@rowcount
set statistics io off
set statistics time off

    -- simulate ongoing activity: 100,000 inserts that land inside the key range
    declare @c int = 500000
    while @c < 600000
    begin
    	insert temp(id) values ( @c )
    	select @c = @c + 1
    end
    select @c - 500000

set statistics io on
set statistics time on
    ALTER INDEX IDX_id ON temp REORGANIZE 
set statistics io off
set statistics time off
GO

But I got inconclusive results: the difference was a matter of milliseconds for both the update and the index reorganize, even when many of the insertions landed in the middle of the pages.

I am starting to suspect there is no difference between the minimum value (-1) and the maximum value (9999) when updating an indexed column. What do you think?

sql-server-2005 · t-sql · performance

Peso answered

Have you considered partitioned views? Partitioned tables?
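For illustration, a minimal sketch of a local partitioned view split on an active/inactive flag. All names here are hypothetical; the partitioning column must be part of the primary key and carry non-overlapping CHECK constraints so the optimizer can eliminate the branch a query doesn't need:

    -- hypothetical table names; status = 1 marks active rows
    CREATE TABLE records_active
    (
        id     int NOT NULL,
        status int NOT NULL CHECK (status = 1),
        PRIMARY KEY (id, status)
    )

    CREATE TABLE records_inactive
    (
        id     int NOT NULL,
        status int NOT NULL CHECK (status = 0),
        PRIMARY KEY (id, status)
    )
    GO

    -- queries that filter on status touch only one member table
    CREATE VIEW records_all
    AS
    SELECT id, status FROM records_active
    UNION ALL
    SELECT id, status FROM records_inactive
    GO

Deactivating a record then becomes a DELETE from one table plus an INSERT into the other, instead of an in-place update of an indexed key.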


Bob Hovious answered

I'm staring at Kalen Delaney's book on the SQL Server 2005 storage engine, but I don't recall anything in it precisely on point. I suspect it would be slightly faster to set the value to 9999, because as rows were moved to new pages, the new pages could be populated without having to update pointers back to the first page starting with 0 in your subject column.

In other words, after the first page of -1s is populated, it would point to the first page containing 0s as the next page. When a second page of -1s is created, the first page of -1s might have to be altered to point to the second page of -1s rather than to the starting page of 0s. The new page of -1s would then point to the first page of 0s.

However, I'm not sure whether such an update would have to happen for each new page, or just for the first page of 9999s. Also, I would expect both pages involved to be in RAM at the time of the update, so the performance difference should be minuscule.

Fragmentation should be a wash, because I wouldn't expect page splits to occur on the new pages with either value, and the old pages would be fragmented identically. However, this is all speculation.

My strong suggestion is to create a sample table with a million or so rows and conduct some time trials. Nothing beats testing for getting a definitive answer.
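To measure fragmentation as well as timings, something like this could be run before and after each update (it uses the sys.dm_db_index_physical_stats DMV available from SQL Server 2005 on; the table name matches the asker's test script):

    -- report fragmentation and page counts for every index on the test table
    SELECT index_id,
           index_level,
           avg_fragmentation_in_percent,
           page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('temp'), NULL, NULL, 'DETAILED')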


Håkan Winther answered

I can't say for sure whether -1 or 9999 is faster (probably 9999), but I would say this is a theoretical question.

If it is a nonclustered index, I would drop the index, update the column, and then re-create the index with ONLINE=ON and SORT_IN_TEMPDB=ON. But then you'll probably say that you can't get a service window to do this. If that's the case, you probably can't do your update anyway, because the update will lock the entire table for the duration of the transaction. Of course, you can prevent lock escalation, but that will slow you down even more; I would even force a TABLOCK to get the best performance.
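A minimal sketch of that approach, reusing the table and index names from the test script above (the WHERE predicate is illustrative, and ONLINE = ON requires Enterprise Edition):

    -- drop the nonclustered index so the update doesn't have to maintain it
    DROP INDEX IDX_id ON temp

    -- take a table lock up front instead of paying for lock escalation
    UPDATE temp WITH (TABLOCK)
    SET id = -1
    WHERE id <= 500000   -- illustrative predicate

    -- re-create the index; ONLINE keeps the table available and
    -- SORT_IN_TEMPDB moves the sort work out of the user database
    CREATE INDEX IDX_id ON temp (id)
    WITH (ONLINE = ON, SORT_IN_TEMPDB = ON)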

But then you'll say: "I use a WHERE condition on this column and need the index." The index will not help you, though, if as much as 50M out of 100M rows are being updated; SQL Server will do a table or index scan anyway.


Matt Whitfield answered

If you are on SQL Server 2008, you can create a filtered index, which ignores values that don't match the filter. When you updated the rows, they would simply be removed from the index, which, while leaving gaps, wouldn't fragment it any further. This assumes, of course, that you don't need the inactive rows to be accessed (i.e. you don't have a query that says 'select the inactive rows').
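For example, assuming -1 is the 'inactive' marker and the test table from the question (SQL Server 2008 syntax):

    -- the index covers active rows only; updating a row's id to -1
    -- simply removes that row from the index
    CREATE NONCLUSTERED INDEX IDX_id_active
    ON temp (id)
    WHERE id <> -1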

You would then find that the cost of re-organising that index afterwards was significantly reduced.


Tom Staab answered

If the index is nonclustered, then I recommend using a filter to exclude inactive records (as in the filtered index example above). That way, your index won't even include those values and will thus perform faster for active records.

If you don't filter the index, then I think a very high number (beyond the reach of valid values) would be better, because it might help certain queries that perform an index scan. I have no real knowledge or testing experience here, but that is my gut feeling.

Edit: different approach

Is there a reason you need this inactive data? If so, is there a reason it can't be moved to a different table? I know this can cause problems with referential integrity and foreign keys, but I have a suggestion that might work even in that case (see the sketch after the list). It may or may not be worthwhile given your situation.

  • stub table for maintaining foreign keys
  • active records table with foreign key to stub table
  • inactive records table with foreign key to stub table
  • view with stub table and outer joins to include active and inactive records
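A minimal sketch of that layout (all table and column names are hypothetical):

    -- hypothetical names throughout
    -- stub table: one row per record, the target of all foreign keys
    CREATE TABLE record_stub
    (
        record_id int NOT NULL PRIMARY KEY
    )

    CREATE TABLE record_active
    (
        record_id int NOT NULL PRIMARY KEY
            REFERENCES record_stub (record_id),
        payload   varchar(100) NOT NULL
    )

    CREATE TABLE record_inactive
    (
        record_id int NOT NULL PRIMARY KEY
            REFERENCES record_stub (record_id),
        payload   varchar(100) NOT NULL
    )
    GO

    -- combined view for queries that need both active and inactive rows
    CREATE VIEW record_all
    AS
    SELECT s.record_id,
           COALESCE(a.payload, i.payload) AS payload,
           CASE WHEN a.record_id IS NOT NULL THEN 1 ELSE 0 END AS is_active
    FROM record_stub s
    LEFT JOIN record_active   a ON a.record_id = s.record_id
    LEFT JOIN record_inactive i ON i.record_id = s.record_id
    GO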

I think most queries could remain unchanged if the current table name is used for the active records. Queries that need to check only inactive records would switch to that table, and queries that need both would use the view. Obviously, if you need to check both very often, this is not a good solution, because it would mean hitting the outer-join view a lot.

We did something like this on a project a few years ago, and it worked quite well. (I just hope I'm not forgetting any details we discovered along the way.)

