x

Updating an index with -1 or 9999, what is faster ?

  • Scenario: A 100M record table A nonclustered index orver an integer column ordered forward. ( but could be a clustered index for any purpose).

  • Action: Set this record as inactive by either updating its value to -1 or 9999 For test purposes lets say 50% of the data will be updated.

I know that a huge update like 50% of the data will cause an index fragmentation.

The table ( when clustered ) and the index ( nonclustered ) stores data "ordered", lets say, from 0 to 10.

So, does updating it to -1 will cause all remaining values to be moved on the index and be slower then updating it to 9999 ( or max int size )?

What is better/faster update it to -1 or 9999 ? or it doen't matter ? AND if we consider the defragmentation of the index as a required step ?


Edit: A news question: Wont the update to the maximum value ( or 9999 ) affect further work with the table, for example, if 50% of the data is on the last pages of the table, any include/update will be in the middle of the data, so any pages after that could be affected...


Edit2: I performed the following tests:

set nocount on
go
drop table temp
select convert(int,id) as id into temp from BIG_TABLE
create index IDX_id on temp(id )
GO

dbcc dropcleanbuffers
dbcc freeproccache
set statistics io on
set statistics time on
    update temp set id = -1
    where id > 500000 and id < 999999999
    select @@rowcount
set statistics io off
set statistics time off

    declare @c int = 0
    while @c < 100000
    begin
    	insert temp(id) values ( @c )
    	select @c = @c + 1
    end
    select @c

set statistics io on
set statistics time on
    ALTER INDEX IDX_id ON temp REORGANIZE 
set statistics io off
set statistics time off
GO

drop table temp
select convert(int,registration_id) as id into temp from BIG_REGISTRATIONs
create index IDX_id on temp(id )
GO

dbcc dropcleanbuffers
dbcc freeproccache
set statistics io on
set statistics time on
    update temp set id = 999999999
    where id > -1 and id <= 500000
    select @@rowcount
set statistics io off
set statistics time off

    declare @c int = 500000
    while @c < 600000
    begin
    	insert temp(id) values ( @c )
    	select @c = @c + 1
    end
    select @c - 500000

set statistics io on
set statistics time on
    ALTER INDEX IDX_id ON temp REORGANIZE 
set statistics io off
set statistics time off
GO

But i got inconclusive results, the difference was in miliseconds on both update and reorganize index even when a lot of insertions appeared in the middle of the pages.

I am starting to guess there is no difference between minimum value ( or -1 ) and maximum value ( 9999 ) when updating an indexed column, What do you think ?

more ▼

asked Oct 22, 2009 at 12:07 PM in Default

Gustavo gravatar image

Gustavo
592 4 4 7

(comments are locked)
10|1200 characters needed characters left

5 answers: sort newest

Have you considered partitioned views? Partitioned tables?

more ▼

answered Oct 22, 2009 at 04:36 PM

Peso gravatar image

Peso
1.6k 5 6 8

I will perform some tests with partitioned tables and post here the results. Thanks.
Oct 27, 2009 at 08:46 AM Gustavo
Partitioned tables worked fine. Unfortunatelly our production server is not 2008, this way Matt's idea of filtered index would be better. Anyway, just to report, it seems there is no difference between what number you update, since both options have its drawbacks.
Nov 04, 2009 at 02:17 PM Gustavo
Filtered indexes are a SQL Server 2008 feature. Partiotioned tables works in SQL Server 2005 as well as SQL Server 2008. However, you need the Enterprise Edition.
Nov 20, 2009 at 10:12 AM Peso
(comments are locked)
10|1200 characters needed characters left

If the index is nonclustered, then I recommend using a filter to exclude inactive records. That way, your index won't even include those values and thus will perform faster for active records.

If you don't filter the index, then I think a very high number (beyond reach of valid values) would be better because it might help with certain queries that perform an index scan. I have no real knowledge or testing experience, but that is my gut feeling.

Edit: different approach

Is there a reason you need this inactive data? If so, is there a reason it can't be moved to a different table? I know this can cause a problem with referential integrity and foreign keys, but I suggestion that might work even if that is the case. This may or may not be worthwhile given your situation.

  • stub table for maintaining foreign keys
  • active records table with foreign key to stub table
  • inactive records table with foreign key to stub table
  • view with stub table and outer joins to include active and inactive records

I think most queries could remain unchanged if the current table name is used for the active records. For those queries that need to check just inactive records, change to that table. For queries that need to check both, use the view. Obviously, if you need to check both a lot, this is not a good solution because it will mean using an outer join view a lot.

We did something like this on a project a few years ago, and it worked quite well. (I just hope I'm not forgetting any details we discovered along the way.)

more ▼

answered Oct 22, 2009 at 03:33 PM

Tom Staab gravatar image

Tom Staab
5.8k 6 8 10

Thanks Tom, i cant used filtered index because my production environment is SQL server 2005, but this is a good advice anyway. But doesn't a very big number affect the current pages if more and more data will be added and the index pages of the deleted values would need to be reoganizaed frequently ?
Oct 22, 2009 at 03:52 PM Gustavo
You would need to consider the expected growth rate of the inactive records. I edited my answer to include a new potential solution. Please let me know what you think.
Oct 22, 2009 at 04:13 PM Tom Staab
I am studing partitioning this table right now, i will perform some tests but as far as i could see the query must indicate what partition to query so it will not be as soft as your view aproach but requires a less structures.
Oct 27, 2009 at 08:38 AM Gustavo
(comments are locked)
10|1200 characters needed characters left

If you are on SQL Server 2008 then you can create a filtered index - which would ignore values that didn't match the filter. Therefore when you updated the rows, they would simply be removed from the index, which, while leaving spaces, wouldn't fragment it any further. This is, of course, assuming that you don't need the inactive rows to be accessed (i.e. you don't have a query that says 'select the inactive rows').

You would then find that the cost of re-organising that index afterwards was significantly reduced.

more ▼

answered Oct 22, 2009 at 03:25 PM

Matt Whitfield gravatar image

Matt Whitfield ♦♦
29.4k 61 65 87

Unfortunatelly our production server is 2005, but i agree that it would really be the best solution if "disabled" data is not acesses frequently...
Oct 22, 2009 at 03:30 PM Gustavo
(comments are locked)
10|1200 characters needed characters left

I can't say for sure if -1 or 9999 is faster (but probably 9999). But i would say that this is theoretical.

If it is a non clustered index, i would drop the index and then update the column and then add the index with ONLINE=on an SORT_IN_TEMPDB=on, but then you probably say that you cannot get a service window to do this update. If thats the case, you probably can't do your update anyway, due to the fact that your update will lock the entire table during the transaction. Of course, you can prevent lock escalation, but that will slow you down even more, i would even force a tablock to get the best performance.

But then you say, I use a where condition on this column and need the index, but the index will not help you if as much as 50M out of 100M will be updated, and SQL will do a table or index scan anyway.

more ▼

answered Oct 22, 2009 at 01:58 PM

Håkan Winther gravatar image

Håkan Winther
15.6k 35 37 48

Actually this is an hipotetic situation, i do have a status column and i often update values to -1 when they are disabled. The 50% was just a worst case hipotesis. Just would like to know if there is any proof theory about what is better, throw data to the beggining of the index or to the end.
Oct 22, 2009 at 02:57 PM Gustavo
(comments are locked)
10|1200 characters needed characters left

I'm staring at Kalen Delany's book on the SQL 2005 storage engine, but I don't recall anything in it precisely on point. I suspect that it would be slightly faster to set to 9999 because as rows were moved to new pages, the new pages could be populated without having to update pointers to the first page starting with 0 in your subject column.

In other words, after the first page of -1s is populated, it would point to the first page containing 0's as the next page. When a second page for -1s is created, the first page of -1s might have to be altered to point to the second page of -1s rather than the starting page for 0s. The new page of -1s would point to the first page of 0s.

However, I'm not sure whether or not such an update would have to happen for each new page, or just for the first page of 9999s. Also, I would expect both pages involved to be in ram at the time of the update, so the performance difference should be miniscule.

Fragmentation should be a wash, because I wouldn't expect page splits to occur on the new pages with either value, and the old pages would be fragmented identically. However this is all speculation.

My strong suggestion is to create a sample table with a million rows or so rows and conduct some time trials. Nothing beats testing to get a definitive answer.

more ▼

answered Oct 22, 2009 at 01:19 PM

Bob Hovious gravatar image

Bob Hovious
1.6k 5 6 9

I performed the tests and edited the main post to include them but unfortunatelly i got no conclusive results.
Oct 27, 2009 at 08:40 AM Gustavo
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1950
x987
x251

asked: Oct 22, 2009 at 12:07 PM

Seen: 2116 times

Last Updated: Nov 20, 2009 at 09:03 AM