Indexing VarChar vs BigInt

This is a fact table in a data warehouse.

It has a composite clustered index, defined as follows:

ALTER TABLE [dbo].[Fact_Data]
ADD CONSTRAINT [PK_Fact_Data] PRIMARY KEY CLUSTERED
(
    [Column1_VarChar_10] ASC,
    [Column2_VarChar_10] ASC,
    [Column3_Int] ASC,
    [Column4_Int] ASC,
    [Column5_VarChar_10] ASC,
    [Column6_VarChar_10] ASC,
    [Column7_DateTime] ASC,
    [Column8_DateTime] ASC
)

In this structure, all of the VARCHAR(10) columns hold numeric values only. Is it going to be beneficial for me to change this 78-million-row table to hold BIGINT instead of VARCHAR, in terms of querying and indexing?

Any other benefits / drawbacks that I should consider?


asked Oct 21, 2009 at 05:20 PM in Default


Raj More

2 answers

Performance: BIGINT comparisons are faster than VARCHAR comparisons. But the big question is this: how are these columns compared to other columns or values? If, for instance, you frequently use [Column1_VarChar_10] in an inner join to another table with a similar VARCHAR column, then switching this one to BIGINT will hinder performance, because the join would force an implicit conversion on one side. If, however, the other table already uses an INT, BIGINT, or other numeric data type, then converting this one would improve the performance of that query.
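As a sketch of that join scenario (the dimension table and its Code column here are hypothetical, not from the question):

```sql
-- If Column1_VarChar_10 became BIGINT while the dimension key stayed VARCHAR(10),
-- SQL Server would add a CONVERT_IMPLICIT on one side of this join, which can
-- prevent an index seek on the converted column.
SELECT f.Column3_Int, d.Code
FROM dbo.Fact_Data AS f
INNER JOIN dbo.Dim_Example AS d   -- hypothetical dimension table
    ON f.Column1_VarChar_10 = d.Code;
```

Matching the data types on both sides of the join avoids the conversion entirely.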

Storage: BIGINT always uses 8 bytes. VARCHAR(10) uses 2 to 12 bytes (2 bytes of overhead plus 1 byte per character stored). It's impossible to determine which is smaller without knowing more about the data (min, max, mean, median, etc.).
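You can check the per-value storage yourself with DATALENGTH; for example:

```sql
SELECT DATALENGTH(CAST(12345 AS BIGINT))        AS bigint_bytes,   -- always 8
       DATALENGTH(CAST('12345' AS VARCHAR(10))) AS varchar_bytes;  -- 5 data bytes
-- (the VARCHAR column also carries 2 bytes of variable-length overhead per row)
```

If most of your values are 7 or more digits, BIGINT will be the smaller choice; for short values VARCHAR wins.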

Sorting consideration: Since the fields are currently VARCHAR, they are sorted as strings. That is very different from numerical sorting. Any value beginning with a 1 ('100', '1000', '123456789', etc.) will be considered less than '2'. That is the way it behaves today with the current index. If you change the fields to BIGINT, the sort order will change. This is true not only for the index but also for any query using an ORDER BY on one of those columns. This could have undesired effects on end-user reports.
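A quick way to see the difference with some sample values:

```sql
-- String sort: '100' and '1000' come before '2'
SELECT v FROM (VALUES ('2'), ('100'), ('1000')) AS t(v)
ORDER BY v;                   -- '100', '1000', '2'

-- Numeric sort after conversion
SELECT v FROM (VALUES ('2'), ('100'), ('1000')) AS t(v)
ORDER BY CAST(v AS BIGINT);   -- '2', '100', '1000'
```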

Additional suggestion: An 8-column index is heavy. In this particular example, your key size will be 32-72 bytes. Since you said you only have 78 million rows, what about creating a surrogate INT (identity) key? That would be only 4 bytes long. Keep in mind the clustered index dictates the physical sort order of the table itself. If new values for one of your key columns fluctuate a lot, inserts will land in the middle of the table and greatly degrade insert performance. An identity surrogate key allows fast inserts because each new value is always greater than the current maximum.
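A sketch of that surrogate-key approach (the column and index names here are my own, not from the question):

```sql
-- Add a 4-byte identity column and cluster on it;
-- new rows then always append at the end of the table.
-- (The existing clustered index would have to be dropped first.)
ALTER TABLE dbo.Fact_Data
    ADD Fact_Data_ID INT IDENTITY(1, 1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CIX_Fact_Data
    ON dbo.Fact_Data (Fact_Data_ID);
```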


answered Oct 21, 2009 at 07:45 PM


Tom Staab ♦


"Is it going to be beneficial for me to change this 78 million row structure to hold BIGINT instead of VARCHAR?"

I wouldn't bother. Instead, I'd be looking at having a much smaller set of keys for your Clustered Index. Remember that every non-clustered index must include the CIX key for every row, bloating your database significantly for very little gain.

Consider having a unique non-clustered index on those columns, but use a surrogate key of int (or bigint if you might need more than 2 billion rows one day) for your CIX.

Also... a 72-byte key will take a lot more non-leaf pages, increasing the cost of almost all your queries.
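Under that plan, the original eight columns could still be enforced with a unique nonclustered index; a sketch (the index name is illustrative):

```sql
-- Uniqueness on the natural key, kept out of the clustered index
CREATE UNIQUE NONCLUSTERED INDEX UQ_Fact_Data_NaturalKey
    ON dbo.Fact_Data
    (Column1_VarChar_10, Column2_VarChar_10, Column3_Int, Column4_Int,
     Column5_VarChar_10, Column6_VarChar_10, Column7_DateTime, Column8_DateTime);
```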


answered Oct 21, 2009 at 11:20 PM


Rob Farley




SQL Server Central


Seen: 6021 times

Last Updated: Oct 22, 2009 at 09:30 AM
