question

DavidD avatar image
DavidD asked

char vs varchar index performance

Can anyone tell me if there is any performance degradation in using a non fixed length datatype as a key field? I've heard it thrown around by some database developers that we should be using char fields instead of varchar, which to me makes no sense at all. I would have thought the less bytes you store the better. But apparently index performance is much poorer on non fixed length fields i.e varchar. Can someone please confirm or correct this?
indexingdatatypes
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Cyborg avatar image
Cyborg answered
A simple practical experiment




CREATE  TABLE CharIndexTable
(NAME CHAR(10))

GO
CREATE INDEX IDX
ON CharIndexTable(NAME)

GO

CREATE  TABLE VarCharIndexTable
(NAME VARCHAR(10))

GO
CREATE INDEX IDX
ON VarCharIndexTable(NAME)

GO

SET NOCOUNT ON

INSERT INTO CharIndexTable
SELECT 'A'
GO 50000

SET NOCOUNT ON

INSERT INTO VarCharIndexTable
SELECT 'A'
GO 50000


No execute the following query and observe the page reads


SET STATISTICS IO ON

SET NOCOUNT ON

SELECT NAME FROM CharIndexTable 
SELECT NAME FROM VarCharIndexTable 

Table 'VarCharIndexTable'. Scan count 1, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CharIndexTable'. Scan count 1, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Query against my VarCharIndexTable table took a logical read of 91 and for the CharIndexTable it took 124 The reason behind this is that the index key on VarCharIndexTable is a varchar, is a variable-length data type, the storage size of the varchar value is the actual length of the data entered ie in our case its 1 byte. In the case of CharIndexTable the index key is of CHAR, its a fixed-length character data type. In our case it will consume 10 byte key. The index page of VarCharIndexTable can accommodate more number of rows than that of CharIndexTable Index. ie for complete Index scan operation index on VarCharIndexTable needs to process 91 pages only, but in the case of CharIndexTable it is 124 page.
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Cool experiment. Data doesn't lie.
1 Like 1 ·
my thoughts exactly.
0 Likes 0 ·
i went to test this (with much more data) and get some time stats. Spent an hour trying to work out why the "char'd" table was quicker than the "varchar'd" table... until i looked at the query optimizer a bit closer and realised that parallelism was kicking in for the char table but not the varchar table. Turned the threshold down and with both query plans being identical (either with or with out parallelism), the varchar table was quicker than the char table. thanks for the feedback
0 Likes 0 ·
Tim avatar image
Tim answered
Gail Shaw has an excellent series on indexes on this site. Please read the following article. [Gail Shaw's Article]( http://www.sqlservercentral.com/articles/Indexing/68439/) If you are talking about your clustered index then yes you want it as small as possible.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@Cream Some of the conditions for a clustered key candidate Gail suggests are: - The values are never changing - The (new) values are ever increasing Usually, it is not possible to bring a string data to comply with above. This means that any is generally speaking a horrible candidate for a clustered key, as it will lead to page splits more likely then not unless the fillfactor is carefully chosen and the index is reorganized/rebuilt frequently enough. If the evil deed really needs to be done then varchar is a better choice but ***only if the average length of varchar values is more than 2 bytes smaller than the size***. For example, suppose you consider varchar(10) versus char(10), but you know that the average length of the actual data is 9. In this case choosing varchar(10) is just plain stupid as the varchar will consume more space (9 bytes per data + 2 bytes overhead = 11 bytes, 10% greater than simply using char(10) instead) though admittedly, it is not as stupid as ever having some column of varchar(1) datatype (this one tops the list). If the average length of the actual data is 7 then by all means varchar is a superior choice. So, as always in SQL-Server, it depends!
1 Like 1 ·
ive read that article. like i wrote in the question - i thought less storage the better. I know this much already. I have a fairly solid understanding of how indexes work etc. Im trying to establish if there is any validity to the claim that char fields make for a better index column than a varchar column, in any way shape or form.
0 Likes 0 ·
WilliamD avatar image
WilliamD answered
You can do some really rudimentary testing to see the difference between a varchar and char column: My example uses a [Tally Table][1] to generate the test data. CREATE TABLE TableA (Col1 char(10) NOT NULL, PRIMARY KEY CLUSTERED (Col1)); CREATE TABLE TableB (Col1 varchar(10) NOT NULL, PRIMARY KEY CLUSTERED (Col1)); INSERT INTO dbo.TableA (Col1) SELECT RIGHT('0000000000'+CAST(t.n AS varchar(10)),10) FROM dbo.Tally T ; INSERT INTO dbo.TableB (Col1) SELECT RIGHT('0000000000'+CAST(t.n AS varchar(10)),10) FROM dbo.Tally T ; EXEC sp_spaceused 'TableA' ; EXEC sp_spaceused 'TableB' ; DROP TABLE dbo.TableA DROP TABLE dbo.TableB Using my Tally table (11000 rows), I get the following results: name rows reserved data index_size unused TableA 11000 264 KB 208 KB 16 KB 40 KB TableB 11000 328 KB 256 KB 16 KB 56 KB As you can see TableB (varchar) has 48KB more "data" than TableA (fixed length). This can be "levelled-out" if you have SQL Server Enterprise Edition and use data compression: CREATE TABLE TableA (Col1 char(10) NOT NULL, PRIMARY KEY CLUSTERED (Col1) WITH (DATA_COMPRESSION = PAGE)); CREATE TABLE TableB (Col1 varchar(10) NOT NULL, PRIMARY KEY CLUSTERED (Col1) WITH (DATA_COMPRESSION = PAGE)); This reduces the space used for both and effectively making the varchar column the same as a char: name rows reserved data index_size unused TableA 11000 200 KB 128 KB 16 KB 56 KB TableB 11000 200 KB 128 KB 16 KB 56 KB [1]: http://www.sqlservercentral.com/articles/T-SQL/62867/
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Damn, my post took me too long to format - Cyborg beat me to the punch.
0 Likes 0 ·
+1 for the effort.
0 Likes 0 ·
William i think you spend more time on formating, but looks cool!
0 Likes 0 ·
appreciate the great feedback
0 Likes 0 ·
romdane avatar image
romdane answered

Hi folks,

I want to add a concept of SELECTIVITY of the index based on varchar especially if it is varchar(128) and more and especially if the all the 128 char are provided (i.e. urls).

I suggest to index on a subset of chars that way whatever two rows they are different on their beginning, so we have to check which subset of chars gives unique rows which I will call the minimal cover selectivity.

Making the assumption that the index contains the value of the fields and the address it seems obvious that the size of the index will be huge if all chars are provided.

So indexing on the minimal cover selectivity will provide better performance.

Since SQL Server does not provide such functionality

CREATE [ CLUSTERED | NONCLUSTERED ] INDEX index_name

ON [ database_name . [ schema ] . | schema . ] table_name ( { column [ ASC | DESC ] } [ ,...n ] )

WITH ( DROP_EXISTING = { ON | OFF } ) [;]

the suggested grammar would be

CREATE [ CLUSTERED | NONCLUSTERED ] INDEX index_name

ON [ database_name . [ schema ] . | schema . ] table_name ( { left(column, n) [ ASC | DESC ] } [ ,...n ] )

WITH ( DROP_EXISTING = { ON | OFF } ) [;]

SO to do this the suggested manner is to create a surrogate key containing the left(column, n)

Perhaps one day SQL Server will provide such functionality ...

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.