
char vs varchar index performance

Can anyone tell me if there is any performance degradation in using a non-fixed-length datatype as a key field?

I've heard it thrown around by some database developers that we should be using char fields instead of varchar, which makes no sense to me at all. I would have thought the fewer bytes you store, the better. But apparently index performance is much poorer on non-fixed-length fields, i.e. varchar.

Can someone please confirm or correct this?

asked Sep 29, 2010 at 06:05 AM in Default

DavidD


3 answers

A simple practical experiment


CREATE TABLE CharIndexTable
(NAME CHAR(10))

GO
CREATE INDEX IDX
ON CharIndexTable(NAME)

GO

CREATE TABLE VarCharIndexTable
(NAME VARCHAR(10))

GO
CREATE INDEX IDX
ON VarCharIndexTable(NAME)

GO

SET NOCOUNT ON

INSERT INTO CharIndexTable
SELECT 'A'
GO 50000

SET NOCOUNT ON

INSERT INTO VarCharIndexTable
SELECT 'A'
GO 50000


Now execute the following queries and observe the page reads:

SET STATISTICS IO ON

SET NOCOUNT ON

SELECT NAME FROM CharIndexTable
SELECT NAME FROM VarCharIndexTable

Table 'VarCharIndexTable'. Scan count 1, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CharIndexTable'. Scan count 1, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The query against my VarCharIndexTable took 91 logical reads, while the one against CharIndexTable took 124.

The reason behind this is that the index key on VarCharIndexTable is a varchar, a variable-length data type: the storage size of a varchar value is the actual length of the data entered, i.e. in our case 1 byte (plus a 2-byte length prefix per row).

In CharIndexTable the index key is a CHAR, a fixed-length character data type, so in our case each key consumes the full 10 bytes.

An index page of VarCharIndexTable can therefore accommodate more rows than one of the CharIndexTable index, so for a complete index scan the index on VarCharIndexTable needs to process only 91 pages, but in the case of CharIndexTable it is 124 pages.
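The per-value storage difference can be checked directly with DATALENGTH; a minimal sketch (the char padding behavior assumes the default ANSI_PADDING ON setting):

```sql
-- char(10) is padded to its full declared length; varchar stores only the data entered
SELECT DATALENGTH(CAST('A' AS char(10)))    AS char_bytes,     -- 10
       DATALENGTH(CAST('A' AS varchar(10))) AS varchar_bytes;  -- 1
```

Note that DATALENGTH does not show the extra 2-byte length prefix that each variable-length value costs in the row itself.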


answered Sep 29, 2010 at 06:43 AM

Cyborg

Cool experiment. Data doesn't lie.
Sep 29, 2010 at 06:45 AM sqlnubi
my thoughts exactly.
Sep 29, 2010 at 10:00 PM DavidD

I went to test this (with much more data) and get some time stats. I spent an hour trying to work out why the "char'd" table was quicker than the "varchar'd" table... until I looked at the query optimizer a bit closer and realised that parallelism was kicking in for the char table but not the varchar table. I turned the threshold down, and with both query plans being identical (either with or without parallelism), the varchar table was quicker than the char table.

thanks for the feedback
Sep 30, 2010 at 06:14 AM DavidD
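The plan difference described above can be neutralised in two ways; a sketch, assuming sysadmin-level permissions for the server-wide option:

```sql
-- Option 1: lower the server-wide cost threshold so both queries parallelize
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 1;
RECONFIGURE;

-- Option 2: force a serial plan for an individual query instead
SELECT NAME FROM CharIndexTable OPTION (MAXDOP 1);
```

Forcing MAXDOP 1 on both queries is the less invasive choice for a one-off comparison, since it leaves the server configuration untouched.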

Gail Shaw has an excellent series on indexes on this site. Please read the following article.

Gail Shaw's Article

If you are talking about your clustered index then yes you want it as small as possible.

answered Sep 29, 2010 at 06:10 AM

Tim

I've read that article. As I wrote in the question, I thought the less storage the better; I know this much already. I have a fairly solid understanding of how indexes work. I'm trying to establish whether there is any validity to the claim that char fields make for a better index column than a varchar column, in any way, shape or form.
Sep 29, 2010 at 06:22 AM DavidD

@Cream Some of the conditions for a clustered key candidate Gail suggests are:

  • The values are never changing
  • The (new) values are ever increasing
Usually, it is not possible to make string data comply with the above. This means that any string column is, generally speaking, a horrible candidate for a clustered key, as it will lead to page splits more likely than not unless the fillfactor is carefully chosen and the index is reorganized/rebuilt frequently enough.

If the evil deed really needs to be done, then varchar is the better choice, but only if the average length of the varchar values is more than 2 bytes smaller than the declared size. For example, suppose you are considering varchar(10) versus char(10), but you know that the average length of the actual data is 9. In this case choosing varchar(10) is just plain stupid, as the varchar will consume more space (9 bytes of data + 2 bytes of overhead = 11 bytes, 10% more than simply using char(10)), though admittedly it is not as stupid as ever declaring a column of type varchar(1) (that one tops the list). If the average length of the actual data is 7, then by all means varchar is the superior choice. So, as always in SQL Server, it depends!
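That 2-byte overhead rule can be checked against real data before choosing a type; a hypothetical sketch (the table and column names are placeholders):

```sql
-- If avg_len <= declared size - 2, varchar(declared size) saves space on average
SELECT AVG(CAST(DATALENGTH(SomeColumn) AS decimal(10, 2))) AS avg_len,
       MAX(DATALENGTH(SomeColumn))                         AS max_len
FROM dbo.SomeTable;
```

Comparing avg_len against the candidate declared size applies the arithmetic above to your actual data instead of a guess.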
Sep 29, 2010 at 07:47 AM Oleg

You can do some really rudimentary testing to see the difference between a varchar and char column:

My example uses a [Tally Table][1] to generate the test data.

CREATE TABLE TableA (Col1 char(10) NOT NULL,
                     PRIMARY KEY CLUSTERED (Col1));
CREATE TABLE TableB (Col1 varchar(10) NOT NULL,
                     PRIMARY KEY CLUSTERED (Col1));

INSERT INTO dbo.TableA
        (Col1)
SELECT RIGHT('0000000000'+CAST(t.n AS varchar(10)),10) FROM dbo.Tally T
;
INSERT INTO dbo.TableB
        (Col1)
SELECT RIGHT('0000000000'+CAST(t.n AS varchar(10)),10) FROM dbo.Tally T
;
EXEC sp_spaceused 'TableA'
;
EXEC sp_spaceused 'TableB'
;
DROP TABLE dbo.TableA
DROP TABLE dbo.TableB

Using my Tally table (11000 rows), I get the following results:

    name    rows   reserved  data    index_size  unused
    TableA  11000  264 KB    208 KB  16 KB       40 KB
    TableB  11000  328 KB    256 KB  16 KB       56 KB

As you can see, TableB (varchar) has 48 KB more "data" than TableA (fixed length): here every value is exactly 10 characters long, so each varchar row pays the 2-byte length overhead without saving anything.

This can be "levelled-out" if you have SQL Server Enterprise Edition and use data compression:

CREATE TABLE TableA (Col1 char(10) NOT NULL,
                     PRIMARY KEY CLUSTERED (Col1) WITH (DATA_COMPRESSION = PAGE));
CREATE TABLE TableB (Col1 varchar(10) NOT NULL,
                     PRIMARY KEY CLUSTERED (Col1) WITH (DATA_COMPRESSION = PAGE));

This reduces the space used for both, effectively making the varchar column the same size as the char:

    name    rows   reserved  data    index_size  unused
    TableA  11000  200 KB    128 KB  16 KB       56 KB
    TableB  11000  200 KB    128 KB  16 KB       56 KB
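Before enabling page compression, the expected savings can also be estimated without rebuilding anything, via sp_estimate_data_compression_savings; a sketch, assuming the tables above already exist uncompressed:

```sql
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'TableB',
     @index_id         = NULL,    -- all indexes on the table
     @partition_number = NULL,    -- all partitions
     @data_compression = 'PAGE';
```

The procedure samples the data into tempdb and reports current versus estimated compressed sizes, which is a cheap way to decide whether the (Enterprise Edition) feature is worth it for a given table.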
[1]: http://www.sqlservercentral.com/articles/T-SQL/62867/

answered Sep 29, 2010 at 06:51 AM

WilliamD

Damn, my post took me too long to format - Cyborg beat me to the punch.
Sep 29, 2010 at 06:52 AM WilliamD
+1 for the effort.
Sep 29, 2010 at 07:00 AM Tim
William, I think you spent more time on formatting, but it looks cool!
Sep 29, 2010 at 07:06 AM Cyborg
appreciate the great feedback
Sep 29, 2010 at 10:01 PM DavidD

Seen: 7250 times

Last Updated: Sep 29, 2010 at 09:32 PM