x

Count(*) Vs Count(Clustered Index Key Column)

Is there any differnce between the Query performance among "Select count(*) from tblname where field1 IS NULL" and "Select Count(clusteredIndexKey Column) from tblname where field1 IS NULL"?
more ▼

asked Jan 11, 2011 at 10:43 PM in Default

Newbie Bala gravatar image

Newbie Bala
293 11 11 13

(comments are locked)
10|1200 characters needed characters left

5 answers: sort oldest

Bala,

I don't see any performance difference unless your query missing any useful indexes, because Most of the time Query analyzer will choose the best indexes available and query plan to execute the query. I believe both query will end up on 'scan' on a Same index.

Prefer the Index Key Column, that is the standard.
more ▼

answered Jan 11, 2011 at 10:49 PM

Cyborg gravatar image

Cyborg
10.6k 36 40 45

(comments are locked)
10|1200 characters needed characters left

The only real way of knowing is to test and take a look at the resulting execution plan and resource statistics.

In general, doing the COUNT(*) will let the optimiser decide which index is the best one to use to satisfy the query (you are basically saying it can use any of the columns in the table to run the count. I would lean toward using this method.

By using COUNT(columnname) you force the optimiser to count that column - this makes the optimiser look for the best access method of counting that column along with the WHERE clause.

In most cases, you don't want to tell the optimiser how to fulfill your request, you just tell it what you want to know. This is sometimes difficult to grasp, as programmers normally think in the other direction (tell the computer what to do). The query optimiser is a very clever and very powerful tool, provided with good information (indexes, statistics) it will find the best way of answering your questions.

The execution plan will depend on your table structure, available indexes and statistics and data spread (you don't say how many rows of the table match field IS NULL) along with other settings like multiple CPU usage.

Sorry for the vague answer, but there are too many unknowns to give a quantitative answer.
more ▼

answered Jan 11, 2011 at 11:49 PM

WilliamD gravatar image

WilliamD
25.9k 17 19 41

For sanity check, count(column_name) is out of consideration for me, I will never risk using it even if it costs me a job, but I might use count(*) if I have to. There was an interesting discovery by many Oracle users (older versions) about count('X') being considerably faster than count(1) or count(anything_else_*_included). I was told though that in versions 10g+ there is no difference.

I was criticized once by Hugo Kornelis who claims that count(*) and count(1) are the same as far as performance is concerned, and I unconditionally believe him because he is Hugo Kornelis. Of course count(column_name) is simply too risky to ever consider and therefore, I usually opt to use count(1), but if I happen to participate in the SSC forum which Hugo might also attend, I will use count(*) out of respect :)
Jan 12, 2011 at 09:44 AM Oleg
Only to mention. If you will use COUNT(*) and will not have read rights on all columns, then the query will fail as the query optimizer expands the column names in early stages of query processing. The same is if you use eg. COUNT(1) it's translated to COUNT(*) by query optimizer too.
Jan 12, 2011 at 10:42 PM Pavel Pawlowski

@Pavel Pawlowski This is a very good point! I suppose it means that the summary of the answer should go like this: there is no performance difference (or at leat there is no noticeable performance difference) in count(*) vs count(clustered_index_column). Which one to use? It depends. If there is a possibility that the user does not have rights on all columns in the table then count(1) and count(*) are not possible. If the clustered index column is nullable then count(clustered_index_column) might produce incorrect result.

Generally, if the user is DBA or at least has rights on all columns then count(*) or count(1) seems to be a good choice.

By the way, let me copy your comment and paste it here, because you forgot to escape the star in yours, so the count(*) you intended became count() making the further text' font in italics :)

Only to mention. If you will use COUNT(*) and will not have read rights on all columns, then the query will fail as the query optimizer expands the column names in early stages of query processing. The same is if you use eg. COUNT(1) it's translated to COUNT(*) by query optimizer too.
Jan 13, 2011 at 08:03 AM Oleg
Brilliant @Pavel and @Oleg - i never thought about the column permissions part. I thought it would be fine in that only the columns that a user can see would be considered in the COUNT(*).
Jan 13, 2011 at 11:58 AM WilliamD
I totally forgot about this side effect, it is great that Pavel reminded us about it. I don't know why I did not recall it when answering the question. This is what I used few months ago when Hugo Kornelis set me straight about the lack of difference between count(1) and count(*). Yes, I knew that Hugo was right of course, but I needed the way to convince myself just for sanity check. So I created a user, denied him of the rights to one column in the table, issued select count(*) to make sure the engine barks at me as expected and then issued select count(1) to hear that it barks just as loudly. This made all my doubts disappear for good. I still use count(1) just because it is a habit from which I cannot break, and it is still so much better then count(some_column_name) most times :)
Jan 13, 2011 at 12:54 PM Oleg
(comments are locked)
10|1200 characters needed characters left

Yes there is a difference.

If you take this example:

USE [adventureworks]
GO

SELECT
    COUNT(*)
FROM
    [Sales].[Customer] AS c
go
SELECT
    COUNT([CustomerID])
FROM
    [Sales].[Customer]

If you look at the estimated or actual query plan you see exactly the same thing - Index Scan 85%, Stream Aggregate 15%, Compute Scalar 0%, Select 0% So you might think they are the same but if you execute SET STATISTICS IO ON, re-run the queries and view the message details you will see that the Count(*) has something like

Table 'Customer'. Scan count 1, logical reads 36, physical reads 6, read-ahead reads 34, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

whereas the Count(CustomerID) has

Table 'Customer'. Scan count 1, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Not a huge difference but the index being used saves accessing the table rows and is therefore faster and has less impact on the server in terms of disk access and potentially holding up other processes.
more ▼

answered Jan 12, 2011 at 12:55 AM

Fatherjack gravatar image

Fatherjack ♦♦
42.3k 75 78 108

@Fatherjack - the numbers for your example may be a little misleading (especially the comparison of logical and physical reads). The first query will have loaded the data into the cache, while the second is using the cached data - so the second one will win. (Cleanbuffers or run both with warm cache should negate the differences).

I don't have an adventureworks on my machine at the moment - are the query plans identical? I guess they would be because CustomerId will be the clustered PK on that table (if adworks is built properly!).

I would suggest doing the test with a WHERE clause that uses a non-key column, or is part of a covering index. Another option would be to count on a non-indexed column, just to get the differing plans through forcing different index access.
Jan 12, 2011 at 01:15 AM WilliamD
Gah, you could be right. I cleared the cache/buffers before and after the execution but not in the middle of the quoted script. Dont have time to re-run it here now, will try it later. Idiot error - Is it Monday?
Jan 12, 2011 at 01:38 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left

While there is no difference in performance, using the count(clustered_index_column) instead of count(*) or count(1) may easily lead to unwanted results and therefore, should be avoided at all costs if at all possible. This is simply because typically, the queries like select count(clustered_index_column) are intended to figure the number of records in the table, but unfortunately, there is no way to guarantee that such query will return an accurate row count, so the purpose of the query is defeated. Thus, in my humble opinion, such queries should be avoided and replaced with select count(*). The reason for a possibility of inaccurate count is that the record with NULL in the column will be excluded from the count, and create unique clustered index statement does not require the column to be not nullable. Here is the small example to demonstrate the point I am trying to make:

use AdventureWorks;
go

-- create table with nullable column   
create table dbo.SampleCount
(
    col1 int null, col2 varchar(20) null
);
go

-- create UNIQUE CLUSTERED index on that NULLABLE column    
create unique clustered index IX_SampleCount on dbo.SampleCount (col1);
go

-- insert 3 records including the one with NULL in col1  
    insert into dbo.SampleCount
    select null, 'record_null'
    union all select 1, 'record_one'
    union all select 2, 'record_two';
    go

-- use this select to see why select count(col_name) should be avoided 
select 
    count(col1) wrong,
    count(1) correct,
    count(*) correct_as_well 
    from dbo.SampleCount;

drop table dbo.SampleCount;
go

Executing the sample query above produces the following results,

wrong       correct     correct_as_well
----------- ----------- ---------------
2           3           3

clearly revealing that the attempt to use count(column_name) to retrieve the accurate number of records in the table is futile despite the fact that the column is the only column of the unique clustered index. I understand that there might be an argument that usually, the column of such index is not nullable, but it does not matter. Since the possibility of getting inaccurate count exists, it is perfectly fine to use reliable count(*) or count(1) to get the desired result.

Just my 2 cents.

Oleg

more ▼

answered Jan 12, 2011 at 12:25 PM

Oleg gravatar image

Oleg
15.9k 2 4 24

Awesome explanation there Oleg. +1
Jan 12, 2011 at 01:34 PM WilliamD
so depend on PK column?
Jan 12, 2011 at 11:19 PM Cyborg
I would still go for COUNT(*) - the PK may not be the best index to use to satisfy a count. Let the query optimiser decide what index to use.
Jan 12, 2011 at 11:51 PM WilliamD
yes william i agree with you because the if the PK index is a clustered one then needs more logical reads than any other. I am i correct?
Jan 13, 2011 at 12:00 AM Cyborg

@Cyborg - it depends! :)

I just go for COUNT(*) because there is no ambiguity in doing that.
Jan 13, 2011 at 04:56 AM WilliamD
(comments are locked)
10|1200 characters needed characters left

So apparently Count Function will look for the Number of Row Identifiers?

( Irrespective of the indexes or their types i assume)

So even if we have a ROW with all Columns having NULL Values, it would be counted as an entity using Either Count(*) and Count(1).

When it Comes to Count(Column_NAME) it looks for the NUMBER OF DATA Offsets defined all along the table Extents.

Some one please correct me if i am wrong?
more ▼

answered Mar 12, 2013 at 04:30 PM

Suresh.Patibandla gravatar image

Suresh.Patibandla
120 3 3 4

Your question is somewhat different than the original (which is over two years old). Can you ask is as a new question?
Mar 12, 2013 at 04:40 PM KenJ
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x985
x51

asked: Jan 11, 2011 at 10:43 PM

Seen: 3112 times

Last Updated: Mar 12, 2013 at 06:25 PM