question

Anatoly avatar image
Anatoly asked

Optimize SELECT COUNT(*)

I have the table AKS\_ACTION (100M rows) that is permanently updating – about 50 rows are inserting and deleting in a second. My $1M question is: How to optimize the query SELECT COUNT(*) FROM AKS_ACTION? Grate thanks to Fatherjack... I thought the SQL Server is smart enough to use indexes for primitive query like SELECT COUNT(*) FROM THETABLE To understand what is going on, I did some experiments and got the following results /*================================================================================ "...COUNT(*)..." tests and ===================================================================================*/ SET NOCOUNT ON; DECLARE @ts datetime, @c int; SELECT @ts = GETDATE(); /* CASE 1: counting, no hints SELECT @c = COUNT(*) FROM AKS_ACTION; */ /* CASE 2: counting, use primary key index SELECT @c = COUNT(*) FROM AKS_ACTION WITH (NOLOCK, INDEX(1)); */ /* CASE 3: counting, non-clustered index SELECT @c = COUNT(*) FROM AKS_ACTION WITH (NOLOCK, INDEX(2)); */ /* CASE 4: treak counting */ SELECT @c = row_count FROM sys.dm_db_partition_stats WHERE object_id = OBJECT_ID('AKS_ACTION') AND index_id = 1 SELECT @c AS [Row Count], DATEDIFF(ms, @ts, GETDATE()) AS [Processing Time, ms] GO /* RESULTS: Row Count Time, ms Query ----------- -------- ----------- 72347024 1966 CASE 1 72347024 10620 CASE 2 72347024 2053 CASE 3 72347024 0 CASE 4 */ /*======================================================= "...COUNT(*) ... WHERE ..." tests The table has 1,120,000 rows The filter selects about 10% =======================================================*/ SET NOCOUNT ON; DECLARE @ts datetime, @c int; SELECT @ts = GETDATE(); /* CASE 1: counting, no hints SELECT @c = COUNT(*) FROM THETABLE WHERE (s_equipment_id LIKE '%001%'); */ /* CASE 2: counting, use primary key index SELECT @c = COUNT(*) FROM THETABLE WITH (READUNCOMMITTED, INDEX(1)) WHERE (s_equipment_id LIKE '%001%'); */ /* CASE 3: counting, non-clustered index; the index includes the filtered column(!) SELECT @c = COUNT(*) FROM THETABLE WITH (READUNCOMMITTED, INDEX(8)) WHERE s_equipment_id LIKE '%001%'; */ SELECT @c AS [Row Count], DATEDIFF(ms, @ts, GETDATE()) AS [Processing Time, ms] GO /* RESULTS: Row Count Time, ms Query ----------- -------- ----------- 113679 133 CASE 1 113679 176 CASE 2 113679 560 CASE 3 */
sql-server-2005optimizationvldb
7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
I would like to add my 2 cents.
I thought the SQL Server is smart enough to use indexes for primitive query like select SELECT COUNT(\*) FROM THETABLE
**It is** smart enough to use indexes. In your case, it uses a clustered index scan. As Gail Shaw once aptly put it [this excellent article][1],
When an index scan is done on the clustered index, it’s a table scan in all but name
. If some records on the leaf pages of you index are locked then count(\*) has to wait for the lock to be released before it can continue, so the readuncommited hint is obviously useful. In the second part of test the predicate is not SARGable, and therefore, a hint to use non-clustered index is the single worst thing one can do to kill performance (the engine is forced to use the index + lookups for no reason). The rumor has it that in SQL Server count(\*) is just as good as count(1), but I always use count(1) because count(\*) never made sense to me. In Oracle one should never use count(*) but opt for count('X') instead, so I still remember the danger and use count(1) in SQL Server. [1]: http://www.sqlservercentral.com/articles/Indexing/68439/
2 Likes 2 ·
Oleg avatar image Oleg commented ·
@Anatoly Looks like @Fatherjack gave the best possible option to figure the record count. You can get counts for all tables even if you have hundreds tables with millions of records in each. The only word of caution is that if you have partitioned data then you will have one record per partition in the sys.dm_db_partition_stats (multiple records per each index) and also if you have multiple indexes and/or heap tables then you need to make sure that you are only using one first index. Heaps have index_id = 0 and clusters index_id = 1. There is no way to have both index_id 0 and index_id 1. Here is Jon's query adjusted for what is described above (same is discussed in Håkan's answer):
select 
    schema_name(o.[schema_id]) + '.' + o.name TableName,
    sum(ps.row_count) RecordCount
    from sys.dm_db_partition_stats ps inner join sys.objects o
        on o.[object_id] = ps.[object_id]
    -- just grab one index 0 (heap) or 1 (cluster)
    -- and query only user tables (type = U)
    where 
        ps.index_id < 2 and o.[type] = 'U'
        -- and o.name = 'your_table' -- table name
    group by o.name, o.[schema_id]
    order by 1;
2 Likes 2 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
What version of SQL Server?
0 Likes 0 ·
Anatoly avatar image Anatoly commented ·
SQL Server 2005, the table has the key **indx int PRIMARY KEY IDENTITY(1,1)**
0 Likes 0 ·
Anatoly avatar image Anatoly commented ·
SQL Server 2005, the table has the key indx int PRIMARY KEY IDENTITY(1,1) Sorry for repetition, I am novice here...
0 Likes 0 ·
Show more comments
Blackhawk-17 avatar image
Blackhawk-17 answered
My first thought would be to use a no lock hint. SELECT COUNT(*) FROM AKS_ACTION WITH (READUNCOMMITTED) From BOL: > READUNCOMMITTED Specifies that dirty reads are allowed. No shared locks are issued to prevent other transactions from modifying data read by the current transaction, and exclusive locks set by other transactions do not block the current transaction from reading the locked data. Allowing dirty reads can cause higher concurrency, but at the cost of reading data modifications that then are rolled back by other transactions. This may generate errors for your transaction, present users with data that was never committed, or cause users to see records twice (or not at all). > > READUNCOMMITTED and NOLOCK hints apply > only to data locks. All queries, > including those with READUNCOMMITTED > and NOLOCK hints, acquire Sch-S > (schema stability) locks during > compilation and execution. Because of > this, queries are blocked when a > concurrent transaction holds a Sch-M > (schema modification) lock on the > table. For example, a data definition > language (DDL) operation acquires a > Sch-M lock before it modifies the > schema information of the table. Any > concurrent queries, including those > running with READUNCOMMITTED or NOLOCK > hints, are blocked when attempting to > acquire a Sch-S lock. Conversely, a > query holding a Sch-S lock blocks a > concurrent transaction that attempts > to acquire a Sch-M lock. For more > information about lock behavior, see > Lock Compatibility (Database Engine).
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image
Fatherjack answered
If you know the index name then go for that, otherwise go for the table name. You will get the table row count from the clustered index on the table with: SELECT OBJECT_NAME(ps.object_id) , i.name , row_count FROM sys.dm_db_partition_stats AS ps INNER JOIN sys.indexes AS i ON ps.index_id = i.index_id AND ps.object_id = i.object_id WHERE i.name = '' or change the WHERE clause to be : WHERE OBJECT_NAME(ps.object_id) = '' if you want it for a specific table. Added advantage: this is stored in the dmv so you dont actually touch the table to get this detail so the time to complete wont differ with the size of table :) You may need to SUM the row count if the indexes spand more than one partition.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
+1 This is the fastest solution by far, see more discussion about rowcount in this question.: http://ask.sqlservercentral.com/questions/1980/how-do-i-get-a-list-of-sql-server-tables-and-their-row-counts
1 Like 1 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
thanks for remembering the question Håkan, I knew there had been one but couldnt recall where!
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.