question

Leo avatar image
Leo asked

Database Performance

I have realized something this morning. One of my developer who wrote a stored procedure to delete the records from a Table which contains Millions of records. He was trying to delete just the thousand of records and doing the DELETE procedure to run every morning. the script is simple script - DELETE FROM TABLE1 WHERE PERIOD = '201008' It took at least 10 mins to remove the records because of the size of the table. I am not going to blame him because I might do the same if I want to delete the records from the table. But my question is - 1. is there anyway to use different type of DELETE to remove the records quicker? 2. We are using Standard Version of SQL 2008 so that we can't partition the table. What is the best way to re-structure the table which contains Millions of records. (**Shall I create Tables by Period??) Thanks. 1.1.
sql-server-2008performance
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

WilliamD avatar image
WilliamD answered
How selective is the date 201008? You can work that out by doing: SELECT SUM(CASE WHEN PERIOD = '201008' THEN 1 ELSE 0 END)/COUNT(*) FROM TABLE_A As @Grant Fritchey says, you need to look at the execution plan to see if this index is being used. If 201008 is not selective enough, then it may be performing a table/clustered index scan to find the rows to delete. You may then experience a slow down. Does the table have multiple indexes on it that also need updating when the delete happen? **EDIT: After looking at the execution plan I have added the following:** @Leo - So, as far as I can decipher, you are getting index seeks on your nonclustered index and then table lookups against the table itself. This will be slow. I don't know what queries are run against this table, but you may want to consider changing that index to a clustered index. I am going to venture a guess that you may have other indexing issues on your system. The index here has been suggested and created by the Index Tuning Advisor. While not a bad thing in itself, it would lead me to believe there is some knowledge on indexing principals missing here. I recommend that you read up a little on the topic of indexing and SQL Server internals- [Kalen Delaney's SQL Server Internals][1] goes into depth on all aspects of SQL Server, including indexing and is ***excellent***. [1]: http://www.sqlserverinternals.com/
8 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Cyborg avatar image Cyborg commented ·
+1 For multiple index check.
0 Likes 0 ·
Leo avatar image Leo commented ·
Yes, it has. There is only one index for (Period,CustID) in the table. Shall I split them? Period as Cluster Index and CustID as Index??
0 Likes 0 ·
Cyborg avatar image Cyborg commented ·
Congrats William for crossing 5K
0 Likes 0 ·
WilliamD avatar image WilliamD commented ·
I wouldn't say split it just like that. Period and CustId is going to be unique, Period alone probably not. You should be getting a clustered index seek, or if the period is not selective enough, a clustered index scan.
0 Likes 0 ·
Leo avatar image Leo commented ·
0 Likes 0 ·
Leo avatar image Leo commented ·
will it be difference Clustered Index on (Period, CustID) or Clustered Index on Period and Normal Index on CustID?
0 Likes 0 ·
WilliamD avatar image WilliamD commented ·
@Leo - that is the right book. Buy it now! :) The clustered index on Period,CustId (if these two together make a unique value, state that for the table - even make it a primary key) means that you have the data stored sorted by those values. This would assist you in this delete operation, in that you could identify a contiguous block of rows that would need deleting. It may still not be the best index for you. However, it could be better than leaving the table as a heap. As always with anything you hear on here - TEST, TEST, TEST. Something we say will work, is based on our assumptions, only you know the real system that you are working on. Suggestions here are based on the informatino you provide (you may unknowingly miss out some vital piece of information). Either way, good luck and good reading. The book can be heavy going if you are not so good at internals, but take your time, read it a few times and maybe just one chapter at a time. There is a lot to digest in there.
0 Likes 0 ·
Leo avatar image Leo commented ·
Thanks Will, Just Bought it. :-)
0 Likes 0 ·
Håkan Winther avatar image
Håkan Winther answered
Make sure you have an index on the column that you use in the where clause and that the conditions is of the same datatype so you don't get any implicit conversions. 10 000 records in a delete is not that much. If you want to delete millions you should consider deleting in batches to prevent any deadlocks etc. Maybe you are experiencing an IO bottleneck. Take a look at the IO stalls with the following statement: SELECT * FROM sys.dm_io_virtual_file_stats(NULL,NULL); The IO read/write stalls / ms should be as low as possible. More than 15 ms is considered high.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Blackhawk-17 avatar image Blackhawk-17 commented ·
You deserve an extra +1 for the I/O route... take one of mine :)
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
Thank you Blackhawk :)
0 Likes 0 ·
Cyborg avatar image
Cyborg answered
Check the following 1. does your query can utilize the index. 2. check your fragmentation of indexes and updated statistics. 3. make sure that you don't have any waiting or blocked processes. Since you cant use partitioning. The best approach will be designing appropriate indexes.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered
You would need to look at the execution plan to determine what was going on. It could be as simple as @Hakan says, add an index. It might be more complicated if there are dependencies on that table that will require consistency checks against other tables. Do they all have indexes that the query can use as well? Get a look at the execution plan and it ought to give you some direction. The other question is, what percentage of the millions of rows are you deleting that is taking 10 minutes. Less than 1%, that's a problem. More than half, that would be expected. Sometimes, depending on the data volume involved, there's little that can be done. BTW, 5-10 million rows isn't that big a table. If it's well indexed, you ought to be able to get good performance out of it.
9 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
@Leo, actually, it's not. It's XML. If you right click on it you can use "Save As" to make it a .sqlplan file. If you edit your post, I think you can link the file to it (although I could be wrong about that).
1 Like 1 ·
Håkan Winther avatar image Håkan Winther commented ·
+1 I totally agree with you Grant, it shouldn't be a performance issue as long as there aren't issues with indexes or locks.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Yeah, that's way too straight forward a query to have issues in other places. Our @Cyborg was right to point out blocking as a possible cause too.
0 Likes 0 ·
Leo avatar image Leo commented ·
+1 for all of you. There is an index on PERIOD. Also I have done the Index Rebuild every night.
Number of Records in Table - 226,853,492

Table: 'TABLE_A' (1141579105); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 4535046
- Extents Scanned..............................: 567425
- Extent Switches..............................: 675994
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 83.86%
[566881:675995]
** - Logical Scan Fragmentation ..................: 2.82% **
** - Extent Scan Fragmentation ...................: 1.20% **
- Avg. Bytes Free per Page.....................: 96.2
- Avg. Page Density (full).....................: 98.81%
0 Likes 0 ·
Cyborg avatar image Cyborg commented ·
leo what is the data type of Period column?
0 Likes 0 ·
Show more comments
Blackhawk-17 avatar image
Blackhawk-17 answered
Tackling this from the other direction... how long does a SELECT of a given date take? If it's the same we can head down one avenue, if it's a magnitude faster we can go down another. A long SELECT points to a missing index, a quick one tends to suggest blocking during the DELETE. These are broad generalizations but will help us understand the issue better. And, as Grant et al have said - what is the execution plan?
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Leo avatar image
Leo answered
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered
Do you have a clustered index on the table? From the execution plan it looks like you are not having a clustered index. > table="[TABLE_A]" indexkind="Heap"/
6 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
you shouldn't have a table with hundred of millions of records in a heap!! That will give you bad performance overall
2 Likes 2 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
An index yes... not a clustered index though. Unlike Oracle SQL Server really likes to have clustered indexes.
1 Like 1 ·
Leo avatar image Leo commented ·
No, It has only one Multiple Index (Period, CustID) is normal Index
0 Likes 0 ·
Leo avatar image Leo commented ·
That is not a Heap I think, there is index on the table.
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
If you don't have a clustered index, it's a heap, no matter how many non clustered indexes you have. The data is stored in a heap or in the lowest level of the clustered index.
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.