question

Pamz asked

Best approach to delete first 5 million rows from a heap in batches

The subject is pretty self-explanatory. What we want is to keep the size of the log file minimal while maintaining optimum performance. I am not sure whether this is even possible, as there is no clustered index and hence no specific order to the data?
sql-server delete batch heap

Fatherjack answered
What version of SQL Server you are working with may have an effect here. You can use `TOP (n)` or `SET ROWCOUNT` to control the number of rows deleted on each pass. These questions have relevant answers: http://ask.sqlservercentral.com/questions/8210/advise-on-deleting-huge-amount-of-rows.html, http://ask.sqlservercentral.com/questions/28252/deleting-10-million-records-from-a-database-table.html and http://ask.sqlservercentral.com/questions/44027/deleting-millions-of-rows.html
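A minimal sketch of the batched `DELETE TOP (n)` pattern, assuming a hypothetical table `dbo.BigHeap` and a placeholder purge condition; the batch size of 50,000 is arbitrary and should be tuned, and log growth only stays bounded if the log can be reused between batches (SIMPLE recovery, or frequent log backups in FULL):

```sql
-- Sketch only: delete in small batches so each transaction stays short
-- and the transaction log can be reused between batches.
-- dbo.BigHeap and the WHERE clause are placeholders.
DECLARE @BatchSize int = 50000;
DECLARE @RowsAffected int = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.BigHeap
    WHERE SomeDateColumn < '20120101';   -- your own criteria here

    SET @RowsAffected = @@ROWCOUNT;
END
```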
6 comments

Pavel Pawlowski commented ·
You cannot guarantee the order of insertion during a select unless you have an IDENTITY column (or some timestamp column with a different timestamp for each row) in the table. If you have an identity column, you can order the rows by that column and then delete the first 5 million rows. If you do not have the IDENTITY, then there is no way to guarantee the insertion order.
6 Likes 6 ·
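To illustrate the comment above, a rough sketch assuming the heap has a hypothetical IDENTITY column named `ID` that reflects insertion order; it finds the `ID` value bounding the "first" 5 million rows and then deletes up to that bound in batches:

```sql
-- Sketch only: assumes dbo.BigHeap has an IDENTITY column ID.
DECLARE @BatchSize int = 50000;
DECLARE @MaxID bigint;

-- ID value that bounds the first 5,000,000 rows by insertion order.
SELECT @MaxID = MAX(ID)
FROM (SELECT TOP (5000000) ID
      FROM dbo.BigHeap
      ORDER BY ID) AS FirstRows;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.BigHeap
    WHERE ID <= @MaxID;

    IF @@ROWCOUNT = 0 BREAK;
END
```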
Pavel Pawlowski commented ·
I would only add that for a heap it is hard to define what the "first 5 million" is, as you have no guaranteed order in a heap. You will have to provide some correct criteria which will identify those 5 million rows. Although it is possible to use some undocumented functions to identify the physical order of rows in the table, the right way is to define the right conditions to limit the rows to be deleted and then delete those rows in batches.
2 Likes 2 ·
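For reference, the undocumented functions mentioned are along these lines (a sketch using `%%physloc%%` and `sys.fn_PhysLocFormatter`, both undocumented and unsupported); they show where each row physically lives (file:page:slot), which, as noted, does not imply insertion order:

```sql
-- Undocumented/unsupported: physical location of each row (file:page:slot).
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS PhysicalLocation,
       h.*
FROM dbo.BigHeap AS h;
```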
Pavel Pawlowski commented ·
Those undocumented functions can help you get the values in the order in which they are physically stored on the pages; however, again, this does not guarantee the order of insertion.
2 Likes 2 ·
Pamz commented ·
Thanks for the reply. My version is SQL 2008. But let me explain it a bit further: I want to delete the first 5 million rows in order of insertion. I already have this covered for a clustered-index table and for an archival based upon a date column. I am not sure whether this FIFO kind of work is possible on a heap, especially when SQL Server can delete the data in an order which only SQL Server knows (IMHO). With these details, can I take your reply as a solution?
0 Likes 0 ·
Pamz commented ·
That is exactly what I was thinking. But you hinted at some undocumented functions. Can they help?
0 Likes 0 ·
Pamz answered
I have just got the good news that the source text files for the heap are available at a remote site. They would be approximately 100K in size each. So here is what we are planning to do (see the sketch after this list):
A. Create a clustered index on the heap on all eight columns, as together they make up a primary key.
B. Import the data from one file into a staging table.
C. Create a unique clustered index on all eight columns of the staging table.
D. Do the batch delete from the heap against the staging table.
Steps B to D will repeat for every file. We are also planning to re-import all the data, if all the files are found, but this time with an identity column and a column which would store the file name. I would appreciate any valuable suggestions to improve the process.
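A rough sketch of step D, assuming a hypothetical staging table `dbo.Staging` and heap `dbo.BigHeap` that share the same eight key columns (`Col1`..`Col8` are placeholders); it removes the staged rows from the heap in batches so each transaction stays small:

```sql
-- Sketch only: batch delete from the heap, joined to the staging table
-- on all eight key columns. Batch size is arbitrary; tune as needed.
DECLARE @BatchSize int = 50000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) h
    FROM dbo.BigHeap AS h
    INNER JOIN dbo.Staging AS s
        ON  s.Col1 = h.Col1 AND s.Col2 = h.Col2
        AND s.Col3 = h.Col3 AND s.Col4 = h.Col4
        AND s.Col5 = h.Col5 AND s.Col6 = h.Col6
        AND s.Col7 = h.Col7 AND s.Col8 = h.Col8;

    IF @@ROWCOUNT = 0 BREAK;
END
```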
