question

Bartoloz avatar image
Bartoloz asked

Loop fetching massively degrades performance

Hi, I have to move billions of rows from one table to the other on live system so we perform it in batch processing. When I run the query which fetches data it uses proper clustered index and last miliseconds however when the same query is used in the loop, in that batch processing it seems to it totally mistook the proper query plan and performs massive scans. Cannot find the reason why it is happeing like that? Any insights appriciated.
tsqlperformancedata-processing
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bartoloz avatar image
Bartoloz answered
Grant, here it is: while 1=1 begin insert into #batch -- the problematic query select top 50000 tab_id from db..tableA with (nolock) where tab_id > @min_id and tab_id <= 1768364873 order by tab_id option(recompile); if @v_rowcount = 0 break; ------------------------------ -- ... inserting to different table ------------------------------ select @min_id = max(id) from #batch; truncate table #batch; --print message print 'Processed....cnt, rows, time'; end
1 comment
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Looking at the query, nothing is obvious about why it might be different from one loop to another. But, the issue is still likely data and statistics related. As that value @min_id changes, you're likely to see varying amounts of data retrieved, further, varying ranges of data from within the index on TableA are going to match the criteria. Logically though, that query looks somewhat problematic. If you put in a value of 1, it's going to 50000 values from 1 to 1768364873. There must be more logic elsewhere that presumably sets the value to 500001 for the next run? Otherwise you're looking at odd choices. I'd still suggest looking the tally table approach. If you can look for specific ranges rather than relying on a TOP statement, you might be better off with more consistent performance and a better execution plan. Check the SELECT operator on the plans. See how it's determining the runtime and compile time values for the variable. That could also give you information on what's going on.
1 Like 1 ·
Grant Fritchey avatar image
Grant Fritchey answered
Without seeing the query, it's impossible to say. However, a loop, a cursor, is a very poor mechanism for data movement. You're very likely to hit performance issues. You'll be better off finding a batch-based process that uses set-based logic, not cursors.[Here's a good article][1] on the use of the tally table as a mechanism to avoid loops. [1]: http://www.sqlservercentral.com/articles/T-SQL/62867/
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bartoloz avatar image
Bartoloz answered
Maybe I haven't exressed properly enoough, batch processing = batch-based process. So the query is really simple, it's select which fetches clustred index and then it's inserted into other table. What's really surprise me, is that this select executed without the loop properly uses the index and causes really few reads, when it's executed in batch-based way (the IDs of course are changing in where-condition to fetch next batches) it lasts really long and causes massive reads, like it would perform index scans not index seeks.
8 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I found the reason of degradation, minimum IDs parameter used in loop batch-based processing causes query degradation, SQL Server doesn't use proper plan with this parameter.
0 Likes 0 ·
Yes, it's a local variable. I added recompile option at the end of the query, indeed it speeded up ca. 30%, thank you! However, it's still differs when the query it's executed not in the loop.
0 Likes 0 ·
Good to hear. Are there other differences apart from the use of the local variable? Pretty much everything that's different, everything, could lead to changes in the choices made by the optimizer.
0 Likes 0 ·
Grant, I already upvoted your comment about parameter sniffing. I suggest you convert that to an answer so Bartoloz can mark it correct.
0 Likes 0 ·
Bartoloz, as Grant said, it is using average statistics for all executions in the loop because of the parameter(s). It might run faster for specific hard-coded values because then it would generate a plan just for those values.
0 Likes 0 ·
Show more comments
Grant Fritchey avatar image
Grant Fritchey answered
Is it a parameter or a local variable? If it's a local variable, then the optimizer uses an average of the statistics to generate a plan rather than a specific value to get a more accurate plan. Assuming the query is simple, you might try using a RECOMPILE hint. That will sniff the parameter and possibly improve the plan choices.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bartoloz avatar image
Bartoloz answered
Grant, your suggestion was right. I removed condition which was there just for limit purpose and put it at the begning of the loop so curently the codes looks like that: while 1=1 begin if @min_id > 1768364873 break; insert into #batch -- the problematic query select top 50000 tab_id from db..tableA with (nolock) where tab_id > @min_id order by tab_id option(recompile); .... and now is running as a champ! Thank you very much for time spent and your help!
1 comment
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

That's terrific, Bartoloz. Please mark Grant's answer as correct so others know this has been resolved.
2 Likes 2 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.