question

sandy.boon asked

The same queries, which were running fine earlier, are now running very slow

Hi all, here is the problem: in our production environment we have a job which calls stored procedures in sequence. There are two types of stored procedures:
1. Sync procs (which pull data from one DB to another within the same server)
2. Transform procs (these apply business logic; UPDATE statements and CTEs are frequent in these procs)

We created three new transform procs and added them to the job, and after the production deployment all of the queries (including the ones that were not changed) ran painfully slow. The same job which used to take 9 hours to complete took 19 hours this time. The data (number of rows) that this job handled is more or less the same as before the deployment. We are baffled and unable to figure out what could have caused the slow execution of the unchanged queries.

We have ruled out:
1. Network issues (because there is no inter-server execution)
2. Data issues (more or less the same amount of data was handled)

Please tell me, how should I proceed to troubleshoot this issue? How can I get the server details (like memory available, tempdb stats) from that previous point in time? The job has completed now and management is chasing us to find the root cause!
performance, server, troubleshooting
4 comments

Grant Fritchey ♦♦ commented ·
By statistics, I mean performance metrics: CPU, memory, etc. If you don't set up a method for collecting that information, nothing collects it for you automatically.
2 Likes
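As a rough illustration of that point, here is a minimal sketch of collecting such metrics yourself by snapshotting sys.dm_os_performance_counters into a history table on a schedule. The table name dbo.PerfCounterHistory and the chosen counters are illustrative assumptions, not something from this thread.

-- One-time setup: a history table for counter snapshots (name is a placeholder).
CREATE TABLE dbo.PerfCounterHistory
(
    capture_time  DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
    object_name   NVARCHAR(128) NOT NULL,
    counter_name  NVARCHAR(128) NOT NULL,
    instance_name NVARCHAR(128) NULL,
    cntr_value    BIGINT        NOT NULL
);

-- Run this from a SQL Agent job every few minutes so the history exists next time.
INSERT INTO dbo.PerfCounterHistory (object_name, counter_name, instance_name, cntr_value)
SELECT RTRIM(object_name), RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy', N'Batch Requests/sec', N'Buffer cache hit ratio');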
Usman Butt commented ·
@sandy.boon A 9-hour job is quite scary in itself ;) If it is mainly UPDATE statements, have you ever thought of testing a full DELETE and INSERT? Depending upon the amount of data to be changed, a better approach could be a full DELETE and INSERT. And if only some recent data is to be changed, another option is to separate the recent data and apply the business logic and syncing only to that data. This way your index and statistics maintenance would have less overhead. Since I do not know what's happening in the job, I can only guess what could be handy. Some other things to look into: are you using SSIS on the same server? Is any CLR stored procedure in use that could cause a memory leak? Does the event/error log spit out anything such as flushing of memory? Last but not least, Brent Ozar's excellent sp_Blitz may help you a great deal.
2 Likes
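A hedged sketch of the last two suggestions above, assuming sp_Blitz has already been downloaded from Brent Ozar's site and created on the server (the search string is just an example):

-- Scan the current SQL Server error log for memory-related messages.
EXEC master.dbo.xp_readerrorlog 0, 1, N'memory', NULL;

-- Run sp_Blitz for a general server health check (must be installed first).
EXEC dbo.sp_Blitz;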
Grant Fritchey ♦♦ commented ·
As to how to monitor and tune queries, I wrote a book on it. Two versions actually, one for 2008 and one for 2012. You can get it from Amazon.
1 Like
sandy.boon commented ·
@tanglesoft Yes, I did try using the query stats, but as you stated, the stats are gone; they are not from the day the job was run :( We are currently running the same job in the test environment, but it is performing better there, albeit on a smaller amount of data. We will try to restore the production data tomorrow and then monitor the job. Please suggest if something else strikes your mind. @Grant The statistics are gone!? That makes it harder to troubleshoot. I will look into sys.dm_os_wait_stats and maybe I will find something there. We have an index and stats maintenance job which runs on Sunday, while this job (the one I posted about) runs on Thursday. As per your suggestion, I will also look into the way these jobs are written (these jobs are maintained by the operations team :( ). I will come back with findings; maybe you guys can help me out with more detailed info. The thing is, we already know the queries which perform poorly, but this time around all the queries took more time than they should have. Could you point me to a blog or something detailing a step-by-step approach to such issues? Thanks for your answers, guys. Really appreciate it!
0 Likes
Grant Fritchey answered
If you are not already monitoring the server, you can't get performance metrics for the past. They're gone. Instead, you can focus on what the problem is now. Take a look at sys.dm_os_wait_stats to determine what is causing things to run slow on your system. Maybe there's been a configuration change to memory, CPU or something; the wait statistics will tell you what is causing things to run slow.

As to the queries, how are your statistics maintained? Do you have automatic update of statistics enabled? If not, you should have a manual process; in fact, you probably should have a manual process in addition to the automatic one. Try updating the statistics with a FULLSCAN instead of the sampled method that is applied by default. Other than that, you have to look at the queries, understand what is running slow and why, then address it. No easy way around it.
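A minimal sketch of both suggestions above. The excluded wait types are common benign waits and the table name is a placeholder, not anything from this thread:

-- Top waits accumulated since the last service restart (or since the stats were cleared).
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    signal_wait_time_ms,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                        N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR')
ORDER BY wait_time_ms DESC;

-- Update statistics with a full scan on one of the tables the transform procs touch
-- (dbo.YourTransformTable is a placeholder name).
UPDATE STATISTICS dbo.YourTransformTable WITH FULLSCAN;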

tanglesoft answered
SELECT t.text AS QueryName, s.last_elapsed_time, *
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t;

This will give you the last execution time; however, the DMV does not retain a very long history. There are other DMVs that retain various metrics about sessions, plans and the like. Do you have a test environment into which you can restore an old backup from before the change? If you do, you could run the previous and current versions of the code prior to any data change to see if there is a difference. If they both perform similarly in the test environment, it's possible that the statistics are stale in production. If there is a difference in test, then I suggest you look at the execution plan and other execution metrics to see what's changed.
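If comparing plans is the goal, a small extension of that query (an assumption, not part of the answer above) also pulls the cached execution plan alongside the runtime metrics; note the cache is just as short-lived as the stats:

SELECT TOP (20)
    t.text               AS query_text,
    s.last_elapsed_time,
    s.execution_count,
    p.query_plan         -- XML plan you can open and compare before/after
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
ORDER BY s.last_elapsed_time DESC;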
1 comment

tanglesoft commented ·
I would try to isolate whether it is the data, the code or the environment that has changed. Restore to another environment and run the old code, then restore again and run the new code. If you get a difference in performance similar to production, then it's the code that has caused the issue. Does the job have multiple steps? In that case you will have the job history to identify which step suddenly takes longer, and then it's down to determining what's going wrong in the code. In addition, run a database comparison before and after the change using Visual Studio or SQL Compare to see if anything changed other than what you expected; it's possible an index has gone missing. If you don't get a difference in the test environment, or you don't have the time to run both variants of the code, you will struggle to spot what went wrong when it did run if that was more than a day back. You could run a disk-usage-by-table report against the new and old databases and look for any sudden changes in record counts. A significant change, from say 10 rows to 1,000, could make the optimiser take a bad decision if statistics are out of date.
0 Likes
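A rough sketch of a per-table row count you could diff between the old and new databases (run it in each database and compare the output); this is illustrative, not the exact report mentioned above:

SELECT
    s.name            AS schema_name,
    t.name            AS table_name,
    SUM(ps.row_count) AS row_count
FROM sys.dm_db_partition_stats ps
JOIN sys.tables  t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY s.name, t.name
ORDER BY s.name, t.name;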
Tim answered
I am going to go out on a limb here and say that SQL Server could just be picking a bad plan after the new sprocs were added. When you introduced the new sprocs, did you also create any new supporting indexes? If so, did the old sprocs start using a new index that may not be as optimal? Could the issue also be that you finally hit a tipping point with fragmentation on your tables? How often are you rebuilding or reorganizing your indexes? As Grant mentioned with statistics, you should also be doing maintenance on your indexes; high fragmentation can slow queries down by hundreds of percent. Try to isolate each sproc and review its execution plan to see where the bottleneck is.
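A sketch of a fragmentation check for the current database; the page-count filter is a common-sense threshold to skip tiny indexes, not a rule from this thread:

SELECT
    OBJECT_NAME(ips.object_id)        AS table_name,
    i.name                            AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.page_count > 1000           -- ignore very small indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;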
2 comments

Grant Fritchey ♦♦ commented ·
I'm leaning more heavily towards stats now since I know that the stats maintenance is run on Sunday and this load is run on Thursday.
1 Like
Tim commented ·
A good friend of mine had this same issue a few weeks back: a job that normally ran in 3 hours started taking 17 hours. After days of research, restarting the box, etc., a simple sp_updatestats solved their issue. Updating stats is such a low-cost operation that everyone should have it scheduled often. A nightly job is part of my standard server build, and in batch ETL operations I have been known to make it a final step in the job.
1 Like
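For reference, the kind of final job step being described is simply this (the database name is a placeholder):

-- Refresh statistics in the database the ETL job just loaded.
USE YourEtlDatabase;
EXEC sys.sp_updatestats;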
