question

aRookieBIdev avatar image
aRookieBIdev asked

merge perfomance improvement

Hi All, I am using a merge query to do an update(Please find the query below, There are around 30 columns in each table but i have just provided three here) There are about 30 million records in the source table and 300,000 records in the destination table. There is a index in the joining column in both table. However this query takes forever to complete.Please provide suggestions. MERGE DestTable AS Dest USING ( SELECT Col1 , Col2 , Col3 , CHECKSUM(Col1, Col2, Col3) AS [CheckSUM] FROM SourceTable ) AS SRC ON SRC.Col1 = Dest.Col1 WHEN MATCHED and SRC.[CheckSum] Dest.[CheckSum] THEN UPDATE SET Dest.Col2 = SRC.Col2 , Dest.Col3 = SRC.Col3 , Dest.CheckSum = SRC.Checksum;
sql-server-2008sqlperformancemerge-join
11 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

JohnM avatar image JohnM commented ·
Any chance that you can provide the execution plan? Also, do you have to use a MERGE statement? I don't see any other action other than update so why can't you use a simple UPDATE statement with a join?
1 Like 1 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
hard to say for sure without the execution plan. However, the operation is less likely to get good index use and could lead to scans. It would be better if you could define absolute positives or negatives so that the indexes can be used appropriately.
1 Like 1 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Thanks. But the best thing would be to post the .sqlplan file. You can get that by right clicking on the plan and choosing Save as.
1 Like 1 ·
aRookieBIdev avatar image aRookieBIdev commented ·
will using a inner join help? i thought the merge does a join in the background?
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Again, no execution plan, I'm just guessing. Sorry.
0 Likes 0 ·
aRookieBIdev avatar image aRookieBIdev commented ·
![alt text][1] Please read the image from below.I couldnt fit all the content in a single screenshot. [1]: /storage/temp/954-executionplan.jpg
0 Likes 0 ·
executionplan.jpg (87.1 KiB)
Show more comments
aRookieBIdev avatar image aRookieBIdev commented ·
Hi , I Have uploaded the screenshot of the execution plan below. Thanks again.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Since I can't see the properties of the plan, I can't tell you why, but, it looks like you're scanning the entire data set. Further, it looks like that's the only thing that can be done because it doesn't look like the WHERE clause is filtering very much data at all.
0 Likes 0 ·
aRookieBIdev avatar image aRookieBIdev commented ·
thanks , i have got the sqlplan file now.But couldnt upload it here.Can i share it to your mail?
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
You should be able to post it here, even if you zip it first. We've had them before.
0 Likes 0 ·
ThomasRushton avatar image
ThomasRushton answered
As @Grant points out, there are several functions that are affecting this query. These are: * [dbo].Humana_OrgHub_ValidateDUNS * [dbo].Humana_OrgHub_ValidatePhoneNum * CHECKSUM * SUBSTRING * REPLACE I'm guessing that the ValidateDUNS function is a lookup; the ValidatePhoneNum is probably a stringslicer. What might be worth doing is something along these lines: 1. Create a temp table that contains the fields in the base table from the SRC part of the MERGE statement 2. Populating that table with records that match the DEST table, but without doing the function stuff above 3. using the temp table as the SRC of your MERGE, which would remain largely unchanged. The problem with optimising queries that use inline functions is that the query / exectuion plan doesn't show the impact of those individual functions.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

aRookieBIdev avatar image aRookieBIdev commented ·
thanks , I have used a temp table .. stored the records that will be updated there , along with the calculated fields.The temp table also had the indexes on columns on which the join shall be made. Using this temp table i did an update on the final table. Guess what... from 7 hours to 2 minutes now !!!! Thanks a lot every one for helping me out.
1 Like 1 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
I've just looked at this in SQL Sentry Plan Explorer. Boy, was I wrong... string slicing all the way.
0 Likes 0 ·
Grant Fritchey avatar image
Grant Fritchey answered
The estimated number of rows is 50546400000. Either you're updating all the data in the table, or, your statistics are WAY out of date. Either way, it's going to be difficult to tune a query that is accessing that much information. The table [OrgHub].[dbo].[BusinessRecords] is a heap, meaning it doesn't have a clustered index, and there's no indication of any filtering on that table at all. You're joining to [OrgHub].[stg].[DnBOrgStg].[PK_DnBOrgStg_1] [Dest], but it doesn't have JOIN criteria, which you can see as a warning in the Loops operator in your plan. I would suggest modifying the query to include some type of JOIN information. Also, this operation: [Expr1004] = Scalar Operator([OrgHub].[dbo].[Humana_OrgHub_ValidateDUNS]([OrgHub].[dbo].[BusinessRecords].[DUNS])) Is absolutely preventing any sort of index use. Basically, you need to get some indexes on the BusinessRecords table and you need to more clearly define the JOIN criteria between those tables so that there is some filtering done at the index level. Otherwise, it's trying to update everything.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.