question

shaka.d.virgo avatar image
shaka.d.virgo asked

SQL Insert error when merging codes in a table using UPDATE.

I'm trying to perform a merge in my application based in 2 tables: a) The 1st table has the "old code" and "new code" that you want to merge, for instance: Old_Code New_Code EMPL1 EMPLOYEE1 EMPL2 EMPLOYEE1 b) The 2nd table is a data table where we have records for the original employees, so it looks more or less like this: Employee Year Comment EMPL1 08 Hope 2008 is a great year. EMPL2 08 Recent hire. EMPL1 09 Another year. EMPL1 10 Maybe the last? EMPL2 10 Doing good. After running my SQL code, the 2nd table should look like this: Employee Year Comment EMPLOYEE1 08 Hope 2008 is a great year. EMPLOYEE1 09 Another year. EMPLOYEE1 10 Maybe the last? As you can see, I don't care about the other columns, so that's OK. This is how my current SQL code looks like: UPDATE Old_Table SET Employee = Change.New_Code FROM Merge_Table Change WITH (NOLOCK) WHERE Change.Company = 'ANY COMPANY' AND Change.Company = Old_Table.Company AND Change.Old_Code = Old_Table.Employee AND NOT EXISTS (SELECT 1 FROM Old_Table o WITH (NOLOCK) WHERE o.Company = Old_Table.Company AND o.Year = Old_Table.Year AND o.Employee = Change.New_Code) When I execute this code, I get an error: Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PK_Old_Table'. Cannot insert duplicate key in object 'dbo.Old_Table'. According to my logic, that last bit in my SQL code above should have prevented duplicate inserts, but it's not working as I expected. Could anyone point what is wrong and how to fix this problem? Thanks in advance!
updateinserterrorduplicates
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Dave_Green avatar image
Dave_Green answered
Whilst without looking at the data it's impossible to be certain, I suspect that you are indeed checking that an existing record doesn't clash with one of your new ones, you have a many to one conversion here - you are taking EMPL1 and EMPL2 and making both EMPLOYEE1. So, assuming that the rest of the key fields are duplicated, you end up trying to run an update which makes both the same. Neither exist in your table before the update, so your "AND NOT EXISTS..." code doesn't trigger, but the update will fail as the records can't both be updated to the same key values. (It may be easier to think that SQL Server has been told to do both updates, and does the first, but then realises that the second won't work, so panics and refuses to do any of what you asked it to do. It's not exactly what happens, but it gives the general idea). I would suggest that you could run a select statement joining the Old_Table rows to the Change rows, grouping by the key fields and returning only those rows having count(*) >1 - this will give you only key fields that would be duplicated in conversion. You need to then have a strategy as to how to resolve them - as clearly only one can end up as the key record (and you need to tell SQL Server how to make that decision).
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks a lot, Dave and Oleg: I have finally found a way to deal with this, following Dave's suggestion.
0 Likes 0 ·
Oleg avatar image
Oleg answered
The problem you have is caused by the fact that you are trying to issue the update which should result in collapsing of multiple records into one. For example, 2 records for original EMPL1/08 and EMPL2/08 should result in existence of only one final record with EMPLOYEE1/08. This means that your script should consist of 2 parts: update the columns to the new values and then delete what now appears as a duplicate record to your table. It looks like your Old\_Table is clustered by EMPLOYEE and [YEAR] columns, and I should point out that under no conditions the values of the columns participating as a part of the clustered index should ever be modified because one of the most important requirements of the clustered index is that the values are ***never changing***. Additionally, because it would be easier to update the columns first and then delete the records which appear as duplicates, you can drop the constraint first because it will have to be rebuilt from scratch anyway, update the table, delete the duplicate records from there and then add the index back. Here is the script which will do it:

-- we need to update the values in the Employee column which will force
-- the engine to rebuild the index anyway, so lets drop it now explicitly
alter table dbo.Old_Table drop constraint PK_Old_Table;
go

-- update the Employee column with new values from Merge_Table and also
-- mark the "duplicate" records which need to be deleted from the table 
;with records as
(	
     select
        ot.Employee, ot.[Year], ot.Comment, mt.New_Code,
        row_number() over (partition by mt.New_Code, ot.[Year] 
            order by ot.Employee) N
        from dbo.Old_Table ot inner join dbo.Merge_Table mt
            on ot.Employee = mt.Old_Code
)
    update records 
    set 
        Employee = New_Code,
        Comment = case when N > 1 then 'TO BE DELETED' else Comment end;

-- now deleted the "duplicates", i.e. records which originally belonged to
-- different emplolyeee which was then merged into one	
delete from dbo.Old_Table where Comment = 'TO BE DELETED';
go

-- finally, restore the index
alter table dbo.Old_Table add constraint PK_Old_Table 
    primary key clustered (Employee, [Year]);
go

-- check the data:
select * from dbo.Old_Table;

Employee   Year Comment
---------- ---- --------------------------
EMPLOYEE1  08   Hope 2008 is a great year.
EMPLOYEE1  09   Another year.
EMPLOYEE1  10   Maybe the last?
Hope this helps, Oleg
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.