question

Sunshine15 avatar image
Sunshine15 asked

Cleanup duplicates on both tables

I have Table1 and Table2 joined on ID. I want to find out the records WHERE t1.FinancialServiceName = t2.FinancialServiceName AND t1.AccountNumber = t2.AccountNumber The problems are that FinancialServiceName and AccountNumber are not exactly match on both tables. For example, FinancialServiceName like American Bank, could write as AMER BK, AM BANK, AMERICAN BK... A lot of duplicates for each FinancialServiceName on both tables. I want to have one name for each financial service. Also these two tables are from different sources and updated through ETL process daily. It means that new data coming in has the same problem. What's the best way to cleanup FinancialServiceName field on both tables so that I could get thousands of records? Many thanks.
sql
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

You have some good answers below. This web site runs by you voting and marking the best answers. For each helpful answer below, indicate this by clicking on the thumbs up symbol next to those answers. If any one answer lead to a solution, indicate this by clicking on the check mark next to that answer. This helps indicate your solution for anyone else with the same question.
0 Likes 0 ·
David Wimbush avatar image
David Wimbush answered
It sounds like you need to create a translation table. Something like this: ShortName ProperName ------------ ---------------- AMER BK American Bank AM BANK American Bank AMERICAN BK American Bank FRENCH BK French Bank FRENCH BANK French Bank Then you could look them up to see what the short names are equivalent to. You can do a left join to identify new data with short names that aren't known yet so you can resolve them and add them to your table.
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

You could do it with a key but I don't see why you need one because you already have values that tie them together. It depends how much you can change things. Ideally you would have a FinancialService table with ID, Name etc and a FinancialServiceID in tables 1 & 2.
1 Like 1 ·
Thanks for your answer. If I create a new Table3 to store ShortName and ProperName. I still need to add a key on both tables for join condition to update Table1, right? Update t1 set t1.FinancialServiceName =t3.ProperName from Table1 t1 join Table3 t3 on t1.Key = t3.Key
0 Likes 0 ·
Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
David Wimbush has a good answer for fixing this. If you have a constant flow of incoming data, with the same quality issues, I would recommend looking at Master Data Services (MDS, included in SQL Server Enterprise Edition and BI edition). With MDS, you can setup a knowledge base of corrections to incoming data. That way, once you have identified a shortname which should be corrected to the full name, you update the knowledge base with this change and then have MDS correct the data for you (alternatively suggest corrections which you manually approve). I have only played around with MDS, not implemented it in a system, but I can definitely see the strenghts of using it, especially when you load data from external parties and want the data to be correct before it enters your own systems.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks. I am going to look at MDS.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.