# question

## Join tables based on similar terms

I have a table of part numbers of many million. I would like to join it to another reference table where the part numbers can be slightly different. For example I could have `ABC123` and `ABC321` which (believe it or not!) can be the same thing! More specifically I want to be able to join a part number similar to `H7C-2275Q-VT 6CE` and `H7C-2275Q-VT EC6` where the last three characters are different. I would also like to be able to match on differing percentages of match so `ABCD E123 45` would match with `ABCD E123` for example. I don't mind using a routine and producing a static table. Any ideas? I must admit this has got me a bit stumped!

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Going for some matching rules here: I see that 6CE would match EC6 which contains the same characters. Would it still match if it was 6CE and 4DA? How should it match if you have 6CE, EC6, 6EC and E6C? Can you go into more detail on the percentage piece? It looks like they can still match even if one of the numbers doesn't have the final characters at all. If so, how would that weigh against another number that had the final characters (maybe the same, maybe in a different order, or with some different characters)?
0 Likes 0 ·
Remember to vote up the questions that were helpful and mark any that solved the problem. If you need clarification or additional information use the comments.
0 Likes 0 ·

·
I've used the tsql impementation of the Jaro Winkler algorithm for stuff like this before. It's very slow - the implementation that I've used is full of nested loops, but it can be useful. It compares two combinations of characters and comes up with a score between (I think) 0.5 being completely random and 1 being identical. It, from memory takes into account missing characters, and transpositions amongst other things. so comparing "123-AGEG-GHT" to "123-AGED-GHR" would score very close to 1 and comparing "123-AGEG-GHT" to "gi464oeern" would score near 0.5. See [ http://www.sqlservercentral.com/articles/Fuzzy+Match/65702/][1] [1]: http://www.sqlservercentral.com/articles/Fuzzy+Match/65702/

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

You could look at Soundex function to see if matches your matching rules or possibly a regular expression if you define under which conditions a match could occur. An alternative and possible more efficient option would be to create a third table that had a 1 to many link from the main parts table with every match that you expect then join this to the other table.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.