# Fuzzy Grouping SSIS - How to Run Either/Or Logic Against Same Data Set,

 0 I have a data set (approximately 4 million records) with the following fields: -Id -First Name -Last Name -Street -City -State -Zip -Email The total confidence score needs to be greater than or equal to 75%, and the Last Name field must have a minimum confidence of 80%. The following rules also need to be applied to the match process: -If First Name has a length less than or equal to 7 characters, then the minimum confidence must be 90%; if First Name has a length greater than 7 characters, then the minimum confidence can be 80%. -Street and City must have a minimum confidence of 90% OR Email must be an exact match (100%). I have seen numerous SSIS examples where a Conditional Split was used to break up a data set by State, for example, but then the separate Fuzzy Grouping tasks used the same overall match logic. Breaking up by State is straightforward, but the two rules above amount to a more complicated either/or logic (Street and City greater than or equal to 90% OR Email equals 100%). What would be the best way in Fuzzy Grouping to process this kind of either/or logic against the same data set? Any advice would be greatly appreciated! more ▼ asked Jul 18, 2012 at 09:44 PM in Default Gawain41 0 ● 1 ● 1 ● 1 add new comment (comments are locked) 10|1200 characters needed characters left ▼ Viewable by all users

2 answers: sort voted first
 0 I think the Conditional Split is still the way to go but in the Condition you will have to build it up to show what you need. Or a simplified route may be to create views on the database based on your criteria and work with the views individually rather than as a whole. It's not as neat a solution but may help solve your problems. more ▼ answered Jul 20, 2012 at 10:49 AM Mrs_Fatherjack 4.7k ● 60 ● 62 ● 66 add new comment (comments are locked) 10|1200 characters needed characters left ▼ Viewable by all users
 0 I got to thinking even after posting my question that a Conditional Split might still be the way to go. The problem is that some of the conditions depend on having already run Fuzzy Grouping (e.g., Email equals 100% confidence). I'm wondering if the Fuzzy Grouping could be processed on the same data set sequentially? For example, I could create a task to process the Last Name according to the first criterion (minimum confidence of 80%), then take that output and create a task to process on the either/or variations of First Name, and so on, each time "adding" to the Fuzzy Grouping output. Once each task was processed, the next task could use a view, as you suggested, to extract the right data for the next run.Thank you for the quick response! more ▼ answered Jul 20, 2012 at 10:47 PM Gawain41 0 ● 1 ● 1 ● 1 add new comment (comments are locked) 10|1200 characters needed characters left ▼ Viewable by all users

### New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

### Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x941
x3

asked: Jul 18, 2012 at 09:44 PM

Seen: 1561 times

Last Updated: Jul 20, 2012 at 10:47 PM