Fuzzy Matching Overview

Fuzzy matching can be computationally intensive, whether you are applying functions to data to harmonize keys, using fuzzy matching options in the Duplicates command, or using the Join command to bring data from multiple tables together to test for fuzzy matches.

The following table summarizes the key steps that most often make fuzzy matching more efficient and effective:

Key Steps

Analyzer Feature

Objective

1

Harmonize key fields to improve fuzzy matching

Use NORMALIZE() on name fields and SORTNORMALIZE() on address fields.

Use the substitution file option for either function to specify standardized abbreviations and to remove extraneous salutations, titles, corporate incorporation designations, and similar terms.

To harmonize data for more effective fuzzy comparisons by standardizing values and keeping only the most meaningful data.

Harmonized keys with extraneous data removed increase both the precision of the fuzzy matching results and the likelihood of finding true matches.

2

Limit the size of the data to be assessed for fuzzy matches

Once the data is harmonized, filter and extract it into a more focused data set

Exclude unnecessary columns of data

To reduce the size of the data to be compared, and to resolve the computed fields used to harmonize the data, so that subsequent fuzzy matching will be most efficient

 

Choose either Step 3.1 or Step 3.2:

3.1

Option 1:

Combine tables if data to be matched spans multiple tables

Join command

- Matched Many-to-Many

- Many-to-Many

To combine records from two tables (where possible, on a common key or a reasonable indirect attribute). Filter the Join using either the NEAR() or SIMILAR() function on the harmonized name or address fields from the primary and secondary tables, so that the joined table is as small as possible while still containing the most likely fuzzy matches

3.2

Option 2:

Perform duplicates tests if fuzzy data is contained in a single table

Duplicates command

- Same-Same-Different

- Same-Same-Near

- Same-Same-Similar

To test for duplicates within the same table where the value of the last key (harmonized) field is different, near or similar
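
The steps in the table can be illustrated outside Analyzer with a short Python sketch. This is a rough approximation under stated assumptions, not Analyzer's implementation: the helper names (normalize, sort_normalize, similar, same_same_similar), the substitution table, the sample records, and the 0.85 threshold are all hypothetical, and the similarity measure is the ratio from Python's standard difflib module rather than whatever NEAR() or SIMILAR() compute internally.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical substitution table: token -> replacement ("" drops the token).
SUBSTITUTIONS = {"STREET": "ST", "AVENUE": "AVE", "MR": "", "INC": "", "LTD": ""}


def normalize(text, substitutions=SUBSTITUTIONS):
    """Upper-case, strip punctuation, apply token substitutions, collapse spaces."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split()
    tokens = [substitutions.get(t, t) for t in tokens]
    return " ".join(t for t in tokens if t)


def sort_normalize(text, substitutions=SUBSTITUTIONS):
    """Like normalize(), but with tokens sorted so word order does not matter."""
    return " ".join(sorted(normalize(text, substitutions).split()))


def similar(a, b, threshold=0.85):
    """True when the difflib similarity ratio of a and b reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def same_same_similar(records, exact_keys, fuzzy_key, threshold=0.85):
    """Pair records whose exact keys match and whose fuzzy key differs but is similar."""
    groups = {}
    for rec in records:
        groups.setdefault(tuple(rec[k] for k in exact_keys), []).append(rec)
    pairs = []
    for group in groups.values():
        for left, right in combinations(group, 2):
            a, b = left[fuzzy_key], right[fuzzy_key]
            if a != b and similar(a, b, threshold):
                pairs.append((left, right))
    return pairs


# Made-up sample data for illustration only.
vendors = [
    {"id": 1, "city": "BOSTON", "name": "Mr. John Smith, Inc.", "notes": "x"},
    {"id": 2, "city": "BOSTON", "name": "Jon Smith", "notes": "y"},
    {"id": 3, "city": "DENVER", "name": "John Smith", "notes": "z"},
]

# Steps 1 and 2: harmonize the key and keep only the columns needed for matching.
focused = [
    {"id": v["id"], "city": v["city"], "name_key": normalize(v["name"])}
    for v in vendors
]

# Step 3.2: same city, different but similar harmonized name.
matches = same_same_similar(focused, ["city"], "name_key")
print([(a["id"], b["id"]) for a, b in matches])
```

Grouping on the exact keys first means the expensive pairwise comparison runs only inside each small group, which is the same reason the table recommends reducing the data before testing for fuzzy matches. For the two-table case in Step 3.1, the same similar() predicate could be applied to pairs drawn from a primary and a secondary list instead of pairs within one list.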