Fuzzy Matching for Names

When performing fuzzy matching on name keys:

• Use the NORMALIZE() function.

• Apply a tailored substitution file to the NORMALIZE() function that removes common unnecessary terms, such as salutations or legal forms, and standardizes common abbreviations.

Notice that the substitution file entries in the examples below are uppercase and contain no punctuation. NORMALIZE() first converts the data to uppercase and removes any punctuation, and only then performs substitutions (replacements or removals), so an entry written with lowercase letters or punctuation would never match.
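The order of operations can be illustrated with a rough Python approximation of NORMALIZE(). This is a sketch, not the product's implementation: the function name `normalize`, the use of a dictionary in place of a substitution file, and whole-word matching are all assumptions made for illustration.

```python
import re


def normalize(text, substitutions):
    """Rough approximation of NORMALIZE(): uppercase, strip punctuation,
    collapse whitespace, then apply whole-word substitutions."""
    # Step 1: uppercase, then drop every character that is not a letter,
    # digit, underscore, or whitespace (treated here as "punctuation")
    cleaned = re.sub(r"[^\w\s]", "", text.upper())
    # Step 2: substitutions run only on the already-cleaned words;
    # an empty replacement removes the word entirely
    words = [substitutions.get(word, word) for word in cleaned.split()]
    # Step 3: rejoin the surviving words with single spaces
    return " ".join(word for word in words if word)


# An entry written the way it appears in the cleaned data matches...
print(normalize("Mr. John Smith", {"MR": ""}))   # JOHN SMITH
# ...but an entry with lowercase letters and a period never does
print(normalize("Mr. John Smith", {"Mr.": ""}))  # MR JOHN SMITH
```

The second call shows why entries must be uppercase and punctuation-free: by the time substitutions run, "Mr." no longer exists in the data.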

The substitution file is a text file stored in the project folder, and it can easily be created in a text editor such as Notepad. For more detailed information on constructing a substitution file, see NORMALIZE().

People’s Names

For people’s names, create a substitution file for use in the NORMALIZE() function that:

• removes standard salutations: for example, MR, MS, MRS, or MISS

• replaces titles with standard abbreviations: for example, DOCTOR with DR
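The entries described above can be sketched in Python as a dictionary applied after uppercasing and punctuation removal. The dictionary `PEOPLE_SUBSTITUTIONS` and the function `normalize_name` are hypothetical stand-ins for a real substitution file and NORMALIZE(); whole-word matching is an assumption.

```python
import re

# Hypothetical substitution entries for people's names, written in
# uppercase with no punctuation because they apply to cleaned data
PEOPLE_SUBSTITUTIONS = {
    "MR": "",        # remove salutations (empty replacement = removal)
    "MS": "",
    "MRS": "",
    "MISS": "",
    "DOCTOR": "DR",  # standardize the title to its abbreviation
}


def normalize_name(text, substitutions=PEOPLE_SUBSTITUTIONS):
    """Uppercase, strip punctuation, then apply whole-word substitutions."""
    cleaned = re.sub(r"[^\w\s]", "", text.upper())
    words = [substitutions.get(word, word) for word in cleaned.split()]
    return " ".join(word for word in words if word)


print(normalize_name("Mrs. Jane Doe"))      # JANE DOE
print(normalize_name("Doctor Alan Grant"))  # DR ALAN GRANT
print(normalize_name("Dr. Alan Grant"))     # DR ALAN GRANT
```

After normalization, "Doctor Alan Grant" and "Dr. Alan Grant" produce identical keys, so they match exactly rather than relying on a fuzzy comparison.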

An additional best practice for matching people’s names is to compare the first name separately from the last name, where possible. Because each comparison covers a shorter string, a smaller Damerau-Levenshtein distance can be used in the subsequent “near” and “similar” comparisons, which reduces false positives among the identified fuzzy matches.
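The effect of comparing name parts separately can be sketched in Python. This is an illustration only, not the product's own comparison: it uses the optimal string alignment variant (the restricted form of Damerau-Levenshtein distance), and the function names and the per-part threshold of 1 are hypothetical choices.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: the restricted form of
    Damerau-Levenshtein (insert, delete, substitute, adjacent transposition)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # cost of deleting i characters
    for j in range(cols):
        d[0][j] = j  # cost of inserting j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def names_match(first1, last1, first2, last2, max_distance=1):
    """Compare first and last names separately, each with a small threshold."""
    return (osa_distance(first1, first2) <= max_distance
            and osa_distance(last1, last2) <= max_distance)


# Whole-name comparison needs a threshold of 2 to tolerate typical typos,
# and at that threshold JOHN LEE and JOHN LI are accepted as a match
print(osa_distance("JOHN LEE", "JOHN LI"))          # 2
# Compared separately with a threshold of 1, the pair is rejected...
print(names_match("JOHN", "LEE", "JOHN", "LI"))     # False
# ...while a one-character typo in either part is still accepted
print(names_match("JON", "SMITH", "JOHN", "SMITH")) # True
```

The trade-off is that two small typos in the same short name part are no longer tolerated, which is usually acceptable because short names are exactly where a large distance produces false positives.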

Business Names

For business names, create a substitution file for use in the NORMALIZE() function that:

• removes the legal form of the company: for example, LLP, LLC, LTD, LIMITED, INC, INCORPORATED, CORP, or CORPORATION
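A Python sketch of the business-name entries follows. As with the other sketches, the dictionary stands in for a real substitution file, whole-word matching is an assumption, and `normalize_business` is a hypothetical name rather than part of any product API.

```python
import re

# Hypothetical substitution entries that remove legal forms; each maps
# to an empty string so the word is dropped from the key
LEGAL_FORMS = ["LLP", "LLC", "LTD", "LIMITED", "INC",
               "INCORPORATED", "CORP", "CORPORATION"]
BUSINESS_SUBSTITUTIONS = {form: "" for form in LEGAL_FORMS}


def normalize_business(text, substitutions=BUSINESS_SUBSTITUTIONS):
    """Uppercase, strip punctuation, then apply whole-word substitutions."""
    cleaned = re.sub(r"[^\w\s]", "", text.upper())
    words = [substitutions.get(word, word) for word in cleaned.split()]
    return " ".join(word for word in words if word)


print(normalize_business("Acme Widgets, Inc."))         # ACME WIDGETS
print(normalize_business("ACME Widgets Incorporated"))  # ACME WIDGETS
print(normalize_business("Acme Widgets L.L.C."))        # ACME WIDGETS
```

Because punctuation is removed first, "L.L.C." becomes "LLC" and is caught by the plain entry. One caution: a whole-word removal also strips a legal-form word that is part of the real name, so "Limited Editions Ltd" normalizes to just "EDITIONS".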