When attempting to perform fuzzy matching on name keys:
1. Use the NORMALIZE() function.
2. Apply a tailored substitution file to the NORMALIZE() function that removes common, unnecessary salutations and standardizes common abbreviations.
Note that the substitution file entries shown below are uppercase and contain no punctuation, because substitutions (replacements or removals) are performed only after the NORMALIZE() function has uppercased the data and removed any punctuation.
The substitution file is stored in the project folder. It is a plain text file that can easily be created in a text editor such as Notepad. For more detailed information on constructing a substitution file, see NORMALIZE().
For people’s names, create a substitution file for use in the NORMALIZE() function that:
• Removes standard salutations: for example, MR, MS, MRS, or MISS
• Replaces titles with standard abbreviations: for example, DOCTOR with DR
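The order of operations described above (uppercase, strip punctuation, then substitute) can be illustrated outside the product with a short Python sketch. This is not the NORMALIZE() implementation or its substitution file format. The dictionary PERSON_SUBSTITUTIONS and the function normalize_name are hypothetical names, and the sketch assumes that punctuation is deleted rather than replaced by a space and that substitutions apply to whole words only.

```python
import string

# Hypothetical substitution table: token -> replacement ("" means remove).
# Keys are uppercase and punctuation-free because they are applied
# only after the input has been cleaned.
PERSON_SUBSTITUTIONS = {
    "MR": "",
    "MS": "",
    "MRS": "",
    "MISS": "",
    "DOCTOR": "DR",
}


def normalize_name(raw, substitutions):
    """Uppercase, strip punctuation, then apply whole-word substitutions."""
    # Step 1: uppercase and delete punctuation (assumed behavior).
    cleaned = raw.upper().translate(str.maketrans("", "", string.punctuation))
    # Step 2: replace or remove each token found in the table.
    tokens = [substitutions.get(tok, tok) for tok in cleaned.split()]
    # Drop tokens that were substituted with an empty string.
    return " ".join(tok for tok in tokens if tok)


print(normalize_name("Mr. John O'Neil", PERSON_SUBSTITUTIONS))    # JOHN ONEIL
print(normalize_name("Doctor Jane Smith", PERSON_SUBSTITUTIONS))  # DR JANE SMITH
```

An entry written as "Mr." would never match in this sketch, because the period is already gone by the time the table is consulted. This is why the entries carry no punctuation.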
An additional best practice for matching people’s names is to assess the first name separately from the last name (where possible). This allows a smaller Damerau-Levenshtein distance to be used in the subsequent “near” and “similar” comparisons, thereby reducing false positives in the identified fuzzy matches.
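To see why separate fields permit a smaller distance, consider the following Python sketch. It computes the optimal string alignment distance, a common restricted variant of Damerau-Levenshtein distance. It is not necessarily the exact variant used by the “near” and “similar” comparisons. The function names and the default threshold of 1 are assumptions chosen for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, for example "HN" versus "NH".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def names_match(first_a, last_a, first_b, last_b, max_distance=1):
    """Compare first and last names separately, each against a small threshold."""
    return (osa_distance(first_a, first_b) <= max_distance
            and osa_distance(last_a, last_b) <= max_distance)


# One typo in each field: matched with a per-field threshold of 1.
print(names_match("JOHN", "SMITH", "JONH", "SMYTH"))  # True
# Two edits concentrated in the last name: rejected per field.
print(names_match("ANN", "LEE", "ANN", "LOW"))        # False
```

Compared as whole strings, "JOHN SMITH" and "JONH SMYTH" are 2 edits apart, so a whole-name comparison would need a threshold of 2 to catch them. That same threshold would also accept "ANN LEE" against "ANN LOW", which the per-field comparison rejects.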
For business names, create a substitution file for use in the NORMALIZE() function that:
• Removes the legal form of the company: for example, LLP, LLC, LTD, LIMITED, INC, INCORPORATED, CORP, or CORPORATION
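The effect of removing legal forms can be sketched in Python as follows. The set LEGAL_FORMS and the function business_key are hypothetical and stand in for a substitution file applied through NORMALIZE(). The sketch assumes punctuation is deleted before the legal forms are looked up.

```python
import string

# Hypothetical removal list, uppercase and without punctuation.
LEGAL_FORMS = {
    "LLP", "LLC", "LTD", "LIMITED",
    "INC", "INCORPORATED", "CORP", "CORPORATION",
}


def business_key(raw):
    """Uppercase, strip punctuation, then drop legal-form tokens."""
    cleaned = raw.upper().translate(str.maketrans("", "", string.punctuation))
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_FORMS)


names = [
    "Acme Widgets, Inc.",
    "ACME WIDGETS INCORPORATED",
    "Acme Widgets L.L.C.",
]
# All three variants collapse to the same matching key.
print({business_key(n) for n in names})  # {'ACME WIDGETS'}
```

Because "L.L.C." becomes "LLC" once its periods are deleted, a single punctuation-free entry covers both spellings.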