Fuzzy Matching Tools

You are here: Fuzzy Matching > Fuzzy Matching Tools

Fuzzy Matching Tools

Analyzer provides a number of useful commands and functions that can be used in isolation or in combination to support a variety of fuzzy matching scenarios. Whether you are looking for fuzzy matches within the same table or between tables, Analyzer enables users to choose and combine these tools in order to customize the fuzzy matching to meet their needs.

The following commands can be used to facilitate fuzzy matching between tables or within the same table:

Command	Type of Comparison	Objective
Join	Matched Many-to-Many	To combine records from two tables lacking a common key but where there is a reasonable indirect key like zip code or state
Join	Many-to-Many	To combine records from two tables lacking a common key or reasonable indirect key
Duplicates	Same-Same-Different	To test for duplicates within the same table where the value of the last key field is different
Duplicates	Same-Same-Near	To test for duplicates within the same table where the value of the last key field (numeric, date or character) is near, but not necessarily exactly the same, based on a specified level of difference appropriate for the field type
Duplicates	Same-Same-Similar	To test for duplicates within the same table where the value of the last key field when harmonized (numeric, date or character) is similar, but not necessarily exactly the same, based on a specified Damerau-Levenshtein distance

The following functions can be used to facilitate fuzzy matching:

Function	Behavior
ARRANGE()	Re-arranges characters, or optionally whole words, within a string in descending order based on the host machine’s character set
DIFFERENCE()	Compares two strings and returns a positive whole number. A zero indicates that two strings are identical, a small number indicates that two strings are very similar. A value equal to the length of the longer string indicates that two strings are different
NEAR()	Tests whether the difference between two fields falls within a specified difference (used on pairs of numeric, date or character fields)
NORMALIZE()	Replaces all contiguous non-alphanumeric characters (except blanks) with a blank, replaces all international characters with English equivalents, trims leading blanks, compacts all contiguous blank spaces into a single blank space, and upper cases the entire string; also allows option to retain specified characters and to make text substitutions
SIMILAR()	Tests whether the difference between two fields falls within the specified Damerau-Levenshtein distance (used on pairs of numeric, date or character fields) Prior to testing the difference, this function takes a pair of character strings (or converts a specified pair of numeric or date values to character strings), upper cases the strings, removes any punctuation, and matches similar looking characters (e.g., 1 for I, 0 for O)
SORTNORMALIZE()	Replaces all contiguous non-alphanumeric characters (except blanks) with a blank, replaces all international characters with English equivalents, trims leading blanks, compacts all contiguous blank spaces into a single blank space, and upper cases the entire string; also allows option to retain specified characters and to make text substitutions. Finally, re-arranges remaining result in descending order of whole words based on using any remaining blanks as word separator
SOUNDEX()	Returns a four-character value for a string, which can be compared with the value of another string to see if both strings are phonetically similar using the Soundex algorithm
SOUNDSLIKE()	Indicates whether two strings sound phonetically alike using the Soundex algorithm
FIND()	Searches for a string inside one or more fields or the entire record. Also supports Google-type searches for a literal
LISTFIND()	Searches for a list of strings inside one or more fields or the entire record
INCLUDE() or EXCLUDE()	Includes or excludes specified characters (only digits, only alphabetic characters, etc.)
LEFT() or RIGHT()	Isolates specific number of leading or trailing characters
SUBSTRING()	Isolates specific number of characters with specified starting point in string
MAP() and FORMAT()	Measures a string’s format against a desired format or determines the format of a character string (e.g., zip code or phone number)