You are here: Functions > SIMILAR()

SIMILAR()

SIMILAR() supports "fuzzy" comparisons between primarily numeric reference data, like invoice numbers. It is specifically designed to account for differences that commonly occur as a result of OCR or manual data entry.

It takes a pair of values, and applies a proprietary transformation to match strings that "look" similar. Where the values are numbers or dates, they are first converted to character data. The two transformed values are then compared using the Damerau-Levenshtein algorithm.

If the Damerau-Levenshtein distance between Field1 and Field2 is equal to or less than the specified difference then SIMILAR() returns true, otherwise it returns false.

Unlike NEAR(), prior to comparing the values, SIMILAR() transforms its parameters so that the effects of common data errors are eliminated. For example all blanks, leading zeros and punctuation are removed, data is made upper case and similar looking characters (e.g. 1 and I, or 0 and O) are matched.

Note: Because the input values are transformed, some false positives may result, however fewer likely positives will be missed.

SIMILAR() is also ideally suited for testing transposed digits within numbers. This is because the numbers are converted to strings and then compared using standard fuzzy techniques. Any two adjacent digits that are transposed, regardless of position will show up as a single difference.

Note: The fields or values being compared cannot be logical.

For more information on the Damerau-Levenshtein distance, see the function DIFFERENCE().

For information on the NEAR() function see NEAR().

To better understand using Near and Similar options, see NEAR() versus SIMILAR() Functions.

Function Format

SIMILAR(field1,field2,difference)

SIMILAR() returns true if the Damerau-Levenshtein distance between Field1 and Field2 is equal to or less than the specified difference, otherwise SIMILAR() returns false.

For more information on the Damerau-Levenshtein distance, see the function DIFFERENCE().

Note: If the difference is omitted, it is assumed to be 1.

Note: Either Field1 or Field2 can be expressed as a constant.

Examples

SIMILAR(1093.00,1079.00,1) = F

SIMILAR(1093.00,1079.00,2) = T

SIMILAR(‘20110130‘,‘20110215‘,2) = F

SIMILAR(‘20110130‘,‘20110215‘,3) = T

SIMILAR("I1093","11079",1) = F

SIMILAR("I1093","11079",2) = T

SIMILAR("000123","123",0) = T

SIMILAR("123 Main St.","123 MAIN ST",0) = T