You are here: Functions > NORMALIZE()

NORMALIZE()

NORMALIZE() will take a string and remove the common characteristics that can cause manually entered strings to differ.

Specifically, it:

• removes all non-alphanumeric characters except blanks (like commas, dashes, periods, etc.), replacing contiguous removed characters with a single blank
• replaces all International characters (containing accents, etc.) with the English base character (่ becomes E, for example). With respect to replacing international characters, in German, the "฿" in "Stra฿e" is replaced with "SS" to yield "STRASSE".
• trims leading blanks
• compacts contiguous blanks into a single blank
• upper cases the remaining data
• optionally, allows specification of non-alphanumeric and foreign characters to be retained
• optionally, allows specification of one or more substitution files to standardize alternate valid representations of values within the data (e.g. “Rd” for “Road”, “St” for “Street“)

Use NORMALIZE() to compare two strings (like names or addresses) for equality or similarities. Particularly useful for use in the DIFFERENCE(), NEAR() or SIMILAR() functions. To better understand using NEAR and SIMILAR options see NEAR() versus SIMILAR() Functions. For more information on the DIFFERENCE() function see DIFFERENCE().

For specific examples of using NORMALIZE() syntax (basic syntax as well as using the optional substitution file or specifying extra characters), see Examples.

Alternatively, you can use the SORTNORMALIZE() function which has the same normalizing capabilities but also takes the final result and re-arranges the remaining whole words (including number sets) separated by blanks in descending order. This function is particularly useful for harmonizing addresses for subsequent comparison. For more information see SORTNORMALIZE().

Always use NORMALIZE() when comparing strings that contain spaced abbreviations. For example: given two company name fields, one containing “I B M” and the other containing “IBM”, the NORMALIZE() function result will maintain the spaced abbreviation (“I B M”) while the SORTNORMALIZE() function will re-arrange the spaced abbreviation as “M I B”. Maintaining the spaced abbreviation will yield more comparable data as “I B M” is much more similar to “IBM” than is the reversed “M I B”.

Finally, you can use the NORMALIZE() function and its substitution tables to easily facilitate an exclusive search for whole words. For more information see Whole Word Searches.

Function Format

Examples