You are here: Functions > NORMALIZE() > Examples

Examples

Standard Usage

Example 1:

NORMALIZE("1744-4812 Main St., Anywhere, USA")

 

yields

 

"1744 4812 MAIN ST ANYWHERE USA"

Example 2:

NORMALIZE("1744 - 4812 Main Street, Anywhere, USA")

 

yields

 

"1744 4812 MAIN STREET ANYWHERE USA"

Example 3:

NORMALIZE("177 Rué Paris")

 

yields

 

"177 RUE PARIS"

 

Specifying a Substitution File

If a substitution file name is specified then it must be accessed from the client computer (typically from the active project folder), and must specify substitutions that will be applied to the resulting string.

The substitution file is intended to be used to remove alternate valid representations of values within the data. For example, addresses may contain "Avenue" or "Ave", names may be "Bob" or "Rob" or "Robert". The substitution file allows you to standardize these on a single representation.

As mentioned, the substitution file name must be valid for the client computer that is being used. The name may be fully pathed, but if it is not pathed it will be assumed to be in the Analyzer project root directory. The specified substitution file must be surrounded by quotes. More than one substitution file may be specified, separated by commas within a single set of quotes.

The structure of the substitution file is as a simple text file, where each line contains two values, separated by a space or tab. The first value is the item to be replaced, and the second value is what it will be replaced with. For example, using a simple address substitution, the substitution file might contain:

STREET ST

ROAD RD

AVENUE AVE

Note: The entries are all in capitals. This is because the substitutions are applied as the last step of the NORMALIZE(), and so the data has already been upper-cased.

For efficiency reasons it is always better to substitute shorter values for longer ones, as in the example shown above. If you do wish to standardize on a longer representation, then the system will only allow room for up to two substitutions on any one line.

Using the above file as a substitution list,

NORMALIZE("1744-4812 Main Street, Anywhere, USA","addr.txt")¿

 

yields

 

"1744 4812 MAIN ST ANYWHERE USA"

Substitutions are applied on a "whole word" basis, so

NORMALIZE("1744-4812 Broadway, Anywhere, USA","addr.txt")¿

 

yields

 

"1744 4812 BROADWAY ANYWHERE USA"

even though "road" is contained in "Broadway". By default, the function does not perform substitutions within a string, only on whole words. You can enable substitutions within a string by starting the line in the substitution file with an asterisk (*). Therefore, if the substitution file contained:

STREET ST

*ROAD RD

AVENUE AVE

then STREET and AVENUE would only be replaced on a "whole word" basis, but "ROAD" would be replaced anywhere it was found, such as in the middle of "BROADWAY".

Note: To assist you with performing substitutions for street-types in addresses, contact Arbutus Technical Support to obtain a sample substitution file containing hundreds of standard US street-type abbreviation substitutions.

Use of the asterisk (*) can also be useful for specifying replacements sets for upper and lower case versions of individual international characters to ensure that a desired substitution is made in place of the standard international character substitution performed by the Normalize() function. For example, Normalize() will, by default, replace the international characters “æ” and “Æ” with an “E”. By placing the lines *Æ A and *æ A in the substitution file, you can override the default international character substitution for these characters.

Additionally, the asterisk (*) can be used to avoid the separation of words connected by punctuation (like dashes). For example, placing *- in the substitution file with no replacement value will override the default behavior of replacing punctuation with blank spaces, thus allowing the substitution file to then simply remove the punctuation without replacement. Using this example, "upper-case" would become "UPPERCASE" rather than "UPPER CASE".

If the target or substitution contain multiple words separated by blanks, then the multiple word target or substitution must be surrounded by double quotes:

“NEW YORK” NY

NY “NEW YORK”

“AVENUE SOUTH WEST” “AVE SW”

The substitution file also supports specification of single items instead of pair sets. So if you want to remove an item without replacement, you can simply enter the item to remove in the substitution file without specifying a replacement.

Note: When removing an item without replacement, if the item is found in the source string, any trailing spaces encountered after the specified item are also removed.

Specifying Extra Characters

If extra characters are specified, then they will be included as is, just like the alphanumeric characters, so

NORMALIZE("1744-4812 Main St., Anywhere, USA", "", "-")

 

yields

 

“1744-4812 MAIN ST ANYWHERE USA”

If you include a lower case character as an extra character it will not be upper cased - you must then use the substitution table to upper case the extra character (i.e., an entry like: e E).