You are here: Functions > REGEXREPLACE()

REGEXREPLACE()

The REGEXREPLACE() function allows you to replace all instances of character strings matching a regular expression with a specified character string.

Regular expressions are powerful and flexible search strings that combine literal characters and meta-characters, which are special characters that perform a wide variety of search operations. See Regular Expressions for specific details on the full list of special meta-characters.

Note: The current implementation of regular expressions in Analyzer does not fully support searching languages other than English.

Constructing regular expressions can be tricky, especially if you are new to using regular expression syntax. You may be able to achieve your search goals using the simpler REPLACE() function. If the search requirements for your replacement exceeds the capabilities of the REPLACE() function, regular expressions provide almost unlimited flexibility in constructing complex search strings.

If you want to search for matching patterns with a new string using regular expressions, see the REGEXFIND() function.

Function Format

REGEXREPLACE(string, pattern, replacement_string)

string - a string (character field, expression, or literal) containing a pattern that you want to replace.
pattern - the pattern (regular expression) searched for in the specified string. The pattern parameter can contain literal characters, meta-characters, or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks. For a list of meta-characters, see Regular Expressions. The pattern parameter must be enclosed in quotation marks.

Note: The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified.

replacement_string - The replacement string used to replace all values matching the pattern in the specified string. The replacement string can contain literal characters, groups of characters from the original string (using the $int element), or a combination of the two. The replacement_string parameter must be enclosed in quotation marks.

Examples

The REGEXREPLACE( ) function allows you to construct a regular expression to find matching patterns in data, and to replace the matching values with a new string.

REGEXREPLACE(character_field, "\s+", " ")

standardizes spacing in character data by replacing one or more spaces between text characters with a single space.

Note: This could also be achieved using the COMPACT() function.

The search portion of the REGEXREPLACE() function is identical to the REGEXFIND() function. For detailed information about the search capability common to both functions, see Regular Expressions.

Specific to the REGEXREPLACE() function, you can also use the $int element to replace characters with themselves, which allows you to preserve the meaningful parts of data, while standardizing or omitting surrounding or intermixed data. Several examples using telephone numbers and names appear below. To use the $int element you must first create groups by using parentheses in the pattern parameter.

You can avoid sequential character matching, and replace sub-strings regardless of their position in relation to one another, by nesting REGEXREPLACE() functions. For example:

REGEXREPLACE(REGEXREPLACE("123ABC","\d","9"),"[A-Z]","X") = 999XXX

REGEXREPLACE(REGEXREPLACE("1A2B3C","\d","9"),"[A-Z]","X") = 9X9X9X

Each instance of REGEXREPLACE() makes a pass through the target string, starting with the innermost instance.

When you use the REGEXREPLACE() function to create a computed field, the computed field length is identical to the original field length. If the replacement string length exceeds the target string length, overall string length increases, which results in truncation if the computed field cannot accommodate the increased string length. Characters that trail the target string are truncated first, followed by trailing replacement string characters.

The examples below illustrate how truncation can result:

String

Pattern

New_String

Field length

Result

x123x

123

A

5

xAx

x123x

123

ABC

5

xABCx

x123x

123

ABCD

5

xABCD

x123x

123

ABCDE

5

xABCD

x123x

123

ABCDE

6

xABCDE

x123x

123

ABCDE

7

xABCDEx

In computed fields, to avoid truncation, use the SUBSTRING() function to ensure a field length that is sufficiently long enough to accommodate the final result. For example:

REGEXREPLACE("x123x","123","ABCDE") = xABCD

but,

REGEXREPLACE(SUBSTR("x123x",1,10),"123","ABCDE") = xABCDEx

The following table lists some additional examples of using the REGEXREPLACE() function:

Example

Return value

REGEXREPLACE(character_field, "\s+", BLANKS(1))

Returns the data in character field with the spacing between the words standardized on a single space.

Using the BLANKS() function in the new_string parameter, rather than a literal space, makes spaces easier to read and less likely to be overlooked.

You could also use the easier COMPACT() function.

REGEXREPLACE(Telephone_Number, ".*(\d{3})[\s-\.\)]*(\d{3})[\s-\.]*(\d{4})", "($1) $2-$3")

Returns the numbers within a Telephone_Number field with standardized formatting:

(123) 456-7890

REGEXREPLACE(Full_Name, "^(\w+),(\s\w+)(\s\w+)?(\s\w+)?", "$2$3$4 $1")

Returns the names within the Full_Name field in their regular order: First (Middle) (Middle) Last:

"Wayne, John Allan" becomes "John Allan Wayne"

Note: Name data can present various complications, such as apostrophes in names. Accounting for all of the variations in name data typically requires more complex regular expressions than the one provided in this example.