Regular Expressions
A regular expression is a formula for matching strings that follow some pattern. Regular expressions are used throughout the application for locating and extracting character strings from incoming text.

Regular expressions are made up of normal characters and meta characters. Normal characters include upper and lower case letters and digits. The meta characters have special meanings and are described in detail below.

In the simplest case, a regular expression looks like a standard search string. For example, the regular expression "testing" contains no meta characters. It will match "testing", "123testing", "Testing123" and "Testing": the pattern may occur anywhere in the text and, as the last two examples show, letter case is ignored.

To make good use of regular expressions it is critical to understand meta characters. The table below lists the meta characters with a short explanation of their meaning. For a more detailed description of regular expressions visit www.regular-expressions.info
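As a small illustration of the plain search string described above, the following Python sketch checks the same sample strings. Python's re module is used only for illustration; the application's own regular expression engine may differ in details. The IGNORECASE flag and the helper name contains_match are assumptions, chosen so that "Testing" matches as the text states.

```python
import re

# Assumption: matching ignores case, because the text says "testing" matches "Testing".
PATTERN = re.compile(r"testing", re.IGNORECASE)


def contains_match(text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return PATTERN.search(text) is not None


samples = ["testing", "123testing", "Testing123", "Testing", "test"]
results = {sample: contains_match(sample) for sample in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The last sample, "test", is included only to show a string that does not contain the whole pattern.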

Character

Description

.

Matches any single character. For example, the regular expression r.t would match the strings rat, rut, and r t, but not root.

$

Matches the end of a line. For example, the regular expression weasel$ would match the end of the string "He's a weasel" but not the string "They are a bunch of weasels."

^

Matches the beginning of a line. For example, the regular expression ^When in would match the beginning of the string "When in the course of human events" but would not match "What and When in the".

*

Matches zero or more occurrences of the immediately preceding character. For example, the regular expression .* matches any number of any characters.

\

This is the quoting character. Use it to treat the following character as an ordinary character. For example, \$ is used to match the dollar sign character ($) rather than the end of a line. Similarly, the expression \. is used to match the period character rather than any single character.

[ ], [c1-c2], [^c1-c2]

Matches any one of the characters between the brackets. For example, the regular expression r[aou]t matches rat, rot, and rut, but not ret. Ranges of characters can be specified by using a hyphen. For example, the regular expression [0-9] means match any digit. Multiple ranges can be specified as well. The regular expression [A-Za-z] means match any upper or lower case letter. To match any character except those listed (the complement of the set), use the caret as the first character after the opening bracket. For example, the expression [^269A-Z] will match any character except 2, 6, 9, and upper case letters.

|

Matches either of two alternatives (a logical OR). For example, (him|her) matches the line "it belongs to him" and matches the line "it belongs to her" but does not match the line "it belongs to them."

+

Matches one or more occurrences of the character or regular expression immediately preceding. For example, the regular expression 9+ matches 9, 99, 999.

\d

Matches a numeric character. Same as [0-9].

\D

Matches a non-numeric character. Same as [^0-9].

\w

Matches an alphanumeric character. Same as [a-zA-Z0-9].

\W

Matches a non-alphanumeric character. Same as [^a-zA-Z0-9].

\s

Matches any white space character (space, tab, new line, etc.).

\S

Matches any non-white space character.

\eol

Matches any new line (either return or line feed) character.

\n

Matches a new line (line feed).

\r

Matches a return.

\xnn

Matches the character with the hexadecimal value nn. For example, \x0A matches a line feed character.

(abc)

Creates a sub expression and remembers what it matched for later back references. Replacement patterns refer to the remembered matches as \1, \2, etc.

\t

Matches a tab character.
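The examples in the table can be tried out with a short Python sketch such as the one below. It uses Python's re module purely for illustration, and the application's own engine may behave differently in places: for instance, Python's \w also matches the underscore, and \eol has no Python equivalent, so it is not shown. The names EXAMPLES and check are hypothetical helpers introduced for this sketch.

```python
import re

# (pattern, text that should match, text that should not match),
# taken from the examples in the table above.
EXAMPLES = [
    (r"r.t", "rat", "root"),
    (r"weasel$", "He's a weasel", "They are a bunch of weasels."),
    (r"^When in", "When in the course of human events", "What and When in the"),
    (r"r[aou]t", "rot", "ret"),
    (r"(him|her)", "it belongs to her", "it belongs to them."),
    (r"\$", "cost: $5", "cost: 5"),
]


def check(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


table_results = [(p, check(p, yes), check(p, no)) for p, yes, no in EXAMPLES]
for pattern, hit, miss in table_results:
    print(f"{pattern:12} positive example: {hit}, negative example: {miss}")

# 9+ accepts one or more 9s; fullmatch tests the whole string.
nines = [s for s in ["", "9", "99", "999", "98"] if re.fullmatch(r"9+", s)]

# [^269A-Z] matches any single character other than 2, 6, 9 and upper case letters.
kept = re.findall(r"[^269A-Z]", "A2b6c9D7")

# Sub expressions are remembered and can be reused as \1, \2 in a replacement.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

print(nines, kept, swapped)
```

Each row of EXAMPLES pairs a pattern with one string it should match and one it should not, so a mistake in a pattern shows up as an unexpected True or False.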


NB. Great care must be taken when formulating regular expressions because a number of characters have special meaning. For example, the bracket characters '()' are used to create sub expressions. Consequently, searching for the string "Saturdays Run (19th)" would return incorrect results. In order to use the special characters within a search string, the characters must be 'escaped' or 'quoted' by preceding them with the '\' character. Thus, the correct search string would be "Saturdays Run \(19th\)".
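The effect of escaping can be seen in the following Python sketch, which searches a sample sentence for the string from the note above. The sample sentence and variable names are made up for illustration, and Python's re.escape is shown as one way to quote a literal string automatically; whether the application offers a comparable facility is not stated here.

```python
import re

text = "Results for Saturdays Run (19th) are posted"
target = "Saturdays Run (19th)"

# Unescaped: the brackets form a sub expression, so the pattern actually
# looks for "Saturdays Run 19th" (no literal brackets) and finds nothing.
unescaped_hit = re.search(r"Saturdays Run (19th)", text) is not None

# Escaped by hand, as recommended in the note above.
escaped_hit = re.search(r"Saturdays Run \(19th\)", text) is not None

# re.escape quotes every special character in a literal string.
auto_hit = re.search(re.escape(target), text) is not None

print(unescaped_hit, escaped_hit, auto_hit)
```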