UltraEdit supports Perl-style regular expressions for search, implemented using the Boost C++ Libraries.
Note: in the following documentation, atom may refer to a single character, a marked sub-expression, or a character class.
In Perl regular expressions, all characters match themselves except for the following special characters:
Character | Meaning |
---|---|
. | Matches any single character except new lines |
^ | Matches start of line position (anchor) |
$ | Matches end of line position (anchor) |
* | Matches 0 or more of the preceding atom |
+ | Matches 1 or more of the preceding atom |
? | Matches 0 or 1 of the preceding atom |
[] | Matches any character in the set. For example [a-d] would match a, b, c, and d, but not e. |
() | Tags the enclosed atom for backreferencing |
{n} | Matches the previous atom n times |
\| | "or" operator. For example dog\|cat would match either "dog" or "cat" |
\ | Escape character |
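As a rough illustration of the special characters in the table above, the following sketch uses Python's re module, which shares this basic syntax. Python is only a stand-in chosen for illustration; UltraEdit itself uses the Boost engine, and the two engines differ in other areas.

```python
import re

# Python's re module is a stand-in here; it is not the engine UltraEdit uses.

# . matches any single character except a newline
dot_hit = re.search(r"a.c", "abc") is not None
dot_newline = re.search(r"a.c", "a\nc") is not None

# ? makes the preceding atom optional (0 or 1 times)
optional = re.findall(r"colou?r", "color colour")

# * is 0 or more, + is 1 or more of the preceding atom
star = re.fullmatch(r"ab*", "a") is not None
plus = re.fullmatch(r"ab+", "a") is not None

# [a-d] matches a, b, c, or d, but not e
in_set = re.findall(r"[a-d]", "abcde")

# {n} repeats the previous atom exactly n times
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None

# | matches either alternative
either = re.findall(r"dog|cat", "dog, cat, bird")

# \ escapes a special character so that it matches itself
literal_dot = re.findall(r"3\.14", "3.14 3x14")

# ^ and $ anchor the match to the start and end of a line
anchored = re.findall(r"^\w+$", "one\ntwo words\nthree", re.MULTILINE)
```

Each variable holds the result for one row of the table, so the patterns can be tried one at a time.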
Marked sub-expressions
A section beginning with a ( and ending with a ) is a marked sub-expression. Whatever matches this sub-expression is split into a separate field by the matching algorithms. Marked sub-expressions can also be repeated or referred to by a backreference.
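The three uses of a marked sub-expression described above (separate fields, repetition, and backreferences) can be sketched as follows. Python's re module is assumed as a stand-in for the Boost engine; the sample strings are made up for illustration.

```python
import re

# Two marked sub-expressions: each one is returned as a separate field
fields = re.fullmatch(r"(\d{4})-(\d{2})", "2024-05").groups()

# A marked sub-expression can be repeated as a single atom
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

# \1 refers back to whatever the first sub-expression matched,
# so this pattern finds a word that is immediately repeated
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)
```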
Alternation
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def". Parentheses can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef". Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder. For example, |abc is not a valid expression, but (?:)|abc is valid and equivalent. The expression (?:abc)?? has exactly the same effect.
Character sets
A character set is a bracketed expression starting with [ and ending with ] which defines a set of characters, and matches any single character that is a member of that set.
This bracketed expression may contain any combination of the following:
[abc] will match any of the characters "a", "b", or "c".
[a-c] will match any single character in the range "a" to "c". By default, for POSIX-Perl regular expressions, a character "x" is within the range "y" to "z" if it collates within that range; this results in locale-specific behavior.
If the set begins with the ^ character, then it matches the complement of the characters it contains. For example, [^a-c] matches any character that is not in the range "a-c".
[[:name:]] matches the named character class "name". For example, [[:lower:]] matches any lower case character. You can see all available character class names below, or in Boost's online documentation for Perl regular expressions.
Supported character class names (following the format [[:name:]])
Name | POSIX-standard | Description |
---|---|---|
alnum | Yes | Any alpha-numeric character |
alpha | Yes | Any alphabetic character |
blank | Yes | Any whitespace character that is not a line separator |
cntrl | Yes | Any control character |
d | No | Any decimal digit |
digit | Yes | Any decimal digit |
graph | Yes | Any graphical character |
l | No | Any lower case character |
lower | Yes | Any lower case character |
print | Yes | Any printable character |
punct | Yes | Any punctuation character |
s | No | Any whitespace character |
space | Yes | Any whitespace character |
unicode | No | Any extended character whose code point is above 255 in value |
u | No | Any upper case character |
upper | Yes | Any upper case character |
w | No | Any word character (alphanumeric characters plus the underscore) |
word | No | Any word character (alphanumeric characters plus the underscore) |
xdigit | Yes | Any hexadecimal digit character |
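To make the meaning of a few class names concrete, here is a small sketch that rewrites [:name:] into explicit ASCII ranges and then runs the result with Python's re module, which does not understand [[:name:]] itself. The ASCII_CLASSES table and the expand_classes helper are hypothetical names introduced for this example; the real classes are locale-aware and cover far more than ASCII, and this naive rewrite does not check that [:name:] actually sits inside a bracket expression.

```python
import re

# Hypothetical ASCII-only stand-ins for some of the class names in the table.
# This mapping is an assumption for illustration, not the engine's definition.
ASCII_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "word": "a-zA-Z0-9_",
    "xdigit": "0-9a-fA-F",
}


def expand_classes(pattern: str) -> str:
    """Rewrite each [:name:] in the pattern as explicit ASCII ranges."""
    return re.sub(r"\[:(\w+):\]", lambda m: ASCII_CLASSES[m.group(1)], pattern)


lower_run = expand_classes("[[:lower:]]+")            # "[a-z]+"
not_digit = expand_classes("[^[:digit:]]")            # "[^0-9]"
hex_or_underscore = expand_classes("[[:xdigit:]_]+")  # "[0-9a-fA-F_]+"

words = re.findall(lower_run, "abc DEF ghi")
```

Note how a class name can be combined with other members of the same set, as in [[:xdigit:]_], and negated with ^ like any other set.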
Escapes
Any special character preceded by an escape matches itself. The following escape sequences are also supported (all synonyms for single characters):
Escape | Character |
---|---|
\a | \a |
\e | 0x1B |
\f | \f |
\n | \n |
\r | \r |
\t | \t |
\v | \v |
\b | \b (but only inside a character class declaration). |
\cX | An ASCII escape sequence - the character whose code point is "X" % 32 |
\xdd | A hexadecimal escape sequence - matches the single character whose code point is 0xdd. |
\x{dddd} | A hexadecimal escape sequence - matches the single character whose code point is 0xdddd. |
\0ddd | An octal escape sequence - matches the single character whose code point is 0ddd. |
\N{name} | Matches the single character which has the symbolic name name. For example \N{newline} matches the single character \n. |
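The "X" % 32 rule for \cX is easiest to see as arithmetic. The sketch below works it through in Python; control_char is a hypothetical helper written for this example, not part of any library. The last three lines try escapes that Python's re module happens to share with the table; Python is only a stand-in and does not support every escape listed (for example \e or \cX).

```python
import re


# Hypothetical helper mirroring the rule for \cX: code point of X modulo 32.
def control_char(letter: str) -> str:
    return chr(ord(letter) % 32)


tab = control_char("I")        # ord("I") = 73, 73 % 32 = 9
line_feed = control_char("J")  # ord("J") = 74, 74 % 32 = 10
escape = control_char("[")     # ord("[") = 91, 91 % 32 = 27, i.e. 0x1B

# Escapes that Python's re module shares with the table above
hex_match = re.fullmatch(r"\x41", "A") is not None     # code point 0x41
tab_match = re.fullmatch(r"a\tb", "a\tb") is not None   # \t is a tab
backspace = re.fullmatch(r"[\b]", "\x08") is not None   # \b inside a set
```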
"Single character" character classes
Any escaped character "x", if "x" is the name of a character class, will match any character that is a member of that class. The upper-case form, an escaped "X", matches any character that is not in that class. The following are supported by default:
Escape sequence | Equivalent to |
---|---|
\d | [[:digit:]] |
\l | [[:lower:]] |
\s | [[:space:]] |
\u | [[:upper:]] |
\w | [[:word:]] |
\D | [^[:digit:]] |
\L | [^[:lower:]] |
\S | [^[:space:]] |
\U | [^[:upper:]] |
\W | [^[:word:]] |
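The equivalences in the table can be checked for the escapes that Python's re module also provides (\d, \s, \w and their upper-case complements). This is a sketch under the assumption that ASCII-only matching is acceptable, so the explicit ranges stand in for the named classes; \l, \u, \L and \U have no counterpart in Python and are not shown.

```python
import re

text = "Order 66, aisle 7_b!"

# re.ASCII limits the classes to ASCII, which keeps the equivalences simple
digits = re.findall(r"\d+", text, re.ASCII)
digits_explicit = re.findall(r"[0-9]+", text)

# The upper-case form is the complement of the lower-case form
non_word = re.findall(r"\W+", text, re.ASCII)
non_word_explicit = re.findall(r"[^A-Za-z0-9_]+", text)

# \S matches runs of anything that is not whitespace
non_space = re.findall(r"\S+", text)
```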
Assertions
Besides ^ and $, Perl regular expressions support the following zero-width assertions:
Assertion | Meaning |
---|---|
\< | Matches the start of a word. |
\> | Matches the end of a word. |
\b | Matches a word boundary (the start or end of a word). |
\B | Matches only when not at a word boundary. |
\A | Matches beginning of the file. |
\Z | Matches position of last non-newline character in the file. |
\z | Matches end of the file. |
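A short sketch of how zero-width assertions behave, again using Python's re module as a stand-in. Python has no \< or \>, so the two lookaround patterns below are an assumed emulation rather than the engine's own definition. \Z and \z are left out because Python's \Z does not behave like the Perl one described above.

```python
import re

text = "cat concat cats"

# \b matches at a word boundary, \B only where there is no boundary
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", text)]

# Assumed emulation of \< (start of word) and \> (end of word)
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", "foo bar")]
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", "foo bar")]

# \A matches only at the very beginning, even when ^ matches at every line
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
file_start = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)
```

Because these assertions consume no characters, the word start and word end results are positions rather than text.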
For further information on Perl regular expressions, please see: