UltraEdit supports Perl-style regular expressions for search, using the Boost C++ Libraries (http://www.boost.org/doc/libs/1_50_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).
Note: in the following documentation, an "atom" may refer to a single character, a marked sub-expression, or a character class.
In Perl regular expressions, all characters match themselves except for the following special characters:
Character | Meaning |
---|---|
. | Matches any single character except new lines |
^ | Matches start of line (anchor) |
$ | Matches end of line (anchor) |
* | Matches 0 or more of the preceding atom |
+ | Matches 1 or more of the preceding atom |
? | Matches 0 or 1 of the preceding atom |
[] | Matches any character in the set. For example [a-d] would match a, b, c, and d – but not e. |
() | Tags the enclosed atom for backreferencing |
{n} | Matches the previous atom n times. |
\| | "Or" operator. For example, dog\|cat matches either "dog" or "cat". |
\ | Escape character |
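The basic operators in the table above can be tried out in a short script. The sketch below uses Python's `re` module purely as a stand-in; it is not UltraEdit's Boost engine, and the sample strings and variable names are made up for illustration. One known difference: Python needs the `re.MULTILINE` flag before `^` and `$` anchor at line boundaries.

```python
import re

text = "ac abc abbbc"

# Quantifiers apply to the preceding atom (here the single character "b").
zero_or_more = re.findall(r"ab*c", text)   # b may be absent or repeated
one_or_more = re.findall(r"ab+c", text)    # at least one b is required
zero_or_one = re.findall(r"ab?c", text)    # at most one b is allowed

# "." matches any single character except a newline.
any_char = re.findall(r"c.t", "cat cot c\nt")

# A set matches one character from the set; {n} repeats an atom n times.
char_set = re.findall(r"[a-d]+", "abcde")
exact_count = re.findall(r"a{3}", "aa aaa")

# Line anchors (assumption: re.MULTILINE approximates UltraEdit's line-based search).
line_start = re.findall(r"^b.", "ab\nbc", re.MULTILINE)
line_end = re.findall(r".b$", "ab\nbc", re.MULTILINE)

print(zero_or_more, one_or_more, zero_or_one)
print(any_char, char_set, exact_count, line_start, line_end)
```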
Marked sub-expressions
A section beginning ( and ending ) is a marked sub-expression. Whatever matched the sub-expression is split out in a separate field by the matching algorithms. Marked sub-expressions can also be repeated or referred to by a backreference.
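A minimal sketch of marked sub-expressions, again using Python's `re` module as an approximation rather than UltraEdit itself. The sample strings and variable names are illustrative assumptions. It shows the three uses mentioned above: capturing into separate fields, repeating a group, and referring back to a group with `\1`.

```python
import re

# Each parenthesised group is captured as its own field.
match = re.search(r"(\d+)-(\d+)", "pages 12-34")
whole, first, second = match.group(0), match.group(1), match.group(2)

# A marked sub-expression can be repeated like any other atom.
repeated = re.search(r"(ab)+", "ababab")
repeated_whole = repeated.group(0)
last_repeat = repeated.group(1)  # the group keeps its last iteration

# \1 is a backreference: it matches the same text group 1 matched.
doubled_words = re.findall(r"\b(\w+) \1\b", "it is is what it it was")

print(whole, first, second, repeated_whole, last_repeat, doubled_words)
```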
Alternation
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def". Parentheses can be used to group alternations, for example: ab(d|ef) will match either "abd" or "abef". Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example: |abc is not a valid expression, but (?:)|abc is.
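The alternation patterns mentioned here can be checked with a small script. This is a sketch in Python's `re` module, which is only an approximation of the Boost engine: Python happens to accept a bare empty alternative, so the `(?:)` placeholder is shown only because it is the form this documentation describes. Sample strings and variable names are illustrative.

```python
import re

# Plain alternation: either argument may match.
simple = re.findall(r"abc|def", "abc def abd")

# Parentheses limit the alternation to part of the pattern.
grouped = [m.group(0) for m in re.finditer(r"ab(d|ef)", "abd abef abe")]

# (?:) is an empty, non-capturing group used as an explicit empty alternative.
matches_empty = re.fullmatch(r"(?:)|abc", "") is not None
matches_abc = re.fullmatch(r"(?:)|abc", "abc") is not None
matches_other = re.fullmatch(r"(?:)|abc", "ab") is not None

print(simple, grouped, matches_empty, matches_abc, matches_other)
```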
Character sets
A character set is a bracket expression starting with [ and ending with ]. It defines a set of characters and matches any single character that is a member of that set.
A bracket expression may contain any combination of the following:
Supported character class names (following the format [[:name:]])
Name | POSIX-standard | Description |
---|---|---|
alnum | Yes | Any alpha-numeric character. |
alpha | Yes | Any alphabetic character. |
blank | Yes | Any whitespace character that is not a line separator. |
cntrl | Yes | Any control character. |
d | No | Any decimal digit |
digit | Yes | Any decimal digit. |
graph | Yes | Any graphical character. |
l | No | Any lower case character. |
lower | Yes | Any lower case character. |
print | Yes | Any printable character. |
punct | Yes | Any punctuation character. |
s | No | Any whitespace character. |
space | Yes | Any whitespace character. |
unicode | No | Any extended character whose code point is above 255 in value. |
u | No | Any upper case character. |
upper | Yes | Any upper case character. |
w | No | Any word character (alphanumeric characters plus the underscore). |
word | No | Any word character (alphanumeric characters plus the underscore). |
xdigit | Yes | Any hexadecimal digit character. |
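Python's `re` module does not accept the [[:name:]] form, so the sketch below is only an approximation for experimenting outside UltraEdit. It rewrites a few class names into ASCII ranges before matching. The `POSIX_TO_PYTHON` table and the `expand_posix_classes` helper are hypothetical names invented for this example, and the ASCII ranges are narrower than what the Boost engine may match for non-ASCII text.

```python
import re

# Hypothetical lookup table: ASCII-only approximations of a few class names.
POSIX_TO_PYTHON = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
    "word": "a-zA-Z0-9_",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each [:name:] with an ASCII range (KeyError for unlisted names)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_TO_PYTHON[m.group(1)], pattern)

# [[:xdigit:]]+ becomes [0-9A-Fa-f]+ after expansion.
hex_pattern = expand_posix_classes(r"[[:xdigit:]]+")
hex_runs = re.findall(hex_pattern, "ff 0x1A zz")

# Classes can be combined with other members of the same bracket expression.
ident_pattern = expand_posix_classes(r"[[:alpha:]_][[:word:]]*")
identifiers = re.findall(ident_pattern, "x1 = _tmp + 42")

print(hex_pattern, hex_runs, ident_pattern, identifiers)
```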
Escapes
Any special character preceded by an escape matches itself. The following escape sequences are also supported (all synonyms for single characters):
Escape | Character |
---|---|
\a | \a |
\e | 0x1B |
\f | \f |
\n | \n |
\r | \r |
\t | \t |
\v | \v |
\b | \b (but only inside a character class declaration). |
\cX | An ASCII escape sequence - the character whose code point is X % 32 |
\xdd | A hexadecimal escape sequence - matches the single character whose code point is 0xdd. |
\x{dddd} | A hexadecimal escape sequence - matches the single character whose code point is 0xdddd. |
\0ddd | An octal escape sequence - matches the single character whose code point is 0ddd. |
\N{name} | Matches the single character which has the symbolic name name. For example \N{newline} matches the single character \n. |
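A few of these escapes behave the same way in Python's `re` module, which is used below only as an approximate stand-in for UltraEdit. The sketch deliberately avoids `\e`, `\cX`, `\x{dddd}` and the Boost-style `\N{newline}`, because Python either lacks them or treats them differently. Sample strings and variable names are illustrative.

```python
import re

# An escaped special character matches itself.
literal_dot = re.findall(r"1\.5", "1.5 125")
literal_dollar = re.findall(r"\$5", "$5 5")

# \xdd matches the character with that hexadecimal code point (0x41 is "A").
hex_escape = re.findall(r"\x41", "ABA")

# \t matches a tab character.
tab_fields = re.split(r"\t", "a\tb\tc")

# Inside a character class, \b means backspace (0x08) rather than a word boundary.
backspaces = re.findall(r"[\b]", "x\x08y")

print(literal_dot, literal_dollar, hex_escape, tab_fields, backspaces)
```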
"Single character" character classes
An escaped lowercase character \x, where x is the name of a character class, matches any character that is a member of that class. The escaped uppercase form \X matches any character that is not in that class. The following are supported by default:
Escape sequence | Equivalent to |
---|---|
\d | [[:digit:]] |
\l | [[:lower:]] |
\s | [[:space:]] |
\u | [[:upper:]] |
\w | [[:word:]] |
\D | [^[:digit:]] |
\L | [^[:lower:]] |
\S | [^[:space:]] |
\U | [^[:upper:]] |
\W | [^[:word:]] |
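The lowercase and uppercase pairs can be compared side by side. The sketch below uses Python's `re` module as an approximation: it supports `\d`, `\s`, `\w` and their negations, but not `\l`, `\u`, `\L` or `\U`, so an explicit range stands in for `\u`. The sample text and variable names are illustrative.

```python
import re

text = "Order 66, box_7!"

digits = re.findall(r"\d+", text)       # runs of decimal digits
non_digits = re.findall(r"\D+", text)   # everything between the digits
words = re.findall(r"\w+", text)        # alphanumerics plus underscore
non_space = re.findall(r"\S+", text)    # runs without whitespace
spaces = re.findall(r"\s", text)        # individual whitespace characters

# Python has no \u escape for upper case; [A-Z] is an ASCII-only substitute.
uppercase = re.findall(r"[A-Z]", text)

print(digits, non_digits, words, non_space, spaces, uppercase)
```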
Word Boundaries
The following escape sequences match the boundaries of words:
Escape | Meaning |
---|---|
\< | Matches the start of a word. |
\> | Matches the end of a word. |
\b | Matches a word boundary (the start or end of a word). |
\B | Matches only when not at a word boundary. |
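The word-boundary escapes can be illustrated with a short sketch in Python's `re` module, used here only as an approximation of UltraEdit's engine. Python supports `\b` and `\B` but not `\<` or `\>`, so the start-of-word and end-of-word anchors are emulated with lookarounds; that emulation is an assumption of this example. Offsets are zero-based positions in the sample string.

```python
import re

text = "cat concat cats"

# \b on both sides selects "cat" only as a whole word.
whole_word = re.findall(r"\bcat\b", text)

# \B requires that the position is NOT a word boundary ("cat" inside "concat").
inside_word = re.search(r"\Bcat", text).start()

# Emulating \< (start of word): a boundary followed by a word character.
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)cat", text)]

# Emulating \> (end of word): a boundary preceded by a word character.
word_ends = [m.start() for m in re.finditer(r"cat\b(?<=\w)", text)]

print(whole_word, inside_word, word_starts, word_ends)
```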
For further information on Perl regular expressions, please see:
See also:
* Regular expressions
* Special search characters