Coding With Fun

Regular Expressions - Examples


May 28, 2021 Regular expression



Simple expression

The simplest form of a regular expression is a single ordinary character that matches itself in the search string. For example, a single-character pattern such as A always matches the letter A, wherever it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

Multiple single characters can be combined to form larger expressions. For example, the following regular expression combines the single-character expressions a, 7, and M:

/a7M/

Note that there is no concatenation operator. You simply type one character after another.
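As a minimal sketch of the patterns above, assuming a JavaScript runtime such as Node.js (the variable names and sample strings are illustrative, not from the original):

```javascript
// Each ordinary character matches itself; writing characters in sequence
// concatenates them into one larger pattern.
const single = /a/;
const combined = /a7M/;

console.log(single.test("banana"));       // true: "a" appears somewhere in the string
console.log(combined.test("xxa7Mxx"));    // true: "a7M" appears as a contiguous run
console.log(combined.test("a7m"));        // false: matching is case-sensitive by default
console.log("xxa7Mxx".search(combined));  // 2: index where the match starts
```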

Character matching

A period (.) matches any single printing or non-printing character in a string, with one exception: the line break. The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, and a-c:

/a.c/

To match a string that contains a file name, where the period (.) is a literal part of the input string, put a backslash before the period in the regular expression. For example, the following regular expression matches filename.ext:

/filename\.ext/
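The difference between an unescaped and an escaped period can be checked with a short sketch like the following (JavaScript assumed; the test strings are made up for illustration):

```javascript
const anyChar = /a.c/;
const literalDot = /filename\.ext/;

console.log(anyChar.test("abc"));    // true
console.log(anyChar.test("a-c"));    // true
console.log(anyChar.test("a\nc"));   // false: the period does not match a line break

console.log(literalDot.test("filename.ext"));      // true
console.log(literalDot.test("filenameXext"));      // false: \. matches only a literal period
console.log(/filename.ext/.test("filenameXext"));  // true: an unescaped period matches any character
```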

These expressions only let you match "any" single character. You may need to match specific characters from a list. For example, you might want to find chapter headings that are numbered (Chapter 1, Chapter 2, and so on).

Bracket expressions

To create a list of matching characters, place one or more individual characters in square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. As elsewhere, an ordinary character inside brackets represents itself: it matches one occurrence of itself in the input text. Most special characters lose their meaning when they appear within a bracket expression. However, there are some exceptions, such as:

  • The ] character ends the list if it is not the first item. To match a literal ] in a list, place it first, immediately after the opening [ (in JavaScript, where [] is an empty class, escape it as \] instead).
  • The \ character remains the escape character. To match a literal \, use \\.

The characters enclosed in a bracket expression match only a single character at that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the word Chapter and the space that follows it are in a fixed position relative to the characters in the brackets. The bracket expression specifies only the set of characters that can match the single character position immediately after the word Chapter and the space. This is the ninth character position.

To express the matching characters as a range instead of listing them individually, separate the start and end characters of the range with a hyphen (-). The character value of each character determines its relative order within the range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

When you specify a range in this way, both the start value and the end value are included in the range. Note also that the start value must precede the end value in Unicode code point order.

To include a hyphen in a bracket expression, use one of the following methods:
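To illustrate that the list form and the range form behave identically, here is a small sketch (JavaScript assumed; the input strings are illustrative):

```javascript
const list = /Chapter [12345]/;
const range = /Chapter [1-5]/;

const inputs = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter  3"];
for (const s of inputs) {
  // Both forms agree on every input; "Chapter  3" fails because the
  // ninth character is a second space, not a digit from 1 to 5.
  console.log(JSON.stringify(s), list.test(s), range.test(s));
}

// The bracket expression consumes exactly one character, so the match
// is nine characters long even when more digits follow.
const matched = "Chapter 12".match(range)[0];
console.log(matched, matched.length);  // Chapter 1 9
```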

  • Escape it with a backslash:
    [\-]
  • Place the hyphen at the beginning or end of the bracketed list. The following expressions match all lowercase letters and the hyphen:
    [-a-z]
    [a-z-]
    
  • Create a range in which the start character value is less than the hyphen, and the end character value is equal to or greater than the hyphen. The following two regular expressions meet this requirement:
    [!--]
    [!-~]
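The three methods listed above can be compared side by side. This is a sketch assuming JavaScript without the u flag; the variable names are illustrative:

```javascript
const escaped = /[\-]/;    // hyphen escaped with a backslash
const leading = /[-a-z]/;  // hyphen first in the list
const trailing = /[a-z-]/; // hyphen last in the list
const viaRange = /[!--]/;  // range from "!" (0x21) through "-" (0x2D)
const wideRange = /[!-~]/; // range from "!" (0x21) through "~" (0x7E)

for (const re of [escaped, leading, trailing, viaRange, wideRange]) {
  console.log(re.source, re.test("-"));  // every pattern matches a hyphen
}

console.log(leading.test("A"));   // false: only lowercase letters and the hyphen
console.log(viaRange.test("+"));  // true: "+" (0x2B) lies between "!" and "-"
console.log(viaRange.test("a"));  // false: "a" (0x61) is outside the range
```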
    

To match any character that is not in the list or range, place a caret (^) at the beginning of the list. If the caret appears anywhere else in the list, it matches itself. The following regular expression matches Chapter followed by a space and any single character other than 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In the example above, the expression matches any character other than 1, 2, 3, 4, or 5 in the ninth position. So, for example, Chapter 7 is a match, and so is Chapter 9.

The same expression can be written as a range, using a hyphen (-):

/Chapter [^1-5]/
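A brief sketch of negated lists and ranges, assuming JavaScript (the inputs are illustrative):

```javascript
const negList = /Chapter [^12345]/;
const negRange = /Chapter [^1-5]/;

console.log(negList.test("Chapter 7"), negRange.test("Chapter 7"));  // true true
console.log(negList.test("Chapter 3"), negRange.test("Chapter 3"));  // false false
console.log(negRange.test("Chapter x"));  // true: any character other than 1-5 qualifies
console.log(negRange.test("Chapter "));   // false: a ninth character must still be present

// A caret that is not the first item in the list is an ordinary character.
const literalCaret = /[1^]/;
console.log(literalCaret.test("^"));  // true
console.log(literalCaret.test("2"));  // false
```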

A typical use of a bracket expression is to specify a match for any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/

Alternation and grouping

Alternation uses the | character to allow a choice between two or more alternatives. For example, you can expand the chapter heading regular expression to match more than just chapter headings. However, this is not as straightforward as you might think. Alternation matches the largest possible expression on either side of the | character.

You might think that the following expression matches either Chapter or Section at the beginning of a line, followed by one or two digits and then the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the regular expression above matches either the word Chapter at the beginning of the line, or the word Section followed by a number at the end of the line. If the input string is Chapter 22, the expression matches only the word Chapter. If the input string is Section 22, the expression matches Section 22.

To make the regular expression behave as intended, you can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words Chapter and Section. However, parentheses also create subexpressions, which are captured for later use, for example by backreferences. You can match Chapter 1 or Section 3 by adding parentheses at the appropriate places in the regular expression above.

The following regular expression uses parentheses to group Chapter and Section so that the expression works correctly:
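The pitfall described above can be reproduced with a few lines (a sketch assuming JavaScript; the third input is an illustrative extra case):

```javascript
// Without grouping, | splits the whole pattern into two alternatives:
//   ^Chapter                      (anchored at the start only)
//   Section [1-9][0-9]{0,1}$      (anchored at the end only)
const loose = /^Chapter|Section [1-9][0-9]{0,1}$/;

console.log("Chapter 22".match(loose)[0]);       // "Chapter": the number is not part of the match
console.log("Section 22".match(loose)[0]);       // "Section 22"
console.log(loose.test("Chapter one hundred"));  // true: only the ^Chapter alternative is checked
```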

/^(Chapter|Section) [1-9][0-9]{0,1}$/

Although this expression works correctly, the parentheses around Chapter|Section also capture whichever of the two words matched, for later use. Because there is only one set of parentheses in the expression above, only one "submatch" is captured.

In the example above, you only need the parentheses to group the choice between the words Chapter and Section. To prevent the match from being saved for later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
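The difference between the capturing and non-capturing forms shows up in the match result. A minimal sketch, assuming JavaScript:

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const m1 = "Chapter 22".match(capturing);
console.log(m1[0], "|", m1[1], "|", m1.length);  // Chapter 22 | Chapter | 2

const m2 = "Section 3".match(nonCapturing);
console.log(m2[0], "|", m2.length);              // Section 3 | 1 (no submatch stored)

// Both forms accept and reject the same inputs.
console.log(capturing.test("Chapter 100"), nonCapturing.test("Chapter 100"));  // false false
console.log(capturing.test("Chapter 0"), nonCapturing.test("Chapter 0"));      // false false
```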

In addition to the ?: metacharacters, two other non-capturing metacharacters create what are called "lookahead" matches. A positive lookahead, specified with ?=, matches the search string at any point where a string matching the regular expression pattern in parentheses begins. A negative lookahead, specified with ?!, matches the search string at any point where a string not matching the regular expression pattern begins.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by changing all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression, an example of a positive lookahead, matches the word Windows wherever it is immediately followed by 95, 98, or NT and a space. As written, the pattern has no space between Windows and the lookahead, so the input must read Windows95, Windows98, or WindowsNT; add a space after Windows in the pattern to match Windows 95, Windows 98, and Windows NT:

/Windows(?=95 |98 |NT )/

Once a match is found, the search for the next match begins immediately after the matched text, which does not include the characters in the lookahead. For example, if the expression above matches at Windows98, the search resumes after Windows, not after 98.
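The way a lookahead leaves its characters unconsumed can be observed through the lastIndex property of a global regular expression. The following is a sketch assuming JavaScript; the sample text and variable names are made up for illustration:

```javascript
const re = /Windows(?=95 |98 |NT )/g;
const text = "Windows3.1 Windows95 Windows98 WindowsNT ";

const found = [];
let m;
while ((m = re.exec(text)) !== null) {
  // re.lastIndex is where the next search resumes: right after "Windows",
  // because the lookahead characters are not part of the match.
  found.push({ match: m[0], index: m.index, resumeAt: re.lastIndex });
}
console.log(found);
// Three matches of "Windows", at indexes 11, 21 and 31,
// each resuming 7 characters later (18, 28 and 38).

// A negative lookahead selects the remaining occurrence instead.
console.log(text.search(/Windows(?!95 |98 |NT )/));  // 0: the "Windows" in "Windows3.1"
```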

Other examples

Here are some examples of regular expressions:

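Regular expression  Description
/\b([a-z]+) \1\b/gi  Finds places where a word occurs twice in a row.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/  Parses a URL into protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/  Locates chapter or section headings.
/[-a-z]/  Matches the 26 letters a through z, plus the hyphen.
/ter\b/  Matches the ter in chapter, but not in terminal.
/\Bapt/  Matches the apt in chapter, but not in aptitude.
/Windows(?=95 |98 |NT )/  Matches the word Windows in Windows95, Windows98, or WindowsNT (each followed by a space). After a match is found, the next search begins right after Windows.
/^\s*$/  Matches a blank line.
/\d{2}-\d{5}/  Validates an ID number consisting of two digits, a hyphen, and five more digits.
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/  Matches an HTML tag.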
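A few of the expressions listed above can be tried out as follows. This is a sketch assuming JavaScript; the sample sentence and URL are illustrative and not from the original:

```javascript
// Repeated words: \1 refers back to whatever the first group captured.
const dup = /\b([a-z]+) \1\b/gi;
const doubled = "Is is the cost of of gasoline going up up?".match(dup);
console.log(doubled);  // [ 'Is is', 'of of', 'up up' ]

// URL parsing: each pair of parentheses captures one part.
const urlRe = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const parts = "https://www.example.com:8080/docs/index.html#top".match(urlRe);
console.log(parts.slice(1));
// [ 'https', 'www.example.com', ':8080', '/docs/index.html' ]

// Word boundary (\b) and non-boundary (\B) checks.
console.log(/ter\b/.test("chapter"), /ter\b/.test("terminal"));  // true false
console.log(/\Bapt/.test("chapter"), /\Bapt/.test("aptitude"));  // true false

// Blank line and ID format.
console.log(/^\s*$/.test("   "));            // true
console.log(/\d{2}-\d{5}/.test("12-34567")); // true
```

Note that /\d{2}-\d{5}/ is unanchored, so it also accepts longer strings that merely contain such an ID; wrapping it in ^ and $ would be one way to require the whole string to match.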