Coding With Fun

Regular Expressions - Match rules


May 28, 2021 Regular expression



Basic pattern matching

It all starts with the basics. Patterns are the most basic elements of a regular expression: a pattern is a set of characters that describes the characteristics of a string. Patterns can be simple, consisting of ordinary strings, or they can be very complex, often using special characters to represent a range of characters, repetition, or context. For example:

^once

This pattern contains a special character, ^, which means that the pattern matches only those strings that begin with "once". For example, the pattern matches the string "once upon a time" but does not match "There once was a man from New York". Just as ^ indicates the beginning, the $ symbol is used to match strings that end with a given pattern.

bucket$

This pattern matches "Who kept all of this cash in a bucket" and does not match "buckets". When used together, the characters ^ and $ represent an exact match (the string is the same as the pattern). For example:

^bucket$

Only the string "bucket" matches. If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern

once

and the string

There once was a man from New York
Who kept all of his cash in a bucket.

are a match.
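The anchoring rules above can be checked with a small sketch. It assumes Python's standard `re` module as the regex engine (the article itself does not name one for these examples), and the helper name `matches` is made up for illustration.

```python
import re

first_line = "There once was a man from New York"
second_line = "Who kept all of his cash in a bucket"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found in text; ^ and $ still anchor it."""
    return re.search(pattern, text) is not None


print(matches(r"^once", "once upon a time"))  # True: string starts with "once"
print(matches(r"^once", first_line))          # False: "once" is not at the start
print(matches(r"bucket$", second_line))       # True: string ends with "bucket"
print(matches(r"bucket$", "buckets"))         # False: an "s" follows "bucket"
print(matches(r"^bucket$", "bucket"))         # True: exact match
print(matches(r"once", first_line))           # True: unanchored, found inside
```

`re.search` is used rather than `re.match` so that the anchors in the pattern, not the function, decide where a match may start.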

The letters (o-n-c-e) in this pattern are literal characters; that is, they represent the letters themselves, and the same is true of digits. Other slightly more complex characters, such as punctuation and whitespace characters (spaces, tabs, etc.), use escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t. So if we want to detect whether a string starts with a tab, we can use this pattern:

^\t 

Similarly, \n represents a new line and \r represents a carriage return. Other special symbols can be escaped by putting a backslash in front of them: the backslash itself is written as \\, a period as \., and so on.
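As a quick illustration of these escape sequences, here is a minimal sketch, again assuming Python's `re` module. The variable names and sample strings are invented for the example.

```python
import re

# Raw strings (r"...") pass the backslashes through to the regex engine unchanged.
starts_with_tab = re.compile(r"^\t")
literal_dot = re.compile(r"\.")
literal_backslash = re.compile(r"\\")

print(bool(starts_with_tab.search("\tindented")))     # True
print(bool(starts_with_tab.search("not indented")))   # False
print(bool(literal_dot.search("file.txt")))           # True
print(bool(literal_dot.search("filetxt")))            # False: no literal period
print(bool(literal_backslash.search("C:\\temp")))     # True: one backslash in the text
print(bool(re.search(r"\n", "line one\nline two")))   # True
```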

Character clusters

In web applications, regular expressions are often used to validate user input. When a user submits a form, plain literal characters are not enough to determine whether the phone number, address, email address, credit card number, etc. that was entered is valid.

So we need a more flexible way to describe the pattern we want: the character cluster. To create a character cluster that represents all vowels, place all the vowel characters in square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it represents only one character. A hyphen can represent a range of characters, such as:

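[a-z] // matches all lowercase letters
[A-Z] // matches all uppercase letters
[a-zA-Z] // matches all letters
[0-9] // matches all digits
[0-9\.\-] // matches all digits, periods, and minus signs
[ \f\r\t\n] // matches all whitespace characters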

Again, each of these represents only one character, which is very important. Use this pattern if you want to match a string made of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52":

^[a-z][0-9]$

Although [a-z] stands for a range of 26 letters, here it matches exactly one character: the first character of the string, which must be a lowercase letter.
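The one-character-per-cluster rule can be seen in a short sketch, assuming Python's `re` module. The sample strings are the ones listed in the text.

```python
import re

# One lowercase letter, then one digit, and nothing else.
letter_then_digit = re.compile(r"^[a-z][0-9]$")

accepted = ["z2", "t6", "g7"]
rejected = ["ab2", "r2d3", "b52"]

for text in accepted + rejected:
    # Each cluster consumes exactly one character, so longer strings fail.
    print(text, bool(letter_then_digit.search(text)))
```

The first three strings print True and the last three print False, because every rejected string is longer than two characters.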

The ^ symbol was introduced earlier as the marker for the beginning of a string, but it has another meaning. When ^ is used as the first character inside a set of square brackets, it means "not" or "excluded" and is often used to exclude characters. Continuing the previous example, we require that the first character cannot be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7", and "-2", but does not match "12" or "66". Here are a few examples of excluding specific characters:

[^a-z] // any character except lowercase letters
[^\\\/\^] // any character except (\), (/), and (^)
[^\"\'] // any character except double quotes (") and single quotes (')
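A minimal sketch of negated clusters follows, assuming Python's `re` module. The sample inputs and variable names are illustrative only.

```python
import re

# First character: anything except a digit. Second character: a digit.
non_digit_then_digit = re.compile(r"^[^0-9][0-9]$")

for text in ["&5", "g7", "-2", "12", "66"]:
    print(text, bool(non_digit_then_digit.search(text)))

# A negated cluster with a repeat splits text at the excluded characters.
# The pattern below is [^"']+ written as an ordinary Python string.
no_quotes = re.compile("[^\"']+")
print(no_quotes.findall("a\"b'c"))  # ['a', 'b', 'c']
```

The loop prints True for the first three inputs and False for "12" and "66", whose first character is a digit.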

The special character "." (dot, period) is used in regular expressions to represent any character except a new line. So the pattern "^.5$" matches any two-character string that ends with the digit 5 and begins with any character other than a new line. The pattern "." matches any string except the empty string and a string that contains only a new line.
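The behavior of the dot can be sketched as follows, assuming Python's `re` module with its default flags (with the `re.DOTALL` flag, the dot would also match a new line). The anchored form `^.5$` is used here.

```python
import re

two_chars_ending_in_5 = re.compile(r"^.5$")

print(bool(two_chars_ending_in_5.search("a5")))   # True
print(bool(two_chars_ending_in_5.search("55")))   # True: "." also matches a digit
print(bool(two_chars_ending_in_5.search("5")))    # False: only one character
print(bool(two_chars_ending_in_5.search("\n5")))  # False: "." skips new lines
print(bool(re.search(r".", "")))                  # False: nothing to match
print(bool(re.search(r".", "\n")))                # False: only a new line
```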

PHP's regular expressions have some built-in generic character clusters, listed below:

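Character cluster    Description
[[:alpha:]]          Any letter
[[:digit:]]          Any digit
[[:alnum:]]          Any letter or digit
[[:space:]]          Any whitespace character
[[:upper:]]          Any uppercase letter
[[:lower:]]          Any lowercase letter
[[:punct:]]          Any punctuation character
[[:xdigit:]]         Any hexadecimal digit, equivalent to [0-9a-fA-F]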
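Not every engine understands these named clusters. Python's `re` module, for instance, does not support the `[[:name:]]` syntax, so the sketch below spells out rough ASCII-only equivalents by hand. The mapping and the helper `is_single` are assumptions made for illustration, not part of PHP or of the article.

```python
import re
import string

# Hand-written ASCII approximations of the named clusters in the table.
ASCII_EQUIVALENTS = {
    "alpha": "[a-zA-Z]",
    "digit": "[0-9]",
    "alnum": "[a-zA-Z0-9]",
    "space": r"[ \t\n\r\f\v]",
    "upper": "[A-Z]",
    "lower": "[a-z]",
    "punct": "[" + re.escape(string.punctuation) + "]",
    "xdigit": "[0-9a-fA-F]",
}


def is_single(class_name: str, char: str) -> bool:
    """Return True if char is exactly one character from the named cluster."""
    return re.fullmatch(ASCII_EQUIVALENTS[class_name], char) is not None


print(is_single("alpha", "q"))   # True
print(is_single("alpha", "7"))   # False
print(is_single("punct", "!"))   # True
print(is_single("xdigit", "F"))  # True
print(is_single("xdigit", "g"))  # False
```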

Specifying repetition

By now, you already know how to match a letter or a digit, but more often than not you will want to match a word or a number. A word consists of several letters, and a number consists of several digits. The braces ({}) that follow a character or character cluster specify how many times the preceding content repeats.

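Pattern              Description
^[a-zA-Z_]$          Any single letter or underscore
^[[:alpha:]]{3}$     Any three-letter word
^a$                  The letter a
^a{4}$               aaaa
^a{2,4}$             aa, aaa, or aaaa
^a{1,3}$             a, aa, or aaa
^a{2,}$              A string made of two or more a's
^a{2,}               E.g. aardvark and aaab, but not apple
a{2,}                E.g. baad and aaa, but not Nantucket
\t{2}                Two tabs
.{2}                 Any two characters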
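Several rows of the table can be checked with a short sketch, assuming Python's `re` module. The helper name `matches` is made up for the example, and the row using `[[:alpha:]]` is left out because `re` does not support that syntax.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found in text."""
    return re.search(pattern, text) is not None


print(matches(r"^a{4}$", "aaaa"))       # True: exactly four
print(matches(r"^a{4}$", "aaa"))        # False: only three
print(matches(r"^a{2,4}$", "aaa"))      # True: between two and four
print(matches(r"^a{2,4}$", "aaaaa"))    # False: five is too many
print(matches(r"^a{2,}", "aardvark"))   # True: starts with two a's
print(matches(r"^a{2,}", "apple"))      # False: starts with one a
print(matches(r"a{2,}", "baad"))        # True: two a's somewhere inside
print(matches(r"a{2,}", "Nantucket"))   # False: the a's are not adjacent
print(matches(r"\t{2}", "a\t\tb"))      # True: two consecutive tabs
```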

These examples describe three different uses of braces. A single number, {x}, means "the preceding character or character cluster appears exactly x times"; a number followed by a comma, {x,}, means "the preceding content appears x or more times"; and two numbers separated by a comma, {x,y}, mean "the preceding content appears at least x times, but not more than y times". We can extend the pattern to more words or numbers:

^[a-zA-Z0-9_]{1,}$ // any string of one or more letters, digits, or underscores
^[0-9]{1,}$ // all non-negative integers
^\-{0,1}[0-9]{1,}$ // all integers
^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$ // all decimal numbers

The last example is not easy to understand, is it? Look at it this way: it matches every string that starts with an optional minus sign (\-{0,1}), followed by 0 or more digits ([0-9]{0,}), then an optional decimal point (\.{0,1}), then 0 or more digits ([0-9]{0,}), and nothing else ($). Below you will see a simpler way to write it.
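To see how the pieces of the decimal pattern combine, here is a minimal sketch assuming Python's `re` module. The sample inputs are invented. Because every part of the pattern is optional, it also accepts the empty string, a lone minus sign, and a lone period, which is worth knowing before using it for validation.

```python
import re

# optional minus, digits, optional decimal point, digits, nothing else
decimal = re.compile(r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$")

for text in ["-12.5", "42", ".5", "7.", "1.2.3", "abc", "", "-", "."]:
    print(repr(text), bool(decimal.search(text)))
```

Only "1.2.3" and "abc" are rejected. The three degenerate inputs at the end of the list all match.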

The special character "?" is equal to {0,1}; both mean "0 or 1 of the preceding content", that is, "the preceding content is optional". So the example just now can be reduced to:

^\-?[0-9]{0,}\.?[0-9]{0,}$

The special character "*" is equal to {0,}; both mean "0 or more of the preceding content". Finally, the character "+" is equal to {1,}, which means "1 or more of the preceding content", so the four examples above can be written as:

^[a-zA-Z0-9_]+$ // any string of one or more letters, digits, or underscores
^[0-9]+$ // all non-negative integers
^\-?[0-9]+$ // all integers
^\-?[0-9]*\.?[0-9]*$ // all decimal numbers

Of course, this does not technically reduce the complexity of the regular expressions, but it does make them easier to read.
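The claim that the shorthand forms behave the same as the brace forms can be spot-checked with a small sketch, assuming Python's `re` module. The sample strings and the helper `agree` are made up for illustration, and agreement on a handful of samples is a sanity check rather than a proof.

```python
import re

# (brace form, shorthand form) pairs taken from the examples in the text
PAIRS = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[0-9]{1,}$", r"^[0-9]+$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
    (r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$", r"^\-?[0-9]*\.?[0-9]*$"),
]
SAMPLES = ["abc_123", "42", "-42", "3.14", "-0.5", "", "4 2", "--1", "1.2.3"]


def agree(long_form: str, short_form: str) -> bool:
    """Return True if both patterns accept and reject the same samples."""
    return all(
        bool(re.search(long_form, s)) == bool(re.search(short_form, s))
        for s in SAMPLES
    )


for long_form, short_form in PAIRS:
    print(short_form, agree(long_form, short_form))
```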