Perl regular expressions
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace a matching substring, or to extract the substrings that meet some criterion.
Perl's regular expressions are among the most capable of any mainstream language, and many languages modelled their own regular expression support on Perl's.
Perl's regular expressions come in three forms: match, substitution, and transliteration:
- Match: m// (can be abbreviated to //, omitting the m)
- Substitution: s///
- Transliteration: tr///
These three forms are generally used together with the binding operators =~ and !~.
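As a minimal sketch of how =~ and !~ tie each of the three forms to a variable (the sample string and the variable names are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Perl regular expressions";    # hypothetical sample string

# =~ is true when the pattern matches
my $has_perl = ($text =~ /Perl/) ? 1 : 0;

# !~ is true when the pattern does NOT match
my $no_python = ($text !~ /Python/) ? 1 : 0;

# s/// and tr/// modify the variable on the left of =~,
# so work on copies to keep $text intact
(my $replaced = $text) =~ s/regular/REGULAR/;
(my $upper    = $text) =~ tr/a-z/A-Z/;

print "has_perl=$has_perl no_python=$no_python\n";
print "$replaced\n";
print "$upper\n";
```

Without a binding operator, all three forms operate on the default variable $_.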
The match operator
The match operator m// tests a string against a regular expression. For example, to match "w3c" in the scalar $bar:
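    #!/usr/bin/perl

    $bar = "I am w3cschool site. welcome to w3cschool site.";
    # The pattern may match anywhere in the string
    if ($bar =~ /w3c/) {
        print "First match succeeded\n";
    } else {
        print "First match failed\n";
    }

    $bar = "w3c";
    # It also matches when the whole string equals the pattern
    if ($bar =~ /w3c/) {
        print "Second match succeeded\n";
    } else {
        print "Second match failed\n";
    }

Running the program above produces:

    First match succeeded
    Second match succeeded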
Pattern match modifiers
Pattern matching has some common modifiers, as shown in the following table:
Modifier | Description |
---|---|
i | Ignore case in the pattern |
m | Multi-line mode: "^" and "$" also match at the start and end of each line |
o | Interpolate and compile the pattern only once |
s | Single-line mode: "." also matches the newline "\n" (by default it does not) |
x | Ignore whitespace in the pattern |
g | Global match: find all matches |
cg | After a global match fails, keep the current search position so that matching can resume from it |
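The effect of the modifiers i, m, s, x, and g is easiest to see side by side. The following is a small sketch; the two-line sample string and the variable names are assumptions chosen for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Hello\nworld";    # hypothetical two-line sample string

# /i: ignore case
my $i_flag = ($text =~ /hello/i) ? 1 : 0;

# /m: ^ and $ also match at the start and end of each line
my $without_m = ($text =~ /^world$/)  ? 1 : 0;
my $with_m    = ($text =~ /^world$/m) ? 1 : 0;

# /s: "." also matches the newline
my $without_s = ($text =~ /Hello.world/)  ? 1 : 0;
my $with_s    = ($text =~ /Hello.world/s) ? 1 : 0;

# /x: literal whitespace in the pattern is ignored
my $x_flag = ($text =~ / Hello \s+ world /x) ? 1 : 0;

# /g in list context returns every match
my @words = ($text =~ /\w+/g);

print "i=$i_flag m=$without_m/$with_m s=$without_s/$with_s x=$x_flag\n";
print "words: @words\n";
```

This should print "i=1 m=0/1 s=0/1 x=1" followed by "words: Hello world".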
Regular expression variables
After a successful match, Perl sets three special variables:
- $`: the part of the string before the match
- $&: the matched string
- $': the part of the string after the match
If you put the three variables together, you get the original string.
Here's an example:
    #!/usr/bin/perl

    $string = "welcome to w3cschool site.";
    $string =~ m/w3c/;

    print "Before the match: $`\n";
    print "Matched string: $&\n";
    print "After the match: $'\n";

The output of the above program is:

    Before the match: welcome to 
    Matched string: w3c
    After the match: school site.
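The claim that the three variables add up to the original string can be checked directly. A minimal sketch, using hypothetical variable names for the copies:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "welcome to w3cschool site.";
my ($before, $matched, $after) = ('', '', '');

if ($string =~ /w3c/) {
    # Copy the special variables right away:
    # the next successful match overwrites them
    ($before, $matched, $after) = ($`, $&, $');
}

# Concatenating the three parts rebuilds the original string
my $rebuilt = $before . $matched . $after;

print "before=[$before] matched=[$matched] after=[$after]\n";
print "rebuilt equals original\n" if $rebuilt eq $string;
```

Testing the result of the match before reading the variables matters, because after a failed match they still hold values from an earlier successful one.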
The substitution operator
The substitution operator s/// extends the match operator: it replaces the text matched by the pattern with a new string. The basic format is as follows:
s/PATTERN/REPLACEMENT/;
PATTERN is the matching pattern and REPLACEMENT is the replacement string.
For example, the following replaces "google" in the string with "w3cschool":

    #!/usr/bin/perl

    $string = "welcome to google site.";
    $string =~ s/google/w3cschool/;    # replaces the first occurrence only

    print "$string\n";

The output of the above program is:
welcome to w3cschool site.
Substitution modifiers
The modifiers for the substitution operator are shown in the following table:
Modifier | Description |
---|---|
i | Matching becomes case-insensitive, i.e. "a" and "A" are treated as the same. |
m | By default, "^" and "$" anchor the start and end of the whole string. With "m", they match at the start and end of each line in the string. |
o | The expression is compiled only once. |
s | By default, "." matches any character except a newline. With "s", "." matches any character, including a newline. |
x | Whitespace in the expression is ignored unless it has been escaped. |
g | Replace all matches, not just the first. |
e | Evaluate the replacement as a Perl expression and substitute its result. |
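The following sketch shows how the g, i, and e modifiers change what s/// does. The sample strings and variable names are hypothetical; the copies are made only so that each substitution starts from the same input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Cat cat CAT";    # hypothetical sample string

# No modifiers: only the first case-sensitive match is replaced
(my $first = $text) =~ s/cat/dog/;

# /g replaces every match, /i ignores case
(my $all = $text) =~ s/cat/dog/gi;

# s/// returns the number of substitutions it made
my $count = (my $scratch = $text) =~ s/cat/dog/gi;

# /e evaluates the replacement as Perl code: double every number
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

print "$first\n";      # Cat dog CAT
print "$all\n";        # dog dog dog
print "$count\n";      # 3
print "$doubled\n";    # 6 apples and 10 pears
```

Modifiers can be combined freely, as /gi and /ge are here.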
The transliteration operator
Here are the modifiers associated with the transliteration operator:
Modifier | Description |
---|---|
c | Complement: transliterate all characters that are NOT in the specified list |
d | Delete all specified characters |
s | Squeeze runs of identical output characters into one |
The following example converts all lowercase letters in the variable $string to uppercase:

    #!/usr/bin/perl

    $string = 'welcome to w3cschool site.';
    $string =~ tr/a-z/A-Z/;    # map each lowercase letter to its uppercase counterpart

    print "$string\n";

The output of the above program is:
WELCOME TO W3CSCHOOL SITE.
The following example uses /s to squeeze repeated characters in the variable $string:

    #!/usr/bin/perl

    $string = 'w3cschool';
    $string =~ tr/a-z/a-z/s;    # "oo" is squeezed into a single "o"

    print "$string\n";

The output of the above program is:
w3cschol
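More examples:

    # tr/// takes character lists, not regex classes such as \d, so use 0-9
    $string =~ tr/0-9/ /c;     # replace every non-digit character with a space
    $string =~ tr/\t //d;      # delete tabs and spaces
    $string =~ tr/0-9/ /cs;    # replace each run of non-digit characters with a single space
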
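To make the transliteration modifiers concrete, here is a small runnable sketch. The input strings and variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $phone = "tel: 555-1234";    # hypothetical sample string

# With an empty replacement list and no /d, tr/// changes nothing
# and simply returns how many characters it found
my $digit_count = ($phone =~ tr/0-9//);

# /c: every character that is NOT a digit becomes a space
(my $spaced = $phone) =~ tr/0-9/ /c;

# /cs: as above, but each run of replaced characters is squeezed into one
(my $squeezed = $phone) =~ tr/0-9/ /cs;

# /d: delete tabs and spaces
(my $stripped = "a b\tc") =~ tr/\t //d;

print "digits: $digit_count\n";    # digits: 7
print "[$spaced]\n";               # [     555 1234]
print "[$squeezed]\n";             # [ 555 1234]
print "[$stripped]\n";             # [abc]
```

Counting characters with an empty replacement list is a common idiom, since it leaves the string untouched.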
More regular expression rules
Expression | Description |
---|---|
. | Matches any character except a newline |
x? | Matches x zero times or once |
x* | Matches x zero or more times, as many times as possible (append ? to match as few times as possible) |
x+ | Matches x one or more times, as many times as possible (append ? to match as few times as possible) |
.* | Matches any character zero or more times |
.+ | Matches any character one or more times |
{m} | Matches the preceding item exactly m times |
{m,n} | Matches the preceding item at least m and at most n times |
{m,} | Matches the preceding item at least m times |
[] | Matches any one of the characters listed inside the brackets |
[^] | Matches any one character that is not listed inside the brackets |
[0-9] | Matches any digit |
[a-z] | Matches any lowercase letter |
[^0-9] | Matches any non-digit character |
[^a-z] | Matches any character that is not a lowercase letter |
^ | Matches at the beginning of the string |
$ | Matches at the end of the string |
\d | Matches a digit; for ASCII text, the same as [0-9] |
\d+ | Matches one or more digits; the same as [0-9]+ |
\D | Matches a non-digit; the same as [^0-9] |
\D+ | Matches one or more non-digits; the same as [^0-9]+ |
\w | Matches a word character (letter, digit, or underscore); for ASCII text, the same as [a-zA-Z0-9_] |
\w+ | Matches one or more word characters; the same as [a-zA-Z0-9_]+ |
\W | Matches a non-word character; the same as [^a-zA-Z0-9_] |
\W+ | Matches one or more non-word characters; the same as [^a-zA-Z0-9_]+ |
\s | Matches a whitespace character; roughly the same as [ \t\n\r\f] |
\s+ | Matches one or more whitespace characters; roughly the same as [ \t\n\r\f]+ |
\S | Matches a non-whitespace character; roughly the same as [^ \t\n\r\f] |
\S+ | Matches one or more non-whitespace characters; roughly the same as [^ \t\n\r\f]+ |
\b | Matches at a word boundary (between a word character and a non-word character) |
\B | Matches at any position that is not a word boundary |
a|b|c | Matches a, b, or c (alternation) |
abc | Matches the literal string abc |
(pattern) | Parentheses remember the text they match, which is a very useful feature. The text matched by the first () is stored in $1 (or \1 inside the pattern), the text matched by the second () is stored in $2 (or \2), and so on. |
/pattern/i | The i modifier ignores case, i.e. upper and lower case are not distinguished when matching. |

To match a special character literally in a pattern, such as "*", put a backslash "\" before it so that it loses its special meaning.
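Several of these rules can be combined in one short program. This is a minimal sketch with made-up sample strings and variable names, covering quantifiers, capture groups, backreferences, escaping, and greedy versus non-greedy matching:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "2024-05-17 total: 3*4 = 12";    # hypothetical sample string

# {m} quantifiers with capture groups: in list context the match
# returns the captured text ($1, $2, $3)
my ($year, $month, $day) = $line =~ /^(\d{4})-(\d{2})-(\d{2})/;

# A backslash makes the special character "*" match literally
my $has_product = ($line =~ /3\*4/) ? 1 : 0;

# .+ is greedy and takes as much as it can; .+? takes as little as it can
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /(<.+>)/;
my ($lazy)   = $html =~ /(<.+?>)/;

# \1 refers back to the first capture group; \b marks word boundaries
my ($repeated) = "this is is a test" =~ /\b(\w+) \1\b/;

print "$year/$month/$day\n";          # 2024/05/17
print "has_product=$has_product\n";   # has_product=1
print "greedy=$greedy lazy=$lazy\n";  # greedy=<b>bold</b> lazy=<b>
print "repeated=$repeated\n";         # repeated=is
```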