Coding With Fun

Getting started with Python's re module: a detailed, step-by-step regular expression tutorial


May 31, 2021 Article blog




Hello! Our goal today is to spend about 30 minutes learning what a regular expression is and how the basics work, so you can use one in your own programs or web pages. The whole article is hands-on. If you work through the practice examples as you go, you will certainly learn something.

First, single-character matching

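1, match: matches the pattern at the start of the string:

import re  # every example in this tutorial assumes this import

text = "abcdef"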

ret = re.match('a',text)

print(ret.group())
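`re.match` only tries to match at the very beginning of the string. When that fails it returns `None`, and calling `.group()` on `None` raises an `AttributeError`. The sketch below shows this and contrasts `re.match` with `re.search`, which scans the whole string:

```python
import re

text = "abcdef"

# re.match anchors at position 0; 'c' is not the first character, so this fails
print(re.match('c', text))        # None

# re.search scans the whole string and returns the first match it finds
found = re.search('c', text)
print(found.group())              # c

# Check for None before calling .group()
m = re.match('x', text)
if m is not None:
    print(m.group())
else:
    print("no match")             # no match
```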

2, Dot (.): matches any single character except the newline character '\n':

text = "\nabcdef"

ret = re.match('.', text)

print(ret)  # None: '.' does not match the '\n' at the start of the text
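By default, `.` skips the newline character. The `re.DOTALL` flag (also spelled `re.S`) changes that, so `.` matches any character at all. The sketch below runs the same pattern with and without the flag:

```python
import re

text = "\nabcdef"

default = re.match('.', text)               # '.' refuses to match '\n'
dotall = re.match('.', text, re.DOTALL)     # with DOTALL, '.' also matches '\n'

print(default)                  # None
print(repr(dotall.group()))     # '\n'
```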

3, \d: matches any digit (0-9):

text = "1abcdef"

ret = re.match(r'\d', text)

print(ret.group())

4, \D: matches any non-digit character:

text = "cabedf"

ret = re.match(r'\D', text)

print(ret.group())

5, \s: matches any whitespace character (including \n, \t, \r, and the space character):

text = "\nabdef"

ret = re.match(r'\s', text)

print("*"*30)  # the rows of asterisks make the matched newline visible

print(ret.group())

print("*"*30)

6, \S: matches any non-whitespace character:

text = "ababdef"

ret = re.match(r'\S', text)

print("*"*30)

print(ret.group())

print("*"*30)

7, \w: matches a word character, that is a-z, A-Z, a digit, or the underscore:

text = "1bc"

ret = re.match(r'\w', text)

print("*"*30)

print(ret.group())

print("*"*30)
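One caveat: for `str` patterns in Python 3, `\w` is Unicode-aware. It also matches letters and digits from other scripts, such as Chinese characters. The `re.ASCII` flag restricts it to `[a-zA-Z0-9_]`. The sketch below uses a made-up sample string:

```python
import re

text = "中文abc"  # sample string starting with Chinese characters

unicode_match = re.match(r'\w', text)           # Unicode by default: matches '中'
ascii_match = re.match(r'\w', text, re.ASCII)   # ASCII only: '中' is not in [a-zA-Z0-9_]

print(unicode_match.group())   # 中
print(ascii_match)             # None
```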

8, \W: the opposite of \w; matches any character that is not a word character:

text = "+bc"

ret = re.match(r'\W', text)

print("*"*30)

print(ret.group())

print("*"*30)

9, Character classes with []: the match succeeds if the character is any one of the characters inside the brackets:

text = "cba"

ret = re.match('[1c]',text)

print("*"*30)

print(ret.group())

print("*"*30)

10, Use a character class to emulate \D; a ^ placed right after [ negates the class:

text = "abc"

ret = re.match('[^0-9]', text)  # any character that is not 0-9, the same as \D

print("="*30)

print(ret.group())

print("="*30)

11, Use a character class to emulate \W:

text = "+bc"

ret = re.match('[^a-zA-Z0-9_]',text)

print("="*30)

print(ret.group())

print("="*30)
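The two character classes above are hand-written versions of `\D` and `\W`. The check below runs both forms over a made-up ASCII sample string with `re.findall`, which collects every match, and confirms they select the same characters. `re.ASCII` is passed to `\W` so that it uses the same ASCII-only definition of a word character as the hand-written class:

```python
import re

sample = "a1_+ b2-C3!"  # arbitrary test string

# [^0-9] is the hand-written form of \D
assert re.findall(r'[^0-9]', sample) == re.findall(r'\D', sample)

# [^a-zA-Z0-9_] is the hand-written form of \W (with ASCII semantics)
assert re.findall(r'[^a-zA-Z0-9_]', sample) == re.findall(r'\W', sample, re.ASCII)

print(re.findall(r'[^a-zA-Z0-9_]', sample))   # ['+', ' ', '-', '!']
```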

Second, multi-character matching

1, *: matches the preceding character 0 or more times:

text = "-cba"

result = re.match(r'\D*', text)

print(result.group())

2, +: matches the preceding character 1 or more times:

text = "1cba"

result = re.match(r'\w+', text)

print(result.group())

3, ?: matches the preceding character 0 or 1 times:

text = "-cba"

result = re.match(r'\w?', text)

print(result.group())  # prints an empty string: '-' is not a word character, so \w? matches zero times

4, {m}: matches the preceding character exactly m times:

text = "1cba"

result = re.match(r'\w{2}', text)

print(result.group())

5, {m,n}: matches the preceding character between m and n times (inclusive):

text = "1cba+"

result = re.match(r'\w{1,3}', text)

print(result.group())
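To compare the quantifiers side by side, the sketch below applies each one to the same string with `re.match`. All of them are greedy here, so each takes as many word characters as it is allowed:

```python
import re

text = "1cba+"
patterns = [r'\w*', r'\w+', r'\w?', r'\w{2}', r'\w{1,3}']

results = {}
for pattern in patterns:
    # every pattern can match at position 0, so .group() is safe here
    results[pattern] = re.match(pattern, text).group()
    print(pattern, '->', results[pattern])

# \w* -> 1cba
# \w+ -> 1cba
# \w? -> 1
# \w{2} -> 1c
# \w{1,3} -> 1cb
```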

Third, regular expression examples

1, Validate a mobile phone number: the number starts with 1, the second digit is one of 3, 4, 5, 8, or 7, and the remaining 9 digits can be any digit.

text = "17751632549"

result = re.match(r"1[34587]\d{9}", text)

print(result.group())
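`re.match` only anchors the start of the string, so the pattern above would also accept a string with extra characters after the 11 digits. For validation, `re.fullmatch` requires the whole string to match. A minimal sketch, where the helper name `is_valid_phone` is hypothetical:

```python
import re

PHONE = re.compile(r"1[34587]\d{9}")

def is_valid_phone(text):
    """Hypothetical helper: True only if the whole string follows the phone-number rule."""
    return PHONE.fullmatch(text) is not None

# re.match ignores anything after the 11th digit
print(re.match(r"1[34587]\d{9}", "17751632549999") is not None)  # True
print(is_valid_phone("17751632549"))      # True
print(is_valid_phone("17751632549999"))   # False: too long
print(is_valid_phone("12751632549"))      # False: 2 is not an allowed second digit
```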

2, Validate an email address: the user name consists of digits, English letters, and underscores, followed by the @ symbol and then the domain name.

text = "user_123@example.com"  # hypothetical sample address

result = re.match(r"\w+@[a-z0-9]+\.[a-z]+", text)

print(result.group())

3, Validate a URL: the URL starts with http, https, or ftp, followed by a colon and two slashes, and then any non-whitespace characters.

text = "https://www.w3cschool.cn/minicourse/play/quick_scrapy"

result = re.match(r"(http|https|ftp)://\S+", text)

print(result.group())

4, Validate an ID card number: the number has 18 characters in total. The first 17 are digits, and the last one can be a digit, a lowercase x, or an uppercase X.

text = "35215669985213654x"

result = re.match(r"\d{17}[\dxX]", text)

print(result.group())

Fourth, start (^), end ($), greedy and non-greedy matching

1, ^: anchors the match to the beginning of the string:

text = "hello world"

result = re.search("^world", text)

print(result)  # None: even re.search fails, because "world" is not at the start

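2, $: anchors the match to the end of the string:

text = "hello world"

result = re.search("hello$", text)

print(result)  # None: "hello" is not at the end

text = ""

result = re.search("^$", text)  # ^$ together match an empty string

print(result.group())  # prints an empty string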
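The patterns `^world` and `hello$` cannot match "hello world" because each anchor sits on the wrong side. When the anchors are placed correctly, the same text matches. When the text spans several lines, the `re.MULTILINE` flag makes `^` and `$` apply at every line rather than only at the ends of the whole string. The sample strings below are made up for illustration:

```python
import re

text = "hello world"
start = re.search("^hello", text)   # 'hello' really is at the start
end = re.search("world$", text)     # 'world' really is at the end
print(start.group(), end.group())   # hello world

lines = "first line\nsecond line"
print(re.findall(r"^\w+", lines))                 # ['first']
print(re.findall(r"^\w+", lines, re.MULTILINE))   # ['first', 'second']
```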

3, |: matches any one of several strings or expressions:

text = "https://www.w3cschool.cn/minicourse/play/quick_scrapy"

result = re.match(r"(http|https|ftp)://\S+", text)

print(result.group())

4, Greedy and non-greedy matching:

text = "12345"

result = re.search(r"\d+?", text)  # the trailing ? makes + non-greedy, so only "1" is matched

print(result.group())
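Quantifiers are greedy by default and take as many characters as they can. Appending `?` makes them non-greedy, so they stop as soon as the rest of the pattern is satisfied:

```python
import re

text = "12345"

greedy = re.search(r"\d+", text).group()    # takes every digit
lazy = re.search(r"\d+?", text).group()     # takes as few as possible: one digit
print(greedy, lazy)                         # 12345 1

# With findall, the non-greedy version yields each digit separately
print(re.findall(r"\d+?", text))            # ['1', '2', '3', '4', '5']
```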

Case 1: extract an HTML tag:

text = "<h1>This is a level-1 heading</h1>"

result = re.search("<.+?>",text)

print(result.group())
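The `?` is what keeps the match short here. With a greedy `.+`, the pattern runs from the first `<` to the last `>` and also swallows the heading text:

```python
import re

text = "<h1>This is a level-1 heading</h1>"

print(re.search("<.+>", text).group())    # <h1>This is a level-1 heading</h1>
print(re.search("<.+?>", text).group())   # <h1>
print(re.findall("<.+?>", text))          # ['<h1>', '</h1>']
```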

Case 2: verify that a string is a number between 0 and 100:

text = "100"

result = re.match(r"0$|[1-9]\d?$|100$", text)

print(result.group())
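`re.match` is anchored only at the start, so each alternative carries its own `$`. Running the pattern over a few sample inputs shows which strings it accepts. Leading zeros such as "05" are rejected:

```python
import re

pattern = r"0$|[1-9]\d?$|100$"
values = ["0", "7", "99", "100", "101", "05", "1000"]

# Keep only the strings the pattern accepts
accepted = [v for v in values if re.match(pattern, v) is not None]
print(accepted)   # ['0', '7', '99', '100']
```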

Fifth, escape characters and raw strings

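1, Escape characters in Python:

text = r"hello\nw3cschool"  # the r prefix keeps \n as a backslash followed by n instead of a newline

print(text)

2, Escape characters in regular expressions:

text = "apple price is $9.9,orange price is $8.8"

result = re.findall(r"\$\d+", text)  # \$ matches a literal dollar sign; a bare $ would mean "end of string"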

print(result)
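When a pattern has to contain special characters such as `$` or `.` literally, `re.escape` can insert the backslashes for you:

```python
import re

text = "apple price is $9.9,orange price is $8.8"

pattern = re.escape("$9.9")        # escapes the special characters '$' and '.'
print(pattern)                     # \$9\.9
print(re.findall(pattern, text))   # ['$9.9']
```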

3, Raw strings and regular expressions:

A pattern string is parsed twice:

  • First, Python parses the string literal at the language level; this is where an escape such as \\ becomes a single \.
  • The result of that Python-level parsing is then parsed again by the regular expression engine.

text = r"\cba c"

result = re.match("\\\\c", text)  # "\\\\c" -(Python level)-> \\c -(regex level)-> a literal \ followed by c

result = re.match(r"\\c", text)  # r"\\c" is already \\c -(regex level)-> a literal \ followed by c

print(result.group())
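To check that both spellings reach the regex engine as the same pattern, compare the two string objects directly:

```python
import re

normal = "\\\\c"     # Python turns each \\ into one backslash, leaving \\c
raw = r"\\c"         # the r prefix keeps both backslashes as written: \\c

print(normal == raw, len(raw))            # True 3
print(re.match(raw, r"\cba c").group())   # \c
```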

Sixth, grouping

text = "apple price is $9.9,orange price is $8.8"

result = re.search(r'.+(\$\d+).+(\$\d+)', text)

print(result.groups())

group() / group(0): returns the entire match

group(1): returns the text captured by the first group

group(2): returns the text captured by the second group

groups(): returns all captured groups as a tuple
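Applied to the price example, these methods return the values below. The whole match stops at "$8" because the pattern ends with the second group:

```python
import re

text = "apple price is $9.9,orange price is $8.8"
result = re.search(r'.+(\$\d+).+(\$\d+)', text)

print(result.group())    # apple price is $9.9,orange price is $8
print(result.group(1))   # $9
print(result.group(2))   # $8
print(result.groups())   # ('$9', '$8')
```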

Seventh, commonly used re functions

1, findall: finds all substrings that match the pattern and returns them as a list

text = "apple price is $9.9,orange price is $8.8"

result = re.findall(r'\$\d+',text)

print(result)
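A detail of `findall` that often surprises people: when the pattern contains a capturing group, `findall` returns the group's text instead of the whole match. With several groups, it returns a list of tuples:

```python
import re

text = "apple price is $9.9,orange price is $8.8"

print(re.findall(r'\$\d+', text))                     # ['$9', '$8']
print(re.findall(r'\$(\d+)', text))                   # ['9', '8']
print(re.findall(r'(\w+) price is \$(\d+)', text))    # [('apple', '9'), ('orange', '8')]
```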

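2, sub: replaces every match of the pattern with another string

text = "nihao zhongguo,hello w3cschool"

new_text = text.replace(" ","\n")  # str.replace handles only one fixed substring (here, the space)

new_text = re.sub(r' |,','\n',text)  # re.sub replaces both spaces and commas in a single call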

print(new_text)
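`re.sub` also accepts an optional `count` argument that limits how many replacements are made. `re.subn` does the same work as `re.sub` but also reports how many substitutions took place:

```python
import re

text = "nihao zhongguo,hello w3cschool"

first_only = re.sub(r' |,', '\n', text, count=1)   # replace just the first separator
print(first_only.split('\n'))   # ['nihao', 'zhongguo,hello w3cschool']

new_text, n = re.subn(r' |,', '\n', text)           # returns (new string, number of replacements)
print(n)   # 3
```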

html = """

<div class="job-detail">

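<p>1. 3+ years of relevant development experience; full-time bachelor's degree or above</p>

<p>2. Proficient in one or more programming languages (Python, C, Java, etc.), with 3+ years of experience in at least one of them</p>

<p>3. Skilled with databases such as ES/mysql/mongodb/redis;</p>

<p>4. Skilled with web frameworks such as django and tornado, with experience independently developing Python/Java backends;</p>

<p>5. Familiar with Linux / Unix operating systems </p>

<p>6. Familiar with network protocols such as TCP/IP and http</p>

<p>Benefits:</p>

<p>1. Six types of social insurance plus a housing fund from day one (tier-1 medical insurance + commercial insurance fully paid by the company) + start-of-year bonus + full year-end bonus (13 months' salary per year, usually more than one month's pay)</p>

<p>2. Two opportunities for salary and grade adjustments after one full year of employment</p>

<p>3. Stable projects, a highly stable team, and a great team atmosphere (partner employees make up nearly 50% of China Merchants Bank's total staff);</p>

<p>4. Opportunity to become an in-house employee of China Merchants Bank;</p>

<p>5. Monthly team activity budget; time off on statutory public holidays;</p>

<p>6. Excellent office environment; paid overtime (calculated on full salary, overtime ends no later than 10 p.m., weekday overtime at 1.5x the hourly wage, weekend overtime at 2x the daily wage, weekend overtime can also be taken as time off in lieu; people-oriented management).</p>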

</div>

"""

new_html = re.sub(r'<.+?>',"",html)  # the non-greedy <.+?> removes one tag at a time, leaving the text

print(new_html)

3, split: splits the string at every match of the pattern

text = "nihao zhongguo,hello world"

result = re.split(r' |,',text)

print(result)
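`re.split` has two options worth knowing. `maxsplit` limits the number of splits, and wrapping the pattern in a capturing group keeps the separators in the result:

```python
import re

text = "nihao zhongguo,hello world"

print(re.split(r' |,', text, maxsplit=1))   # ['nihao', 'zhongguo,hello world']
print(re.split(r'( |,)', text))
# ['nihao', ' ', 'zhongguo', ',', 'hello', ' ', 'world']
```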

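4, compile: compiles a regular expression into a reusable pattern object

text = "apple price is 34.56"

r = re.compile(r"""
\d+   # integer part
\.?   # decimal point
\d*   # fractional part
""", re.VERBOSE)

result = r.search(text)  # equivalent to re.search(r, text)

result = re.search(r"""
\d+   # integer part
\.?   # decimal point
\d*   # fractional part
""", text, re.VERBOSE)  # the same pattern without compiling: pass re.VERBOSE to re.search directly

print(result.group())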

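To write comments inside a regular expression, pass the re.VERBOSE flag. In verbose mode, unescaped whitespace in the pattern is ignored, and everything after a # on a line is treated as a comment.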
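Compiling pays off when the same pattern is used repeatedly. The pattern object offers the same methods as the `re` module (`search`, `match`, `findall`, `sub`, and so on) and keeps its flags. The sketch below reuses the verbose number pattern on a made-up sample string:

```python
import re

number = re.compile(r"""
    \d+    # integer part
    \.?    # decimal point
    \d*    # fractional part
""", re.VERBOSE)

print(number.findall("apple price is 34.56, orange price is 8"))  # ['34.56', '8']
print(bool(number.flags & re.VERBOSE))                            # True
```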