May 31, 2021 Article blog
1. Single-character matching
2. Multi-character matching
3. Regular expression case studies
4. Start/end anchors, greedy and non-greedy matching
Hello! Our goal today is to understand what a regular expression is in 30 minutes, pick up the basics, and be able to use it in your own program or web page. This article is hands-on throughout: as long as you work through the exercises, you will certainly gain something. Recommended course: Regular expression analysis.
1, match a literal string:
import re
text = "abcdef"
ret = re.match('a',text)
print(ret.group())
2, dot (.): matches any single character except a newline ('\n'):
text = "abcdef" # with a leading "\n", re.match would return None
ret = re.match('.',text)
print(ret.group())
3, \d: matches any digit:
text = "123abc" # re.match only looks at the start, so the string must begin with a digit
ret = re.match('\d',text)
print(ret.group())
4, \D: matches any non-digit character:
text = "cabedf"
ret = re.match('\D',text)
print(ret.group())
5, \s: matches whitespace characters (including \n, \t, \r, and spaces):
text = "\nabdef"
ret = re.match('\s',text)
print("*"*30)
print(ret.group())
print("*"*30)
6, \S: matches non-whitespace characters:
text = "ababdef"
ret = re.match('\S',text)
print("*"*30)
print(ret.group())
print("*"*30)
7, \w: matches a-z, A-Z, digits, and underscores:
text = "1bc"
ret = re.match('\w',text)
print("*"*30)
print(ret.group())
print("*"*30)
8, \W: matches the opposite of \w:
text = "+bc"
ret = re.match('\W',text)
print("*"*30)
print(ret.group())
print("*"*30)
9, [] character set: the match succeeds if the character is any one of those listed inside the brackets:
text = "cba"
ret = re.match('[1c]',text)
print("*"*30)
print(ret.group())
print("*"*30)
10, using a character set to implement \D:
text = "abc"
ret = re.match('[^0-9]',text) # ^ inside the brackets negates the set
print("="*30)
print(ret.group())
print("="*30)
11, using a character set to implement \W:
text = "+bc"
ret = re.match('[^a-zA-Z0-9_]',text)
print("="*30)
print(ret.group())
print("="*30)
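All of the examples above call ret.group() directly, which raises AttributeError when re.match finds nothing and returns None. A minimal sketch of a guard follows; first_char is a hypothetical helper name used only for illustration, not something from the re module.

```python
import re

def first_char(pattern, text):
    """Return the character matched at the start of text, or None when nothing matches."""
    ret = re.match(pattern, text)
    return ret.group() if ret is not None else None

print(first_char(r'\d', "1bc"))      # a digit at the start matches
print(first_char(r'\d', "abc"))      # None: the string does not start with a digit
print(first_char(r'.', "\nabc"))     # None: '.' does not match a newline
print(first_char(r'[^0-9]', "abc"))  # behaves like \D
```

Checking for None before calling group() is the usual pattern whenever the input is not guaranteed to match.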
1, *: matches the preceding character 0 or more times:
text = "-cba"
result = re.match('\D*',text)
print(result.group())
2, +: matches the preceding character 1 or more times:
text = "1cba"
result = re.match('\w+',text)
print(result.group())
3, ?: matches the preceding character 0 or 1 times:
text = "-cba"
result = re.match('\w?',text)
print(result.group())
4, {m}: matches exactly m characters:
text = "1cba" # the string must start with two \w characters for re.match to succeed
result = re.match('\w{2}',text)
print(result.group())
5, {m,n}: matches between m and n characters:
text = "1cba+"
result = re.match('\w{1,3}',text)
print(result.group())
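To compare the quantifiers side by side, the sketch below applies each of them to the same string. The sample text "abc123" and the results dictionary are assumptions chosen for illustration.

```python
import re

text = "abc123"  # sample input chosen for this sketch

# Match every pattern from the start of the same string and record the result.
results = {}
for pattern in (r'\w*', r'\w+', r'\w?', r'\w{2}', r'\w{1,3}', r'\d*'):
    results[pattern] = re.match(pattern, text).group()
    print(pattern, repr(results[pattern]))
```

Note that \d* still succeeds here: the string starts with a letter, so the pattern matches zero digits and returns an empty string.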
1, verify a mobile phone number: the rule is that the number starts with 1, the second digit is one of 3, 4, 5, 8, or 7, and the remaining 9 digits can be any digits.
text = "17751632549"
result = re.match("1[34587]\d{9}",text)
print(result.group())
2, verify an email address: the rule is that the mailbox name is composed of digits, English letters, and underscores, followed by the @ symbol, followed by the domain name.
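The email rule comes without an example, so here is a minimal sketch. The sample address and the assumption that the domain has the simple form name.suffix are mine; real addresses allow far more than this pattern accepts.

```python
import re

# Assumed simplification: the domain is letters/digits, one dot, then letters.
email_pattern = r"\w+@[a-zA-Z0-9]+\.[a-zA-Z]+"

text = "user_01@example.com"  # hypothetical address for illustration
result = re.match(email_pattern, text)
print(result.group())
```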
3, verify a URL: the rule is that it starts with http, https, or ftp, followed by a colon and two slashes, followed by any non-whitespace characters.
text = "https://www.w3cschool.cn/minicourse/play/quick_scrapy"
result = re.match("(http|https|ftp)://\S+",text)
print(result.group())
4, verify an ID card number: the rule is that it has 18 characters in total; the first 17 are digits, and the last one can be a digit, a lowercase x, or an uppercase X.
text = "35215669985213654x"
result = re.match("\d{17}[\dxX]",text)
print(result.group())
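One caveat for the validation cases above: re.match only requires the pattern to match at the start of the string, so an overly long input can still pass. A hedged sketch of the difference, using re.fullmatch from the standard library; the 15-digit input is made up for illustration.

```python
import re

phone_pattern = r"1[34587]\d{9}"
too_long = "177516325490000"  # 15 digits: invalid under the stated rule

prefix_ok = re.match(phone_pattern, too_long) is not None
whole_ok = re.fullmatch(phone_pattern, too_long) is not None
valid_ok = re.fullmatch(phone_pattern, "17751632549") is not None

print(prefix_ok)  # True: only the first 11 digits are checked
print(whole_ok)   # False: the whole string must match
print(valid_ok)   # True
```

Appending $ to the pattern, as the later 0-100 example does, is another way to get the same strictness.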
1, ^: matches at the start of the string:
text = "hello world"
result = re.search("^hello",text)
print(result.group())
2, $: matches at the end of the string:
text = "hello world"
result = re.search("world$",text)
print(result.group())
text = ""
result = re.search("^$",text)
print(result.group())
3, |: matches any one of several strings or expressions:
text = "https://www.w3cschool.cn/minicourse/play/quick_scrapy"
result = re.match("(http|https|ftp)://\S+",text)
print(result.group())
4, greedy and non-greedy matching:
text = "12345"
result = re.search("\d+?",text)
print(result.group())
Case 1: extract the opening HTML tag:
text = "<h1>这是一级标题</h1>"
result = re.search("<.+?>",text)
print(result.group())
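To see why the ? after + matters in Case 1, the following sketch runs the greedy and non-greedy forms on the same input. The sample text is a simplified stand-in for the heading used above.

```python
import re

text = "<h1>title</h1>"  # simplified sample input

greedy = re.search(r"<.+>", text).group()       # expands as far as possible
non_greedy = re.search(r"<.+?>", text).group()  # stops at the first '>'

print(greedy)      # <h1>title</h1>
print(non_greedy)  # <h1>
```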
Case 2: verify whether a string is a number between 0 and 100:
text = "100"
result = re.match("0$|[1-9]\d?$|100$",text)
print(result.group())
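A quick way to gain confidence in the 0-100 pattern is to run it against boundary values. This is only a sketch; in_range is a hypothetical helper name, and the list of inputs is my own choice.

```python
import re

pattern = r"0$|[1-9]\d?$|100$"

def in_range(s):
    """Return True when s is accepted by the 0-100 pattern."""
    return re.match(pattern, s) is not None

for s in ("0", "7", "42", "99", "100", "101", "007", "1000"):
    print(s, in_range(s))
```

The $ in each alternative is what rejects inputs such as "101" and "1000", which would otherwise match on their first characters.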
1, escape characters in Python:
text = r"hello\nw3cschool"
print(text)
2, escape characters in regular expressions:
text = "apple price is $9.9,orange price is $8.8"
result = re.findall("\$\d+",text)
print(result)
3, raw strings and regular expressions:
How a pattern string is parsed before it reaches the regular expression engine:
text = r"\cba c"
result = re.match("\\\\c",text) # \\\\c =(Python string level)> \\c =(regular expression level)> \c
result = re.match(r"\\c",text) # \\c =(regular expression level)> \c
print(result.group())
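The two-level parsing can be checked directly: the ordinary string with four backslashes and the raw string with two are the same pattern once Python has read them. A small sketch, with variable names chosen for illustration:

```python
import re

text = "\\cba c"  # the actual characters are: \cba c

plain = "\\\\c"   # Python reduces this to the three characters \\c
raw = r"\\c"      # the raw string already is the three characters \\c

same = plain == raw
matched = re.match(raw, text).group()

print(same)     # True
print(matched)  # \c
```

This is why raw strings are generally preferred for patterns: what you type is what the regular expression engine receives.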
text = "apple price is $9.9,orange price is $8.8"
result = re.search('.+(\$\d+).+(\$\d+)',text)
print(result.groups())
group() / group(0): returns the entire match
group(1): returns the first group
group(2): returns the second group
groups(): returns all groups as a tuple
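The group accessors listed above can be tried on the same price example. This sketch reuses the text and pattern from the article and only adds the individual calls:

```python
import re

text = "apple price is $9.9,orange price is $8.8"
result = re.search(r'.+(\$\d+).+(\$\d+)', text)

print(result.group(0))     # the entire match
print(result.group(1))     # $9
print(result.group(2))     # $8
print(result.group(1, 2))  # ('$9', '$8')
print(result.groups())     # ('$9', '$8')
```

Because \d+ stops at the decimal point, each group captures only the integer part of the price.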
1, findall: find every substring that matches the pattern
text = "apple price is $9.9,orange price is $8.8"
result = re.findall(r'\$\d+',text)
print(result)
2, sub: replace every substring that matches the pattern
text = "nihao zhongguo,hello w3cschool"
new_text = text.replace(" ","\n") # str.replace handles only one fixed substring
new_text = re.sub(r' |,','\n',text) # re.sub replaces both spaces and commas
print(new_text)
html = """
<div class="job-detail">
<p>1. 3年以上相关开发经验 ,全日制统招本科以上学历</p>
<p>2. 精通一门或多门开发语言(Python,C,Java等),其中至少有一门有3年以上使用经验</p>
<p>3. 熟练使用ES/mysql/mongodb/redis等数据库;</p>
<p>4. 熟练使用django、tornado等web框架,具备独立开发 Python/Java 后端开发经验;</p>
<p>5. 熟悉 Linux / Unix 操作系统 </p>
<p>6. 熟悉 TCP/IP,http等网络协议</p>
<p>福利:</p>
<p>1、入职购买六险一金(一档医疗+公司全额购买商业险)+开门红+全额年终奖(1年13薪,一般会比一个月高)</p>
<p>2、入职满一年有2次调薪调级机会</p>
<p>3、项目稳定、团队稳定性高,团队氛围非常好(汇合员工占招行总员工比例接近50%);</p>
<p>4、有机会转为招商银行内部员工;</p>
<p>5、团队每月有自己的活动经费,法定节假日放假安排;</p>
<p>6、办公环境优良,加班有加班费(全额工资为计算基数,加班不超过晚上10点,平日加班为时薪1.5倍,周末加班为日薪2倍,周末加班也可优先选择调休,管理人性化)。</p>
</div>
"""
new_html = re.sub(r'<.+?>',"",html)
print(new_html)
3, split: split the string wherever the pattern matches
text = "nihao zhongguo,hello world"
result = re.split(r' |,',text)
print(result)
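Since sub and split here use the same pattern, their results are closely related: replacing each separator with a newline gives the same text as splitting and joining with newlines. A minimal sketch of that relationship, reusing the article's sample text:

```python
import re

text = "nihao zhongguo,hello world"
pattern = r' |,'  # a space or a comma

parts = re.split(pattern, text)
replaced = re.sub(pattern, '\n', text)

print(parts)                         # ['nihao', 'zhongguo', 'hello', 'world']
print(replaced == '\n'.join(parts))  # True
```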
4, compile: compile a regular expression for reuse
text = "apple price is 34.56"
r = re.compile(r"""
\d+ # integer part
\.? # decimal point
\d* # fractional part
""",re.VERBOSE)
result = re.search(r,text)
result = re.search(r"""
\d+ # integer part
\.? # decimal point
\d* # fractional part
""",text,re.VERBOSE)
print(result.group())
If you want to add comments inside a regular expression, you need to pass the re.VERBOSE flag.
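The benefit of compile shows up when one pattern is applied several times. The sketch below compiles the verbose number pattern once and calls its search and findall methods; the name number and the second sample string are assumptions for illustration.

```python
import re

# Compile once, then reuse the pattern object's own methods.
number = re.compile(r"""
    \d+   # integer part
    \.?   # decimal point
    \d*   # fractional part
""", re.VERBOSE)

first = number.search("apple price is 34.56").group()
every = number.findall("apple 34.56, orange 8, pear 0.5")

print(first)  # 34.56
print(every)  # ['34.56', '8', '0.5']
```

With re.VERBOSE, whitespace inside the pattern is ignored, so the indentation and the aligned comments do not change what is matched.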