A detailed guide to re.compile, re.match, and re.search in the Python 3 re module

Date: 2019-11-14


The re module: re.compile, re.match, re.search

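When writing a regular expression, an r placed before the opening quote marks the literal as a raw string, which tells Python not to treat backslashes inside the string as escape characters.

For example, the regex escape \n can be written as r'\n' or, without a raw string, as '\\n'.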
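The difference is easiest to see with a literal backslash in the text being searched. The following is a minimal sketch; the `path` value is made up for illustration.

```python
import re

# The same two-character regex escape, written with and without the r prefix.
raw = r'\n'
escaped = '\\n'
print(raw == escaped, len(raw))

# Matching a literal backslash: the raw pattern needs two backslashes,
# the non-raw pattern needs four.
path = 'C:\\temp'  # hypothetical sample text containing one backslash
from_raw = re.search(r'\\temp', path).group()
from_escaped = re.search('\\\\temp', path).group()
print(from_raw == from_escaped)
```

Both patterns describe the same regular expression; the raw form is simply easier to read.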

Using re.match is recommended.

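re.compile() function

Compiles a regular expression pattern and returns a pattern object. A frequently used expression can be compiled once into a pattern object, which makes later calls more convenient and more efficient.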

re.compile(pattern, flags=0)

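pattern: the expression string to compile
flags: compilation flags that modify how the expression matches. Several flags can be combined with |, for example re.I | re.M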

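Values for flags

re.I (re.IGNORECASE)
Makes matching case-insensitive.

re.L (re.LOCALE)
Makes matching locale-aware. In Python 3 this flag is only allowed with bytes patterns.

re.M (re.MULTILINE)
Multi-line matching; changes ^ and $ so that they also match at the start and end of each line.

re.S (re.DOTALL)
Makes . match every character, including the newline.

re.U (re.UNICODE)
Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B. In Python 3 it is already the default for str patterns.

re.X (re.VERBOSE)
Allows a more flexible layout (whitespace and comments inside the pattern) so that the regular expression is easier to read.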
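To make the flags more concrete, here is a small sketch of re.M, re.S, and a combined re.I | re.X. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = 'first line\nSecond LINE'  # hypothetical two-line sample

# re.M: ^ matches at the start of every line, not only at the start of the string.
line_starts = re.findall(r'^\w+', text, re.M)

# re.S: . also matches the newline, so .+ can run across both lines.
across_lines = re.search(r'first.+LINE', text, re.S)

# re.I | re.X: flags combine with |. In verbose mode, whitespace and
# # comments inside the pattern are ignored.
verbose = re.compile(r'''
    second   # the word, in any letter case
    \s+      # one or more whitespace characters
    line
''', re.I | re.X)

print(line_starts)
print(across_lines is not None)
print(verbose.search(text).group())
```

Without re.M the first findall would return only the first word, and without re.S the second search would return None.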

Example:

import re

content = 'Citizen wang , always fall in love with neighbour，WANG'
rr = re.compile(r'wan\w', re.I)  # case-insensitive
print(type(rr))
a = rr.findall(content)
print(type(a))
print(a)

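findall returns a list object. Output (from Python 3.6 or earlier; on Python 3.7 and later the first line reads <class 're.Pattern'>):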

<class '_sre.SRE_Pattern'>
<class 'list'>
['wang', 'WANG']
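A compiled pattern object offers the same operations as the module-level functions, so the pattern string and flags only have to be written once. A minimal sketch reusing the pattern from the example above (the ASCII comma in `content` and the replacement text 'Li' are assumptions for illustration):

```python
import re

content = 'Citizen wang , always fall in love with neighbour, WANG'
rr = re.compile(r'wan\w', re.I)

# The compiled method and the module-level function give the same result.
same = rr.findall(content) == re.findall(r'wan\w', content, re.I)
print(same)

# Other methods reuse the compiled pattern and its flags.
first_hit = rr.search(content).group()
replaced = rr.sub('Li', content)
print(first_hit)
print(replaced)
```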

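re.match() function

Always matches from the beginning of the string and returns a match object for the matched text (<class '_sre.SRE_Match'>, or <class 're.Match'> on Python 3.7 and later). If the beginning of the string does not match the pattern, it returns None.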

re.match(pattern, string[, flags=0])

pattern: the pattern to match, either a pattern string or an object returned by re.compile
string: the string to match against

import re

pattern = re.compile(r'hello')
a = re.match(pattern, 'hello world')  # starts with 'hello'
b = re.match(pattern, 'world hello')  # 'hello' is not at the start
c = re.match(pattern, 'hell')         # too short to match
d = re.match(pattern, 'hello ')       # trailing text is allowed
if a:
    print(a.group())
else:
    print('a failed')
if b:
    print(b.group())
else:
    print('b failed')
if c:
    print(c.group())
else:
    print('c failed')
if d:
    print(d.group())
else:
    print('d failed')

Output:

hello
b failed
c failed
hello
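As the results for a and d show, match only anchors the start of the string and tolerates trailing text. The sketch below contrasts that with fullmatch and with the optional start position accepted by a compiled pattern's match method; the variable names are chosen for illustration.

```python
import re

pattern = re.compile(r'hello')

# match() anchors only the start: trailing text is allowed.
prefix_ok = pattern.match('hello world') is not None

# fullmatch() requires the entire string to match the pattern.
whole_rejected = pattern.fullmatch('hello world') is None
whole_ok = pattern.fullmatch('hello') is not None

# Pattern.match accepts a start position, so matching can begin at index 6.
from_pos = pattern.match('world hello', 6).group()

print(prefix_ok, whole_rejected, whole_ok, from_pos)
```

Because a failed match returns None, it is safest to test the result before calling group(), as the example above does with its if/else blocks.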

Methods and attributes of match objects


import re

text = 'hello world! hello python'
# Three named groups: first = 'hello', symbol = the space, last = 'world!'.
# Group 0 is the whole match, 'hello world!'.
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
match = re.match(pattern, text)
print('group 0:', match.group(0))  # the whole match
print('group 1:', match.group(1))  # first group: hello
print('group 2:', match.group(2))  # second group: the space
print('group 3:', match.group(3))  # third group: world!
print('groups:', match.groups())   # tuple containing every captured group
print('start 0:', match.start(0), 'end 0:', match.end(0))  # start and end index of the whole match
print('start 1:', match.start(1), 'end 1:', match.end(1))  # start and end index of group 1
print('start 2:', match.start(2), 'end 2:', match.end(2))  # start and end index of group 2
print('pos:', match.pos)        # index at which matching started
print('endpos:', match.endpos)  # index at which matching stops; defaults to len(text)
print('lastgroup (name of the last matched group):', match.lastgroup)
print('lastindex (number of the last matched group):', match.lastindex)
print('string (the text passed to match):', match.string)
print('re (the Pattern object used):', match.re)
print('span(2), i.e. (start(2), end(2)):', match.span(2))

Output:

group 0: hello world!
group 1: hello
group 2:  
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 0 end 0: 12
start 1: 0 end 1: 5
start 2: 5 end 2: 6
pos: 0
endpos: 25
lastgroup (name of the last matched group): last
lastindex (number of the last matched group): 3
string (the text passed to match): hello world! hello python
re (the Pattern object used): re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span(2), i.e. (start(2), end(2)): (5, 6)
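Since the pattern uses named groups, the same values can also be read by name instead of by number. A short sketch using the same pattern and text (the variable names are chosen for illustration):

```python
import re

pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
match = pattern.match('hello world! hello python')

by_name = match.group('first')      # same value as match.group(1)
by_subscript = match['last']        # subscript access, Python 3.6 and later
as_dict = match.groupdict()         # all named groups as a dict
symbol_span = match.span('symbol')  # start and end index of the named group

print(by_name, by_subscript)
print(as_dict)
print(symbol_span)
```

Names tend to survive pattern changes better than numbers, because inserting a new group shifts the numbering of the groups after it.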

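re.search() function

Scans the whole string and returns a match object for the first location where the pattern matches, or None if it matches nowhere.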

re.search(pattern, string[, flags=0])

pattern: the pattern to match, either a pattern string or an object returned by re.compile

string: the string to search

import re

text = 'say hello world! hello python'
# Three named groups: first = 'hello', symbol = the space, last = 'world!'.
# Group 0 is the whole match, 'hello world!'.
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
search = re.search(pattern, text)
print('group 0:', search.group(0))  # the whole match
print('group 1:', search.group(1))  # first group: hello
print('group 2:', search.group(2))  # second group: the space
print('group 3:', search.group(3))  # third group: world!
print('groups:', search.groups())   # tuple containing every captured group
print('start 0:', search.start(0), 'end 0:', search.end(0))  # start and end index of the whole match
print('start 1:', search.start(1), 'end 1:', search.end(1))  # start and end index of group 1
print('start 2:', search.start(2), 'end 2:', search.end(2))  # start and end index of group 2
print('pos:', search.pos)        # index at which searching started
print('endpos:', search.endpos)  # index at which searching stops; defaults to len(text)
print('lastgroup (name of the last matched group):', search.lastgroup)
print('lastindex (number of the last matched group):', search.lastindex)
print('string (the text passed to search):', search.string)
print('re (the Pattern object used):', search.re)
print('span(2), i.e. (start(2), end(2)):', search.span(2))

Note how the input differs from the one used in the re.match example: this text begins with 'say ', so re.match would return None, while re.search finds the match at index 4.

Output:

group 0: hello world!
group 1: hello
group 2:  
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 4 end 0: 16
start 1: 4 end 1: 9
start 2: 9 end 2: 10
pos: 0
endpos: 29
lastgroup (name of the last matched group): last
lastindex (number of the last matched group): 3
string (the text passed to search): say hello world! hello python
re (the Pattern object used): re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span(2), i.e. (start(2), end(2)): (9, 10)
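The difference between the two functions can be checked side by side on the same text. The sketch below uses a simplified pattern and also calls finditer, which the article does not cover, only to show that search stops at the first hit.

```python
import re

pattern = re.compile(r'hell\w')  # simplified pattern, for illustration only
text = 'say hello world! hello python'

at_start = pattern.match(text)    # None: the text does not begin with 'hello'
anywhere = pattern.search(text)   # first occurrence, wherever it is
all_spans = [m.span() for m in pattern.finditer(text)]  # every occurrence

print(at_start)
print(anywhere.group(), anywhere.span())
print(all_spans)
```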