Using the re Regular Expression Module in Python 3: Study Notes

Date: 2020-02-01


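Introducing the re module

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns.

The re module gives Python the full set of regular expression features.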

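Using the re module

Parameter meanings

pattern: the regular expression, given as a string

string: the string to match against

flags: optional; the matching mode

pos: optional; the index in the string at which the search starts

endpos: optional; the index at which the search stops (the string is scanned as if it were only endpos characters long)

If pos and endpos are omitted, the whole string is scanned. Both are accepted only by the methods of a compiled pattern object, not by the module-level functions.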
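As a rough illustration of pos and endpos, the sketch below applies a compiled pattern to the test string used throughout these notes. The variable names (digits, from_pos, limited) are chosen only for this example.

```python
import re

# pos and endpos are accepted by Pattern methods only,
# not by module-level functions such as re.search().
digits = re.compile(r'\d+')
text = '1 one 2 two 3 three'

# Start scanning at index 2, so the leading '1' is skipped.
from_pos = digits.search(text, 2)

# Scan only text[0:5] ('1 one'), so '2' and '3' are never seen.
limited = digits.findall(text, 0, 5)

print(from_pos.group(), from_pos.span())  # 2 (6, 7)
print(limited)                            # ['1']
```

Note that the span reported with pos is still relative to the start of the full string, not to pos.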

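re.compile()

re.compile(pattern, flags=0)

Compiles a regular expression pattern into a regular expression object (a pattern object).
The pattern object can then be used to call match() and the other functions.

>>> import re
>>> test = '1 one 2 two 3 three'
>>> a = re.compile(r'\d+')
>>> b = a.match(test)
>>> print(f"Output: {b[0]}")


Output: 1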

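re.match() and re.search()

re.match()

re.match(pattern, string, flags=0)

Pattern.match(string[, pos[, endpos]])

If zero or more characters at the beginning of string match the pattern, a corresponding match object is returned. If there is no match, None is returned.
Use the match object methods group(num) or groups() to retrieve the matched text:
- group(num=0) returns the string matched by the whole expression.
- group() accepts several group numbers at once, in which case it returns a tuple containing the values of those groups.
- groups() returns a tuple containing the strings of all subgroups, from group 1 up to the last group in the pattern.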

>>> import re
>>> test = '1 one 2 two 3 three'
>>> a = re.compile(r'(\d+) (\w+)')
>>> b = a.match(test)
>>> print(f"Output: {b.group()}")
>>> print(f"Output: {b.group(2)}")
>>> print(f"Output: {b.group(1, 2)}")
>>> print(f"Output: {b.groups()}")


Output: 1 one
Output: one
Output: ('1', 'one')
Output: ('1', 'one')

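Match.start([group]) and Match.end([group])
- Return the start and end indices of the substring matched by group.
- If group exists but did not contribute to the match, -1 is returned.
Match.span([group])
- For a match m, returns the 2-tuple (m.start(group), m.end(group)).
- Note that if group did not contribute to the match, the result is (-1, -1).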
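A minimal sketch of start(), end() and span(), including the -1 case for a group that did not take part in the match. The second pattern, (a)?b, is a made-up example used only to produce such a group.

```python
import re

m = re.search(r'(\d+) (\w+)', 'one 2 two 3 three')

whole = m.span()                 # span of the entire match '2 two'
first = (m.start(1), m.end(1))   # group 1 is '2'
second = m.span(2)               # group 2 is 'two'

# Group 1 is optional here and matches nothing in 'b'.
opt = re.match(r'(a)?b', 'b')
missing = opt.span(1)

print(whole, first, second)  # (4, 9) (4, 5) (6, 9)
print(missing)               # (-1, -1)
```

The indices can be used directly for slicing: for any participating group g, string[m.start(g):m.end(g)] equals m.group(g).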
  
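re.search()

re.search(pattern, string, flags=0)

Pattern.search(string[, pos[, endpos]])

Scans through string looking for the first location where the pattern matches, and returns a corresponding match object. If nothing matches, None is returned.
Everything else works the same as match().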

>>> import re
>>> test = 'one 2 two 3 three'
>>> a = re.compile(r'(\d+) (\w+)')
>>> b = a.search(test)
>>> c = a.match(test)
>>> print(c)
>>> print(f"Output: {b.group()}")
>>> print(f"Output: {b.group(2)}")
>>> print(f"Output: {b.group(1, 2)}")
>>> print(f"Output: {b.groups()}")


None
Output: 2 two
Output: two
Output: ('2', 'two')
Output: ('2', 'two')

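Difference

match() only matches at the beginning of the string; if the beginning does not fit the regular expression, the match fails and the function returns None.
search() scans the whole string until it finds a match.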
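The difference can be checked side by side. The sketch below is only an illustration; the names at_start, anywhere, anchored and result are made up for it. It also shows that, without the re.M flag, search() with a leading ^ fails on this input just as match() does.

```python
import re

text = 'one 2 two'
pattern = re.compile(r'\d+')

at_start = pattern.match(text)        # None: text does not start with a digit
anywhere = pattern.search(text)       # finds '2' further into the string
anchored = re.search(r'^\d+', text)   # None: ^ pins the search to the start

# Both functions return None on failure, so test before calling group().
result = anywhere.group() if anywhere else None

print(at_start, anchored)  # None None
print(result)              # 2
```

Calling group() on a None result raises AttributeError, which is why the conditional check is worth keeping in real code.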

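re.findall() and re.finditer()

re.findall()

re.findall(pattern, string, flags=0)

Pattern.findall(string[, pos[, endpos]])

Returns a list of all non-overlapping matches of pattern in string. The string is scanned from left to right, and matches are returned in the order found.
If the pattern contains exactly one group, a list of that group's strings is returned; if it contains more than one group, a list of tuples is returned.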

>>> import re
>>> test = 'one 2 two 3 three'
>>> a = re.compile(r'(\d+) (\w+)')
>>> b = a.findall(test)
>>> print(f"Output: {b}")

Output: [('2', 'two'), ('3', 'three')]
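The shape of the list returned by findall() depends on how many groups the pattern has. The following sketch runs three variants of the same pattern on the same string; the variable names are illustrative only.

```python
import re

text = 'one 2 two 3 three'

# No groups: a list of the full matched strings.
no_group = re.findall(r'\d+ \w+', text)

# Exactly one group: a list of that group's strings.
one_group = re.findall(r'(\d+) \w+', text)

# Two or more groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(\d+) (\w+)', text)

print(no_group)    # ['2 two', '3 three']
print(one_group)   # ['2', '3']
print(two_groups)  # [('2', 'two'), ('3', 'three')]
```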

re.finditer()

re.finditer(pattern, string, flags=0)

Pattern.finditer(string[, pos[, endpos]])

Returns an iterator that yields a match object for every non-overlapping match of pattern in string.

>>> import re
>>> test = 'one 2 two 3 three'
>>> a = re.compile(r'(\d+) (\w+)')
>>> b = a.finditer(test)
>>> print(f"Output: {b}")
>>> for i in b:
...     print(f"Output: {i}")


Output: <callable_iterator object at 0x036E7BD0>
Output: <re.Match object; span=(4, 9), match='2 two'>
Output: <re.Match object; span=(10, 17), match='3 three'>

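Difference

The main difference between the two is that findall() returns a list (of strings or tuples), while finditer() returns an iterator of match objects.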
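One practical consequence, sketched below under the same test string: because finditer() yields match objects, positions are available for each match, and because it is an iterator, it can be consumed only once. The names spans, it, first_pass and second_pass are illustrative.

```python
import re

text = 'one 2 two 3 three'
pattern = re.compile(r'(\d+) (\w+)')

# Match objects expose positions, which findall() does not return.
spans = [(m.group(1), m.span()) for m in pattern.finditer(text)]

# An iterator is exhausted after one pass.
it = pattern.finditer(text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]

print(spans)        # [('2', (4, 9)), ('3', (10, 17))]
print(first_pass)   # ['2 two', '3 three']
print(second_pass)  # []
```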

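re.sub() and re.subn()

re.sub()

re.sub(pattern, repl, string, count=0, flags=0)

repl: the replacement string; it can also be a function.

count: the maximum number of matches to replace; the default 0 replaces all matches.

Returns the string after replacement; the original string is left unchanged.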

>>> import re
>>> test = '1 one 2 two 3 three'
>>> a = re.sub(r'(\d+)', 'xxx', test)
>>> print(f"Output: {a}")
>>> print(f"Output: {test}")


Output: xxx one xxx two xxx three
Output: 1 one 2 two 3 three
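The repl and count parameters can be sketched as follows. The helper double() is a hypothetical function written for this example; re.sub() calls it once per match with the match object and uses the returned string as the replacement. The last line additionally uses group backreferences (\1, \2) in a replacement string.

```python
import re

text = '1 one 2 two 3 three'

def double(match):
    """Hypothetical helper: return the matched number doubled, as a string."""
    return str(int(match.group()) * 2)

# repl as a function: called for every match.
doubled = re.sub(r'\d+', double, text)

# count=1: only the first match is replaced.
first_only = re.sub(r'\d+', 'xxx', text, count=1)

# Backreferences in a replacement string: swap number and word.
swapped = re.sub(r'(\d+) (\w+)', r'\2 \1', text)

print(doubled)     # 2 one 4 two 6 three
print(first_only)  # xxx one 2 two 3 three
print(swapped)     # one 1 two 2 three 3
```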

re.subn()

re.subn(pattern, repl, string, count=0, flags=0)
The parameters have the same meaning as above.

Works the same as re.sub(), but returns a tuple (new string, number of substitutions).

>>> import re
>>> test = '1 one 2 two 3 three'
>>> a = re.subn(r'(\d+)', 'xxx', test)
>>> print(f"Output: {a}")
>>> print(f"Output: {test}")


Output: ('xxx one xxx two xxx three', 3)
Output: 1 one 2 two 3 three

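re.split()

re.split(pattern, string, maxsplit=0, flags=0)

maxsplit: the maximum number of splits; the default 0 means no limit.

Splits string by the occurrences of pattern.
If the pattern contains capturing parentheses, the text of all groups is also included in the resulting list.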

>>> import re
>>> test = '1 one 2 two 3 three'
>>> a = re.split(r'\d+', test)
>>> b = re.split(r'(\d+)', test)
>>> print(f"Output: {a}")
>>> print(f"Output: {b}")


Output: ['', ' one ', ' two ', ' three']
Output: ['', '1', ' one ', '2', ' two ', '3', ' three']
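A short sketch of maxsplit, plus one possible way to tidy the result. The empty first element appears because the string starts with a separator; the list comprehension that drops it is just one approach, not something the re module does for you.

```python
import re

text = '1 one 2 two 3 three'

# maxsplit=1: split once, the rest of the string stays intact.
limited = re.split(r'\d+', text, maxsplit=1)

# Drop empty pieces and surrounding spaces after a full split.
words = [part.strip() for part in re.split(r'\d+', text) if part.strip()]

print(limited)  # ['', ' one 2 two 3 three']
print(words)    # ['one', 'two', 'three']
```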

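Regular expression flags (matching modes)

re.I    Makes matching case-insensitive.
re.L    Makes matching locale-dependent. In Python 3 this flag is only allowed with bytes patterns.
re.M    Multi-line matching; affects ^ and $.
        Each \n starts a new line, so ^ and $ also match at the start and end of every line.
re.S    Makes . match any character at all, including a newline.
re.U    Matches according to the Unicode character set; affects \w, \W, \b, \B (and also \d, \D, \s, \S). In Python 3 this is already the default for str patterns, so the flag is redundant.
re.X    Verbose mode: allows whitespace and comments inside the pattern so that the regular expression is easier to read.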
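The flags are easiest to understand by comparing results with and without them. The sketch below uses a made-up two-line string; several flags can be combined with the | operator.

```python
import re

text = 'First line\nsecond LINE'

# re.I: case-insensitive matching.
ignore_case = re.findall(r'line', text, re.I)

# re.M: ^ also matches right after each newline.
line_starts = re.findall(r'^\w+', text, re.M)

# re.S: without it, . stops at the newline and the search fails.
no_dotall = re.search(r'First.*LINE', text)
dotall = re.search(r'First.*LINE', text, re.S)

# re.X: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r'''
    (\d+)   # a run of digits
    \s      # one whitespace character
    (\w+)   # a word
''', re.X)
verbose_groups = verbose.match('1 one').groups()

# Combining flags with |.
combined = re.findall(r'^line', 'Line\nline', re.I | re.M)

print(ignore_case)      # ['line', 'LINE']
print(line_starts)      # ['First', 'second']
print(no_dotall)        # None
print(verbose_groups)   # ('1', 'one')
print(combined)         # ['Line', 'line']
```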