Python Study Guide: Capture Groups and Special Matching Strings

Date: 2020-05-06


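In the previous article we introduced Python's character classes and took a closer look at metacharacters. This article covers capture groups and special matching strings in Python regular expressions. The previous article is available at https://www.cnblogs.com/dustman/p/10036661.html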

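Capture groups
A group is created by wrapping part of a regular expression in parentheses. A group can then be used as the argument of a metacharacter such as * or ?.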

    import re

    pattern = r"python(ice)*"
    string1 = "python!"
    string2 = "ice"
    string3 = "pythonice"

    match1 = re.match(pattern, string1)
    match2 = re.match(pattern, string2)
    match3 = re.match(pattern, string3)

    if match1:
        print(match1.group())
        print("match 1")
    if match2:
        print(match2.group())
        print("match 2")
    if match3:
        print(match3.group())
        print("match 3")

Output:

    >>>
    python
    match 1
    pythonice
    match 3
    >>>

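In the example above, (ice) is a capture group.

When we introduced metacharacters and character classes, we already used the group function to read the content of a match. group(0) or group() returns the whole match. group(n), with n greater than 0, returns the text matched by the nth group. groups() returns a tuple containing the text of every capture group, starting from group 1.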

    import re

    pattern = r"j(av)(ap)(yt(h)o)n"
    string = "javapythonhtmlmysql"

    match = re.match(pattern, string)

    if match:
        print(match.group())
        print(match.group(0))
        print(match.group(1))
        print(match.group(2))
        print(match.groups())

Output:

    >>>
    javapython
    javapython
    av
    ap
    ('av', 'ap', 'ytho', 'h')
    >>>

Capture groups can also be nested: one group can sit inside another, as (h) sits inside (yt(h)o) above.
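How nested groups are numbered is easy to get wrong, so here is a minimal sketch. Groups are numbered by the position of their opening parenthesis, counted from left to right. The pattern and the sample string are made up for illustration and do not come from the original article.

```python
import re

# Illustrative pattern: an outer group that contains two nested groups,
# followed by one more group.
pattern = r"((py)(thon)) (rocks)"
match = re.match(pattern, "python rocks")

if match:
    print(match.group(1))  # outer group: its "(" comes first
    print(match.group(2))  # first nested group
    print(match.group(3))  # second nested group
    print(match.group(4))  # group that opens after the nested pair
    print(match.groups())
```

The outer group gets number 1 because its opening parenthesis appears before those of the groups nested inside it.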

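Two special kinds of group are named groups and non-capturing groups.
A named group has the form (?P<name>...), where name is the name of the group and ... is the expression to match. It behaves exactly like a normal group, except that it can be accessed with group(name) as well as by its number.
A non-capturing group has the form (?:...). It only takes part in matching: its text is not captured and it is not assigned a group number, so it cannot be referred to later in the pattern or read back in the program.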

    import re

    pattern = r"(?P<python>123)(?:456)(789)"
    string = "123456789"

    match = re.match(pattern, string)

    if match:
        print(match.group("python"))
        print(match.groups())

Output:

    >>>
    123
    ('123', '789')
    >>>
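A common reason to use a non-capturing group is to apply a quantifier to several characters without adding an entry to groups(). The sketch below compares the two forms. The sample text is hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "abababc"

capturing = re.match(r"(ab)+(c)", text)
non_capturing = re.match(r"(?:ab)+(c)", text)

# The repeated capturing group keeps the text of its last repetition.
print(capturing.groups())
# The non-capturing group gets no number, so only the "c" group is reported.
print(non_capturing.groups())
```

Both patterns match the same text. They differ only in which groups are available afterwards.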

The metacharacter | means "or": red|blue matches either red or blue.

    import re

    string1 = "python"
    string2 = "pyihon"
    string3 = "pylhon"
    pattern = r"py(t|i)hon"

    match1 = re.match(pattern, string1)
    match2 = re.match(pattern, string2)
    match3 = re.match(pattern, string3)

    if match1:
        print(match1.group())
        print("match 1")
    if match2:
        print(match2.group())
        print("match 2")
    if match3:
        print(match3.group())
        print("match 3")

Output:

    >>>
    python
    match 1
    pyihon
    match 2
    >>>

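Special matching strings
Special sequences
Regular expressions support a number of special sequences, each written as a backslash followed by another character.
One useful special sequence is a backslash followed by a number between 1 and 99, for example \1. The number is the index of a capture group, so the sequence lets the pattern refer back to an earlier capture group.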

    import re

    string1 = "html python"
    string2 = "python python"
    string3 = "java java"
    pattern = r"(.+) \1"

    match1 = re.match(pattern, string1)
    match2 = re.match(pattern, string2)
    match3 = re.match(pattern, string3)

    if match1:
        print(match1.group())
        print("match 1")
    if match2:
        print(match2.group())
        print("match 2")
    if match3:
        print(match3.group())
        print("match 3")

Output:

    >>>
    python python
    match 2
    java java
    match 3
    >>>

Note: (.+) \1 is not the same as (.+) (.+). \1 refers to what the first group actually matched, that is, the matched text itself, not the group's pattern.
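The difference is easy to see by running both patterns against the same inputs. This is a small sketch that reuses two of the sample strings from the example above. The variable names are chosen here for illustration.

```python
import re

backref = r"(.+) \1"        # second word must repeat the text captured by group 1
two_groups = r"(.+) (.+)"   # second word only has to match the pattern .+

results = {}
for text in ("python python", "html python"):
    results[text] = (
        bool(re.match(backref, text)),
        bool(re.match(two_groups, text)),
    )
    print(text, results[text])
```

For "html python" the backreference pattern fails because the second word is not the same text as the first, while the two-group pattern accepts any two words separated by a space.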

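Regular expressions also provide the special sequences \d, \s, and \w, which match digits, whitespace, and word characters. In ASCII mode they are equivalent to [0-9], [ \t\n\r\f\v], and [a-zA-Z0-9_]. In Unicode mode, which is the default for str patterns in Python 3, they match more characters: \w, for example, also matches letters from other scripts, such as accented letters and Chinese characters.
The uppercase versions \D, \S, and \W match the opposite. For example, \D matches any character that is not a digit.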
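The ASCII and Unicode behaviour of \w can be compared directly with the re.ASCII flag. This is a minimal sketch. The sample string is hypothetical, and the accented letter is written as an escape so that the exact code point is unambiguous.

```python
import re

# Hypothetical sample: "caf", then U+00E9 (e with acute accent), then "_1".
text = "caf\u00e9_1"

unicode_match = re.match(r"\w+", text)          # str patterns are Unicode-aware by default
ascii_match = re.match(r"\w+", text, re.ASCII)  # restricts \w to [a-zA-Z0-9_]

print(unicode_match.group())  # the whole string, including the accented letter
print(ascii_match.group())    # stops before the accented letter

# \D is the complement of \d: it keeps everything that is not a digit.
non_digits = re.findall(r"\D", "a1b2")
print(non_digits)
```

Note that the underscore counts as a word character in both modes.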

    import re

    string1 = "python 2017!"
    string2 = "1,00,867!"
    string3 = "!@#?"
    pattern = r"(\D+\d)"

    match1 = re.match(pattern, string1)
    match2 = re.match(pattern, string2)
    match3 = re.match(pattern, string3)

    if match1:
        print(match1.group())
        print("match 1")
    if match2:
        print(match2.group())
        print("match 2")
    if match3:
        print(match3.group())
        print("match 3")

Output:

    >>>
    python 2
    match 1
    >>>

(\D+\d) matches one or more non-digits followed by a single digit.

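Special matches
There are also the special sequences \A, \Z, and \b. \A matches only at the start of the string, and in most cases behaves like ^ in a pattern. \Z matches only at the end of the string, and in most cases is equivalent to $. The exceptions are multiline mode and, for $, a newline at the very end of the string.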
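One case where the anchors differ is the re.MULTILINE flag. The sketch below uses a hypothetical two-line string to show that ^ and $ start matching at line breaks while \A and \Z stay tied to the ends of the whole string.

```python
import re

# Hypothetical two-line sample text.
text = "first line\nsecond line"

# With re.MULTILINE, ^ also matches right after each newline ...
caret = re.findall(r"^\w+", text, re.MULTILINE)
# ... while \A still matches only at the very start of the string.
start = re.findall(r"\A\w+", text, re.MULTILINE)

# Likewise, $ matches before each newline, but \Z only at the very end.
dollar = re.findall(r"\w+$", text, re.MULTILINE)
end = re.findall(r"\w+\Z", text, re.MULTILINE)

print(caret)   # ['first', 'second']
print(start)   # ['first']
print(dollar)  # ['line', 'line']
print(end)     # ['line']
```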
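\b matches a word boundary: a position where a word character is not followed, or not preceded, by another word character. It matches the empty string between a \w character and a \W character, or between a \w character and the start or end of the string.
\B matches a non-boundary: a position where the characters on both sides are of the same kind, either both word characters or both non-word characters. The start and end of the string count as non-word characters.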

    import re

    string1 = "The dog eat!"
    string2 = "<dog>dog<>?"
    string3 = "dogeatpython"
    pattern = r"\b(dog)\b"

    search1 = re.search(pattern, string1)
    search2 = re.search(pattern, string2)
    search3 = re.search(pattern, string3)

    if search1:
        print(search1.group())
        print("search 1")
    if search2:
        print(search2.group())
        print("search 2")
    if search3:
        print(search3.group())
        print("search 3")
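For comparison, here is a similar sketch for \B. The pattern \Bdog\B only finds "dog" when there are word characters on both sides of it. The string "hotdogs" is a hypothetical addition for illustration. The other two strings are taken from the example above.

```python
import re

pattern = r"\Bdog\B"  # "dog" with a word character directly before and after it

found = {}
for text in ("The dog eat!", "hotdogs", "dogeatpython"):
    result = re.search(pattern, text)
    found[text] = result.group() if result else None
    print(text, "->", found[text])
```

Only "hotdogs" matches. In "The dog eat!" the word is surrounded by spaces, and in "dogeatpython" it sits at the start of the string, which counts as a non-word position, so \B fails in front of it.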