Regular Expressions in Python with Examples

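Python works with regular expressions (REs) through the re module. A regular expression specifies a set of strings (a pattern) that it matches.
Metacharacters are the key to understanding REs. They carry special meaning and are used throughout the functions of the re module.
There are 14 metacharacters in total. Each is discussed as we go through the functions: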

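\   Drops the special meaning of the character that follows it (discussed below)
[]  Represents a character class
^   Matches the beginning of the string
$   Matches the end of the string
.   Matches any character except a newline
?   Matches zero or one occurrence
|   Means OR: matches either the RE before it or the RE after it
*   Any number of occurrences (including 0)
+   One or more occurrences
{}  Indicates the number of occurrences of the preceding RE to match
()  Encloses a group of REs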
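The short sketch below exercises most of the metacharacters in the list above with re.findall(). The sample strings are illustrative inputs chosen for this example; they do not come from the original article.

```python
import re

# ^ and $ anchor a match to the start and to the end of the string
starts = re.findall(r'^\d+', '2020 and 2021')
ends = re.findall(r'\d+$', '2020 and 2021')

# . matches any single character except a newline
dots = re.findall(r'a.c', 'abc a\nc axc')

# ? makes the preceding item optional; | separates alternatives
optional = re.findall(r'colou?r', 'color colour')
either = re.findall(r'cat|dog', 'cat bird dog')

# {m,n} bounds the repetition count; () captures a group,
# and findall() then returns only the captured part
bounded = re.findall(r'\d{2,3}', '1 12 123 1234')
users = re.findall(r'(\w+)@', 'ann@x.org bob@y.net')

print(starts, ends)  # ['2020'] ['2021']
print(dots)          # ['abc', 'axc']
print(optional)      # ['color', 'colour']
print(either)        # ['cat', 'dog']
print(bounded)       # ['12', '123', '123']
print(users)         # ['ann', 'bob']
```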

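Function compile()
compile() compiles a regular expression into a pattern object. The pattern object has methods for various operations, such as searching for pattern matches or performing string substitutions.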

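import re

# compile() creates the character class [a-e], which is equivalent to [abcde].
# The class [abcde] matches any single 'a', 'b', 'c', 'd' or 'e'.
p = re.compile('[a-e]')

# findall() searches for the RE and returns a list of all matches found
print(p.findall("Aye, said Mr. Gibenson Stark"))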

Output:

['e', 'a', 'd', 'b', 'e', 'a']

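Understanding the output:
The first occurrence is 'e' in "Aye", not 'A', because matching is case-sensitive.
The next occurrence is 'a' in "said", then 'd' in "said", followed by 'b' and 'e' in "Gibenson". The last 'a' matches in "Stark".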
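A compiled pattern can be reused with several methods, which is one reason to call compile(). The sketch below does this with the same sentence. It adds the re.IGNORECASE flag, which the original example does not use, so [a-e] now matches the uppercase 'A' in "Aye" as well:

```python
import re

# Compile once, reuse many times; re.IGNORECASE makes [a-e] also match 'A'-'E'
p = re.compile('[a-e]', re.IGNORECASE)
text = "Aye, said Mr. Gibenson Stark"

all_matches = p.findall(text)  # every match, as a list of strings
first = p.search(text)         # first match anywhere, as a Match object (or None)
replaced = p.sub('_', text)    # replace every match with '_'

print(all_matches)                   # ['A', 'e', 'a', 'd', 'b', 'e', 'a']
print(first.group(), first.start())  # A 0
print(replaced)                      # _y_, s_i_ Mr. Gi__nson St_rk
```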

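The backslash metacharacter '\' plays a very important role because it signals various special sequences. To match a literal backslash without its special meaning as a metacharacter, use '\\' in the pattern. You can write this as the raw string r'\\' or as the ordinary Python string literal '\\\\'. Writing patterns as raw strings (r'...') avoids having to double every backslash.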

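\d  Matches any decimal digit; this is equivalent to the set class [0-9].
\D  Matches any non-digit character.
\s  Matches any whitespace character.
\S  Matches any non-whitespace character.
\w  Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W  Matches any non-alphanumeric character.
For str patterns in Python 3, these classes are Unicode-aware. The [0-9] and [a-zA-Z0-9_] equivalences therefore hold exactly only when the re.ASCII flag is used.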

The set class [\s,.] matches any whitespace character, ',' or '.'.
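Here is a minimal sketch of the backslash rule and the [\s,.] class. The Windows-style path and the short sample string are made-up inputs. The raw-string pattern r'\\' matches one literal backslash, and [\s,.] picks out whitespace, commas and periods:

```python
import re

# A literal backslash must be escaped in the pattern; a raw string keeps it readable
path = r'C:\temp\new'
parts = re.split(r'\\', path)  # split on each literal backslash

# [\s,.] matches a single whitespace character, comma or period
separators = re.findall(r'[\s,.]', 'a, b.c d')

print(parts)       # ['C:', 'temp', 'new']
print(separators)  # [',', ' ', '.', ' ']
```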

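import re

# \d is equivalent to [0-9]
p = re.compile(r'\d')
print(p.findall("On 2020/7/9 at 11 a.m., I followed the Software Testing official account"))

# \d+ matches a group of one or more consecutive digits
p = re.compile(r'\d+')
print(p.findall("On 2020/7/9 at 11 a.m., I followed the Software Testing official account"))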

Output:

['2', '0', '2', '0', '7', '9', '1', '1']
['2020', '7', '9', '11']

import re

# \w is equivalent to [a-zA-Z0-9_]
p = re.compile(r'\w')
print(p.findall("Official account: software testing test."))

# \w+ matches a group of one or more alphanumeric characters
p = re.compile(r'\w+')
print(p.findall("Official account: software testing test."))

# \W matches any non-alphanumeric character
p = re.compile(r'\W')
print(p.findall("Official account: software testing test."))

Output:

['O', 'f', 'f', 'i', 'c', 'i', 'a', 'l', 'a', 'c', 'c', 'o', 'u', 'n', 't', 's', 'o', 'f', 't', 'w', 'a', 'r', 'e', 't', 'e', 's', 't', 'i', 'n', 'g', 't', 'e', 's', 't']
['Official', 'account', 'software', 'testing', 'test']
[' ', ':', ' ', ' ', ' ', '.']
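The examples so far cover \d, \w and \W. This sketch, with made-up input strings, shows \D, \s and \S as well. It also shows how the re.ASCII flag changes \w when the text contains Chinese characters:

```python
import re

text = "Room 101, floor 3"
non_digits = re.findall(r'\D+', text)  # runs of non-digit characters
spaces = re.findall(r'\s', text)       # each whitespace character
tokens = re.findall(r'\S+', text)      # runs of non-whitespace characters

# By default \w is Unicode-aware; re.ASCII restricts it to [a-zA-Z0-9_]
unicode_words = re.findall(r'\w+', '測試 test 42')
ascii_words = re.findall(r'\w+', '測試 test 42', flags=re.ASCII)

print(non_digits)     # ['Room ', ', floor ']
print(spaces)         # [' ', ' ', ' ']
print(tokens)         # ['Room', '101,', 'floor', '3']
print(unicode_words)  # ['測試', 'test', '42']
print(ascii_words)    # ['test', '42']
```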

import re

# '*' matches zero or more occurrences of the preceding character
p = re.compile('ab*')
print(p.findall("ababbaabbb"))

Output:

['ab', 'abb', 'a', 'abbb']

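Understanding the output:

Our RE is ab*. It matches a single 'a' followed by any number of 'b's, starting from 0.
- The output 'ab' is valid because a single 'a' is followed by a single 'b'.
- The output 'abb' is valid because a single 'a' is followed by two 'b's.
- The output 'a' is valid because a single 'a' is followed by zero 'b's.
- The output 'abbb' is valid because a single 'a' is followed by three 'b's.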
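To compare '*' with the other repetition metacharacters from the list, this sketch runs the same input through +, ? and {m,n}:

```python
import re

text = "ababbaabbb"
star = re.findall('ab*', text)         # 'a' followed by zero or more 'b'
plus = re.findall('ab+', text)         # 'a' followed by one or more 'b'
question = re.findall('ab?', text)     # 'a' followed by zero or one 'b'
bounded = re.findall('ab{2,3}', text)  # 'a' followed by two or three 'b'

print(star)      # ['ab', 'abb', 'a', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(question)  # ['ab', 'ab', 'a', 'ab']
print(bounded)   # ['abb', 'abbb']
```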

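Function split()
split() splits a string at each occurrence of a character or pattern. The pieces of the string between the matches are returned as a list.
Syntax: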

 re.split(pattern, string, maxsplit=0, flags=0)

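The first parameter, pattern, is the regular expression. string is the string in which pattern is searched for and split. If maxsplit is not provided, it defaults to 0, which means there is no limit. If a non-zero value is given, at most that many splits occur. For example, with maxsplit=1 the string is split only once, producing a list of length 2 (provided the pattern occurs at least once). flags is optional but very useful and can help shorten code. For example, with flags=re.IGNORECASE, case is ignored during the split.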
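Here is a small sketch of maxsplit and flags using illustrative inputs. maxsplit and flags are passed by keyword here, because passing them positionally is deprecated as of Python 3.13:

```python
import re

text = 'Alpha1Beta2Gamma3Delta'

# No limit: split at every digit
all_parts = re.split(r'\d', text)
# maxsplit=1: split only at the first digit, so the list has 2 items
once = re.split(r'\d', text, maxsplit=1)
# flags=re.IGNORECASE: the pattern 'x' also splits on 'X'
ignore_case = re.split('x', 'axbXc', flags=re.IGNORECASE)

print(all_parts)    # ['Alpha', 'Beta', 'Gamma', 'Delta']
print(once)         # ['Alpha', 'Beta2Gamma3Delta']
print(ignore_case)  # ['a', 'b', 'c']
```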

from re import split

# '\W+' matches a non-alphanumeric character or a run of them;
# whenever ',' or a space ' ' is found, split() splits the string at that point
print(split(r'\W+', 'Software test, Software test, Software test'))
print(split(r'\W+', "Software test"))

# Here ':', ' ' and ',' are not alphanumeric, so they are where the splits occur
print(split(r'\W+', 'On 12th Jan 2020, at 11:02 AM'))

# '\d+' matches a digit or a run of digits;
# the split occurs only at '12', '2020', '11' and '02'
print(split(r'\d+', 'On 12th Jan 2020, at 11:02 AM'))

Output:

['Software', 'test', 'Software', 'test', 'Software', 'test']
['Software', 'test']
['On', '12th', 'Jan', '2020', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
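A related behavior is worth knowing: if the pattern contains a capturing group, re.split() also keeps the matched separators in the result. Here is a minimal sketch:

```python
import re

# With a capturing group, the matched separators are kept in the result list
kept = re.split(r'(\W+)', 'Software test, Software test')
print(kept)  # ['Software', ' ', 'test', ', ', 'Software', ' ', 'test']
```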

Function sub()
Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

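The "sub" in the function name stands for substitute. sub() searches the given string (the 3rd argument) for the regular expression pattern and replaces each matching substring with repl (the 2nd argument). count sets the maximum number of replacements to make; the default of 0 replaces every match.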

import re

# The pattern 'te' matches at "testing" and at "Test".
# Because the flag makes matching case-insensitive, 'te' matches twice:
# the 'te' in "testing" and the 'Te' in "Test" are both replaced by '~*'.
print(re.sub('te', '~*', 'Coldrain has focused on software testing Test', flags=re.IGNORECASE))

# Matching is case-sensitive here, so the 'Te' in "Test" is not replaced
print(re.sub('te', '~*', 'Coldrain has focused on software testing Test'))

# With count=1, at most one replacement is made
print(re.sub('te', '~*', 'Coldrain has focused on software testing Test', count=1, flags=re.IGNORECASE))

Output:

Coldrain has focused on software ~*sting ~*st
Coldrain has focused on software ~*sting Test
Coldrain has focused on software ~*sting Test
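repl does not have to be a fixed string. This sketch uses made-up inputs to show two alternatives. repl can refer back to captured groups (\1, \2, ...), or it can be a function that receives each Match object and returns the replacement text:

```python
import re

# repl can reference groups: rewrite "YYYY-MM-DD" as "DD/MM/YYYY"
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', 'Due 2020-07-09')

# repl can also be a function that receives each Match object
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c3')

print(swapped)  # Due 09/07/2020
print(doubled)  # a2 b44 c6
```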

Function subn()
Syntax:

re.subn(pattern, repl, string, count=0, flags=0)

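subn() behaves like sub() in every respect except the form of its result. Instead of returning just the new string, it returns a tuple containing the new string and the total number of replacements made.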

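import re

print(re.subn('te', '~*', 'Coldrain has focused on software testing Test'))

t = re.subn('te', '~*', 'Coldrain has focused on software testing Test', flags=re.IGNORECASE)
print(t)
print(len(t))

# t[0] is the same string that sub() would return
print(t[0])

Output:

('Coldrain has focused on software ~*sting Test', 1)
('Coldrain has focused on software ~*sting ~*st', 2)
2
Coldrain has focused on software ~*sting ~*st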
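subn() returns a tuple, so its result is easy to unpack when you need to know how many replacements happened. Here is a minimal sketch with illustrative inputs, including count=1:

```python
import re

# Collapse runs of whitespace and report how many runs were replaced
new_text, n = re.subn(r'\s+', ' ', 'too   many    spaces')
print(new_text, n)  # too many spaces 2

# count=1 limits subn() to a single replacement
first_only = re.subn('te', '~*', 'testing test', count=1)
print(first_only)   # ('~*sting test', 1)
```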

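Function escape()
Syntax:

re.escape(string)

escape() returns the string with a backslash added before every character that has a special meaning in a regular expression. Before Python 3.7, it escaped every non-alphanumeric character. This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters.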

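import re

# escape() returns the string with a backslash '\' before each special character.
# In the first case, the spaces ' ' and the '.' are escaped; the apostrophe has
# no special meaning, so it is left alone.
# In the second case, the spaces, '[', '-', ']', the tab '\t' and the caret '^' are escaped.
print(re.escape("I'm still writing at 1 a.m"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))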

Output:

I'm\ still\ writing\ at\ 1\ a\.m
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW
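A typical use is escaping text before using it as a pattern. This sketch uses made-up strings to show the effect: once escaped, '.' is matched literally instead of acting as a wildcard.

```python
import re

user_text = 'a.m'  # text to find literally; '.' would otherwise match any character
haystack = 'at 1 a.m and 1 axm'

unsafe = re.findall(user_text, haystack)           # '.' acts as a wildcard
safe = re.findall(re.escape(user_text), haystack)  # '.' is matched literally

print(unsafe)  # ['a.m', 'axm']
print(safe)    # ['a.m']
```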