Python Regular Expressions

Date: 2019-11-08

To use regular expressions, first import the re module.

import re

1. The match method

match(pattern, string, flags=0)  # signature of re.match

match only succeeds if the pattern matches at the very beginning of the string.

a = re.match('test',  'test123')
print(a)

b = re.match('test', 'ltest1234')
print(b)
# b is None, which means nothing matched
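
Because match returns either a match object or None, it is usually worth checking the result before using it. The following is a minimal sketch of that check; the variable name m is only illustrative.

```python
import re

m = re.match('test', 'test123')
if m is not None:
    print(m.group())  # the text that was matched
    print(m.span())   # (start, end) indices of the match

# A failed match returns None, so test for it before calling any method
if re.match('test', 'ltest1234') is None:
    print('no match at the start of the string')
```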

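2. The search method

search(pattern, string, flags=0)

search does not have to match from the first character; it scans the string and returns the first position where the pattern matches.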

a = re.search('test', 'test1234')
print(a)

b = re.search('test', 'lleatest4323')
print(b)

Both a and b match, because 'test' occurs somewhere in each string.

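3. Metacharacters

. matches any character except a newline

\w matches a letter, digit, or underscore; for str patterns in Python 3 this also covers Unicode word characters such as Chinese characters

\s matches any whitespace character, including space, tab, and form feed

\d matches a digit

\b matches a word boundary (the start or end of a word)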

# raw strings (r'...') pass backslashes such as \w through to the regex engine unchanged
print(re.search(r'.....', 'hello'))
print(re.search(r'\w\w\w\w', 'a1_啥'))
print(re.search(r'\s\s', '\t\r'))
print(re.search(r'\d\d', '12'))
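
The \b metacharacter has no example above, so here is a small sketch of how it restricts a match to a whole word. The sample sentence is made up for illustration.

```python
import re

sentence = 'concatenate a cat'

# \b only matches at a word boundary, so the 'cat' inside 'concatenate' is skipped
whole_word = re.search(r'\bcat\b', sentence)
print(whole_word.span())

# without \b, the first occurrence inside 'concatenate' is found instead
anywhere = re.search(r'cat', sentence)
print(anywhere.span())
```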

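^ matches the start of the string (and the start of each line when the MULTILINE flag is set)

$ matches the end of the string, or just before a newline at the end of the string (and the end of each line when the MULTILINE flag is set)

print(re.search(r'^h.*\w$', 'hello'))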

x|y matches x or y

[xyz] matches any one of the characters x, y, or z

[a-z] matches any one character in the given range

print(re.search('a|e', 'abble'))
print(re.search('[a12]','abcd'))
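
A character range can be illustrated the same way. The sketch below uses made-up input strings and shows that a class always consumes exactly one character, and that several ranges can be combined inside one pair of brackets.

```python
import re

# [a-z] matches one lowercase ASCII letter; the first one in 'ABcD' is 'c'
lower = re.search(r'[a-z]', 'ABcD')
print(lower.group())

# ranges and single characters can be mixed in one class
hex_digit = re.search(r'[0-9a-fA-F]', 'xyz7')
print(hex_digit.group())
```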

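4. Repetition

? matches the preceding subexpression zero or one time

+ matches the preceding subexpression one or more times

* matches the preceding subexpression zero or more times

{n} repeats exactly n times

{n,} repeats at least n times

{,m} repeats at most m times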

print(re.search(r'\d{5}', '12345'))
print(re.search('ca*t', 'cart'))  # None: the 'r' sits between 'a' and 't'
print(re.search('ca*t', 'cat'))
print(re.search('ca*t', 'caat'))
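
The examples above only exercise {n} and *. As a rough sketch, the remaining quantifiers can be tried the same way; the patterns and inputs here are illustrative choices, not taken from the original.

```python
import re

optional = re.search(r'colou?r', 'color')        # ? : the 'u' may be absent
one_or_more = re.search(r'ca+t', 'ct')           # + : needs at least one 'a'
at_least_two = re.search(r'\d{2,}', 'a1b2345')   # {n,} : two or more digits in a row
at_most_two = re.search(r'^\d{,2}$', '123')      # {,m} : no more than two digits

print(optional, one_or_more, at_least_two, at_most_two)
```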

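5. Negation

[^x] matches any character except x

[^abc] matches any character except a, b, or c

\W matches any character that is not a word character (letter, digit, or underscore); it is equivalent to [^A-Za-z0-9_] only when the ASCII flag is set

\S matches any non-whitespace character; with the ASCII flag it is equivalent to [^ \t\n\r\f\v]

\D matches any non-digit character; with the ASCII flag it is equivalent to [^0-9]

\B matches a position that is not a word boundary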
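
This section has no code of its own, so the following sketch runs each negated form against one short, made-up string. findall is used here only because it makes the full set of matching characters easy to see.

```python
import re

text = 'a1 b2_c3'

not_abc = re.findall(r'[^a-c]', text)   # everything except a, b, c
non_digits = re.findall(r'\D', text)    # everything that is not a digit
non_word = re.findall(r'\W', text)      # only the space is a non-word character
chunks = re.findall(r'\S+', text)       # runs of non-whitespace characters

# \B requires a position inside a word, so the standalone 'cat' is skipped
inside = re.search(r'\Bcat', 'cat concat')

print(not_abc, non_digits, non_word, chunks, inside.span())
```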

6. Greedy and lazy matching

By default, quantifiers are greedy: they match as much text as possible.

print(re.search('a.*b', 'aabab').group())
#aabab

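*? repeats any number of times, but as few as possible

+? repeats one or more times, but as few as possible

?? repeats zero or one time, but as few as possible

{n,m}? repeats n to m times, but as few as possible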

print(re.search('a.*?b', 'abadbadb').group())
#ab

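7. Compilation flags

DOTALL, S: makes . match every character, including a newline

IGNORECASE, I: makes matching case-insensitive

LOCALE, L: makes \w, \b, and case-insensitive matching depend on the current locale (bytes patterns only in Python 3)

MULTILINE, M: multi-line matching; changes the behavior of ^ and $

VERBOSE, X: verbose mode; most whitespace and any comments in the pattern are ignored

DEBUG: prints debug information about the compiled pattern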

print(re.search('.', '\n'))
#None

print(re.search('.', '\n', re.S))

print(re.search('a.', 'A\n', re.S|re.I))
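
MULTILINE and VERBOSE are not exercised above. The sketch below, using made-up sample text, shows one way to observe their effect; the name date is just an illustrative variable.

```python
import re

text = 'first\nsecond'

# without re.M, ^ only matches at the very start of the string
start_only = re.findall(r'^\w+', text)
# with re.M, ^ also matches right after each newline
each_line = re.findall(r'^\w+', text, re.M)
print(start_only, each_line)

# re.X allows whitespace and comments inside the pattern
date = re.compile(r'''
    \d{4}   # year
    -
    \d{2}   # month
''', re.X)
print(date.match('2019-11').group())
```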

8. Compiling regular expressions

regex = re.compile(r'^\d{1,3}$')

print(regex.match('12'))
print(regex.match('1234'))
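
A compiled pattern is mainly useful when the same expression is applied many times. As a small sketch under that assumption, the object can be reused across a list of inputs; the names digits and samples are hypothetical.

```python
import re

# compile once, then reuse the pattern object for every input
digits = re.compile(r'^\d{1,3}$')

samples = ['7', '42', '1234', 'x1']
results = [bool(digits.match(s)) for s in samples]
print(results)
```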

9. Search and replace

re.sub(pattern, repl, string, count=0, flags=0)

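print(re.sub(r'\d+', '', 'test123'))
print(re.sub(r'\d', '', 'test123test13451tesa'))
# Every digit is removed in both calls: by default (count=0) sub replaces all matches, not just the first

# count=2 limits the substitution to the first two matches
print(re.sub(r'\d', '', 'test123', count=2))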

10. findall / finditer

findall returns all matches at once as a list; finditer returns an iterator of match objects.

print(re.findall(r'\d', '1a2b3c4d5e6'))

for i in re.finditer(r'\d', '1a2d3e4r5ft6rq'):
    print(i)
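
Since finditer yields match objects rather than plain strings, each item also carries its position. A minimal sketch, with an illustrative input string:

```python
import re

# each item is a match object, so both the text and its position are available
for m in re.finditer(r'\d+', 'a12b345'):
    print(m.group(), m.span())

positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', 'a12b345')]
print(positions)
```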

11. Groups

m = re.compile(r'(a)b')
a = m.match('ab')
print(a.group(1))

m = re.compile(r'([a-c]+).*(\w)')
a = m.match('abcbde')
print(a.group(1), a.group(2), a.group(1, 2))

12. Named groups

(?P<name>pattern)  # syntax of a named group

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

m = re.match(pattern, '2018-01-02')
print(m.groupdict())
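
Besides groupdict(), a single named group can be read directly. The sketch below repeats the same date pattern so that it runs on its own; subscript access on a match object assumes Python 3.6 or later.

```python
import re

m = re.match(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', '2018-01-02')

year = m.group('year')   # look a group up by name
day = m['day']           # subscript form, Python 3.6+
first = m.group(1)       # named groups are still numbered as well
print(year, day, first)
```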
