Python Regular Expressions

Date: 2020-07-24

Tags: python, regular expressions. Category: Python

re

Python's re module provides the operations for working with regular expressions.

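Characters:

　　. matches any character except a newline
　　\w matches a letter, digit, underscore, or (for str patterns, which are Unicode-aware by default) a Chinese character
　　\s matches any whitespace character
　　\d matches a digit
　　\b matches the start or end of a word
　　^ matches the start of the string
　　$ matches the end of the string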
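
As a minimal sketch of how these characters behave, the snippet below runs several of them against a made-up sample string. The variable names and the contents of `text` are assumptions chosen for illustration, not part of the original post.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "admin_7 logged in\nat 09:15"

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)       # ['admin_7', 'logged', 'in', 'at', '09', '15']

# \d+ collects runs of digits.
digits = re.findall(r"\d+", text)      # ['7', '09', '15']

# \s matches each whitespace character, including the newline.
spaces = re.findall(r"\s", text)       # [' ', ' ', '\n', ' ']

# Without \b, "in" is also found inside "admin"; with \b only the whole word matches.
loose = re.findall(r"in", text)        # ['in', 'in']
strict = re.findall(r"\bin\b", text)   # ['in']

# "." stops at the newline, so each line comes back separately.
lines = re.findall(r".+", text)        # ['admin_7 logged in', 'at 09:15']

# ^ and $ anchor to the start and end of the whole string by default.
starts_with_admin = re.search(r"^admin", text) is not None   # True
ends_with_15 = re.search(r"15$", text) is not None           # True
```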

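Quantifiers:

　　* repeats zero or more times
　　+ repeats one or more times
　　? repeats zero or one time
　　{n} repeats exactly n times
　　{n,} repeats n or more times
　　{n,m} repeats between n and m times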
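
The quantifiers can be compared side by side with a small sketch. The `samples` list and the `matching` helper are hypothetical names introduced only for this illustration; `re.fullmatch` is used so that a sample counts only when the whole string fits the pattern.

```python
import re

# Hypothetical test strings: "a", then zero to three "b", then "c".
samples = ["ac", "abc", "abbc", "abbbc"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = matching(r"ab*c")      # ['ac', 'abc', 'abbc', 'abbbc']
one_or_more = matching(r"ab+c")       # ['abc', 'abbc', 'abbbc']
zero_or_one = matching(r"ab?c")       # ['ac', 'abc']
exactly_two = matching(r"ab{2}c")     # ['abbc']
two_or_more = matching(r"ab{2,}c")    # ['abbc', 'abbbc']
one_to_two = matching(r"ab{1,2}c")    # ['abc', 'abbc']
```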

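match

# match: tries to match from the start of the string; returns a match object on success, or None on failure

match(pattern, string, flags=0)

# pattern: the regular expression pattern
# string: the string to match against
# flags: matching mode flags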

    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    I  IGNORECASE  Perform case-insensitive matching.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                   match the corresponding ASCII character categories
                   (rather than the whole Unicode categories, which is the
                   default).
                   For bytes patterns, this flag is the only available
                   behaviour and needn't be specified.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    U  UNICODE     For compatibility only. Ignored for string patterns (it
                   is the default), and forbidden for bytes patterns.
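
The effect of the most common flags can be sketched as follows. The sample strings and variable names are assumptions made up for this example; the flags themselves (`re.I`, `re.M`, `re.S`, `re.X`) are the ones listed above, and they can be combined with `|`.

```python
import re

# Hypothetical two-line sample text.
text = "Hello\nhello world"

# IGNORECASE: "hello" also matches "Hello".
plain = re.findall(r"hello", text)                 # ['hello']
ignore_case = re.findall(r"hello", text, re.I)     # ['Hello', 'hello']

# MULTILINE: ^ also matches right after each newline.
single_line = re.findall(r"^hello", text)          # []
multi_line = re.findall(r"^hello", text, re.M)     # ['hello']

# DOTALL: "." is allowed to match the newline between the two lines.
without_dotall = re.search(r"Hello.hello", text)         # None
with_dotall = re.search(r"Hello.hello", text, re.S)      # a match object

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^hello", text, re.I | re.M)      # ['Hello', 'hello']

# VERBOSE: whitespace and comments inside the pattern are ignored.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.X)
date_parts = date_pattern.match("2020-07-24").groups()   # ('2020', '07', '24')
```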

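        import re

        # Without groups
        origin = "hello alex bcd abcd lge acd 19"
        r = re.match(r"h\w+", origin)
        print(r.group())     # the full matched text: hello
        print(r.groups())    # all captured groups as a tuple: ()
        print(r.groupdict()) # all named groups as a dict: {}

        # With groups

        # Why use groups? To extract specific parts of a successful match (the whole pattern must match first; the grouped parts are then pulled out of that match)
        r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
        print(r.group())     # the full matched text: hello alex bcd abcd lge acd 19
        print(r.groups())    # all captured groups: ('ello', '9')
        print(r.groupdict()) # only the groups that were given a name (key): {'name': '9'}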
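
Because match returns None when the pattern does not fit at the start of the string, calling `.group()` on the result directly can raise AttributeError. The sketch below shows one way to guard against that and to read individual groups by index or by name; the variable names are illustrative assumptions.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

m = re.match(r"h(\w+).*(?P<name>\d)$", origin)
if m is not None:
    first = m.group(1)        # first group by index: 'ello'
    named = m.group("name")   # named group by its key: '9'
    where = m.span(1)         # (start, end) of the first group: (1, 5)

# "alex" is in the string, but not at the start, so match returns None.
no_match = re.match(r"alex", origin)
fallback = no_match.group() if no_match else "no match"
```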

search

# search: scans through the whole string and returns the first match; returns None if nothing matches
# search(pattern, string, flags=0)

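        # Without groups
        origin = "hello alex bcd abcd lge acd 19"
        r = re.search(r"a\w+", origin)
        print(r.group())     # the full matched text: alex
        print(r.groups())    # all captured groups as a tuple: ()
        print(r.groupdict()) # all named groups as a dict: {}

        # With groups
        r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
        print(r.group())     # the full matched text: alex bcd abcd lge acd 19
        print(r.groups())    # all captured groups: ('lex', '9')
        print(r.groupdict()) # only the groups that were given a name (key): {'name': '9'}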
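
The difference between match and search is easiest to see by applying the same pattern with both. The following sketch also includes `re.fullmatch`, which the post does not cover, purely for contrast; the variable names are assumptions.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

# match only looks at the start of the string.
at_start = re.match(r"alex", origin)       # None

# search scans forward and stops at the first place the pattern fits.
anywhere = re.search(r"alex", origin)
position = anywhere.span()                 # (6, 10)

# fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r"alex", origin)      # None

# Anchoring a search with ^ makes it behave like match here.
anchored = re.search(r"^alex", origin)     # None
```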

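findall

# findall: returns a list of all non-overlapping matches; if the pattern has one group, each item is the string captured by that group; if the pattern has several groups, each item is a tuple of the captured strings

# Empty matches are included in the result

# findall(pattern, string, flags=0)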

        origin = "hello alex bcd abcd lge acd 19"

        # Without groups
        r = re.findall(r"a\w+", origin)
        print(r)    # ['alex', 'abcd', 'acd']

        # With groups
        r = re.findall(r"a((\w*)c)(d)", origin)
        print(r)    # [('bc', 'b', 'd'), ('c', '', 'd')]
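
How the shape of the findall result depends on the number of groups can be sketched like this. The extra patterns and variable names are assumptions added for illustration; `re.finditer` is shown as an alternative when match objects (for example, positions) are needed instead of strings.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

# No group: each item is the whole match.
no_group = re.findall(r"a\w+", origin)          # ['alex', 'abcd', 'acd']

# One group: each item is just that group's text.
one_group = re.findall(r"a(\w+)", origin)       # ['lex', 'bcd', 'cd']

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r"(a)(\w+)", origin)    # [('a', 'lex'), ('a', 'bcd'), ('a', 'cd')]

# A non-capturing group (?:...) groups without changing the result shape.
non_capturing = re.findall(r"a(?:\w+)", origin) # ['alex', 'abcd', 'acd']

# Matches never overlap: "aaaa" contains two "aa", not three.
non_overlapping = re.findall(r"aa", "aaaa")     # ['aa', 'aa']

# A pattern that can match nothing also yields empty strings.
with_empty = re.findall(r"\d*", "a1")

# finditer yields match objects, which carry positions as well as text.
spans = [m.span() for m in re.finditer(r"a\w+", origin)]   # [(6, 10), (15, 19), (24, 27)]
```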

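sub

# sub: replaces the parts of the string that match the pattern

sub(pattern, repl, string, count=0, flags=0)

# pattern: the regular expression pattern
# repl: the replacement string, or a callable that returns it
# string: the string to match against
# count: the maximum number of replacements (0 means replace all)
# flags: matching mode flags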

        # Groups in the pattern do not change which text is replaced
        origin = "hello alex bcd alex lge alex acd 19"
        r = re.sub(r"a\w+", "999", origin, count=2)
        print(r)    # hello 999 bcd 999 lge alex acd 19
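
Since repl may be either a string or a callable, a short sketch of both forms may help. The sample inputs and variable names are assumptions; `re.subn`, which is not covered in the post, is included because it also reports how many replacements were made.

```python
import re

origin = "hello alex bcd alex lge alex acd 19"

# A replacement string can refer back to groups with \1, \2, ...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello alex", count=1)   # 'alex hello'

# A callable receives each match object and returns the replacement text.
shouted = re.sub(r"a\w+", lambda m: m.group().upper(), origin, count=2)
# 'hello ALEX bcd ALEX lge alex acd 19'

# subn returns the new string together with the number of replacements.
replaced, n = re.subn(r"alex", "999", origin)
# ('hello 999 bcd 999 lge 999 acd 19', 3)
```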

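split

# split: splits the string wherever the pattern matches

split(pattern, string, maxsplit=0, flags=0)

# pattern: the regular expression pattern
# string: the string to split
# maxsplit: the maximum number of splits (0 means no limit)
# flags: matching mode flags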

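        # Without groups: the separator is dropped
        origin = "hello alex bcd alex lge alex acd 19"
        r = re.split(r"alex", origin, maxsplit=1)
        print(r)     # ['hello ', ' bcd alex lge alex acd 19']

        # With groups: the text captured by each group is kept in the result
        r1 = re.split(r"(alex)", origin, maxsplit=1)
        print(r1)    # ['hello ', 'alex', ' bcd alex lge alex acd 19']
        r2 = re.split(r"(al(ex))", origin, maxsplit=1)
        print(r2)    # ['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']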
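
Where re.split is more useful than str.split is when the separator varies. The sketch below splits on a character class; the sample strings and variable names are made up for this illustration.

```python
import re

# Hypothetical input with mixed separators: comma, semicolon, and spaces.
line = "a, b;c   d"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", line)                  # ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits; the rest is returned untouched.
limited = re.split(r"[,;\s]+", line, maxsplit=2)    # ['a', 'b', 'c   d']

# A capturing group keeps the separators in the result.
with_separators = re.split(r"([,;])", "a,b;c")      # ['a', ',', 'b', ';', 'c']

# A separator at the very start produces a leading empty string.
leading = re.split(r",", ",a")                      # ['', 'a']
```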

IP: ^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile number: ^1[3458][0-9]\d{8}$
Email: [a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
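
One possible way to use these three patterns as validators is sketched below. The function names, constant names, and sample inputs are hypothetical. Two choices here are assumptions rather than part of the post: the mobile pattern writes its character class as `[3458]`, because a `|` inside square brackets is a literal character and not an alternation, and `fullmatch` is used so that the unanchored email pattern and a trailing newline cannot let bad input through.

```python
import re

# Patterns compiled once and reused; the names are illustrative.
IP_RE = re.compile(
    r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$"
)
PHONE_RE = re.compile(r"^1[3458][0-9]\d{8}$")
EMAIL_RE = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")

def is_ip(value):
    """True if the whole string looks like a dotted IPv4 address."""
    return IP_RE.fullmatch(value) is not None

def is_phone(value):
    """True if the whole string is an 11-digit number starting 13, 14, 15, or 18."""
    return PHONE_RE.fullmatch(value) is not None

def is_email(value):
    """True if the whole string has the shape name@host.tld."""
    return EMAIL_RE.fullmatch(value) is not None

# Made-up sample inputs.
checks = {
    "ip_ok": is_ip("192.168.1.1"),
    "ip_out_of_range": is_ip("256.1.1.1"),
    "ip_too_short": is_ip("1.2.3"),
    "phone_ok": is_phone("13812345678"),
    "phone_bad_prefix": is_phone("12812345678"),
    "phone_trailing_newline": is_phone("13812345678\n"),
    "email_ok": is_email("alex@example.com"),
    "email_no_domain_dot": is_email("alex@example"),
}
```

These patterns only check the shape of the input; they do not confirm that an address, number, or mailbox actually exists.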