淺析python正則表達式一：基本概念

時間 2019-11-10

標籤淺析 python 正則表達式基本概念欄目 Python 简体版

原文原文鏈接

該文來自個人我的博客, http://zpeng.gitcafe.io/2016/01/19/python/re-python/, 轉載請註明，謝謝。python

前言

斷斷續續花了一個多月把，hackerrank上關於python正則部分的題刷完了，對python正則表達式的認識達到了新的高度。寫下博文記錄一些python正則的技能點。git

python正則

概述

在ipython中輸入help(re)，查看正則模塊re的幫組信息。正則表達式

特殊字符

python正則中的特殊字符以下：express

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

總結以下：ide

符號	含義
.	匹配除了換行符的任意字符
^	字符串開頭或用在[]中表示不匹配某些字符
$	匹配模式的末尾
*	匹配任意多的字符
+	至少匹配一個字符
?	匹配０或一個字符
{m,n}	表示重複前面模式ｍ到ｎ次，可表達爲{m},{m,},{m,n}
[]	字符集合
()	子模式
｜	模式或

？字符在正則表達式中功能特別豐富，看淺析python正則表達式二：用好問號。函數

集合簡寫

查看re幫助文檔，以下：學習

\number  Matches the contents of the group of the same number.
\A       Matches only at the start of the string.
\Z       Matches only at the end of the string.
\b       Matches the empty string, but only at the start or end of a word.
\B       Matches the empty string, but not at the start or end of a word.
\d       Matches any decimal digit; equivalent to the set [0-9].
\D       Matches any non-digit character; equivalent to the set [^0-9].
\s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
\w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
         With LOCALE, it will match the set [0-9_] plus characters defined
         as letters for the current locale.
\W       Matches the complement of \w.
\\       Matches a literal backslash.

總結以下：ui

簡寫	含義
\number	匹配模式組number
\A	等價於^
\Z	等價於$
\b	匹配單詞開頭結尾的空格
\B	匹配的空格不在單詞的開頭結尾
\d	等價於[0-9]
\D	等價於[^0-9]
\s	等價於[ \t\n\r\f\v]
\S	等價於[^ \t\n\r\f\v]
\w	等價於[a-zA-Z0-9_]
\W	等價於[^a-zA-Z0-9_]
\	轉義匹配\

有關\number的用法

在集合簡寫這一塊中，比較難的的是\number的正確使用spa

例1：查找字符串中是否有字符重複出現

In [39]: bool(re.search(r'(.).*\1','asdfasjkdlk'))
Out[39]: True

例2: 查找字符串中是否有字符連續出現四次

In [40]: bool(re.search(r'(.)\1{3}','asdfaaaasjkdlk'))
Out[40]: True

上面兩個例子中\1表示該處出現的字符或字符串與匹配組1的字符或字符串相同，這裏是(.)code

正則表達式函數

幫助文檔以下：

match    Match a regular expression pattern to the beginning of a string.
search   Search a string for the presence of a pattern.
sub      Substitute occurrences of a pattern found in a string.
subn     Same as sub, but also return the number of substitutions made.
split    Split a string by the occurrences of a pattern.
findall  Find all occurrences of a pattern in a string.
finditer Return an iterator yielding a match object for each match.
compile  Compile a pattern into a RegexObject.
purge    Clear the regular expression cache.
escape   Backslash all non-alphanumerics in a string.

總結經常使用函數以下：

函數名	用法
match	從字符串開頭開始匹配
search	搜索與模式匹配的字符串
sub	替代匹配的字符串
split	更具模式分割字符串
findall	返回全部的模式匹配字符串
finditer	迭代返回match對象
complie	編譯模式
escape	轉義字符串中全部特殊字符，這樣不用在字符串中寫入討厭的\符號