該文來自個人我的博客, http://zpeng.gitcafe.io/2016/01/19/python/re-python/, 轉載請註明,謝謝。python
斷斷續續花了一個多月把,hackerrank上關於python正則部分的題刷完了,對python正則表達式的認識達到了新的高度。寫下博文記錄一些python正則的技能點。git
在ipython中輸入help(re),查看正則模塊re的幫組信息。正則表達式
python正則中的特殊字符以下:express
The special characters are: "." Matches any character except a newline. "^" Matches the start of the string. "$" Matches the end of the string or just before the newline at the end of the string. "*" Matches 0 or more (greedy) repetitions of the preceding RE. Greedy means that it will match as many repetitions as possible. "+" Matches 1 or more (greedy) repetitions of the preceding RE. "?" Matches 0 or 1 (greedy) of the preceding RE. *?,+?,?? Non-greedy versions of the previous three special characters. {m,n} Matches from m to n repetitions of the preceding RE. {m,n}? Non-greedy version of the above. "\\" Either escapes special characters or signals a special sequence. [] Indicates a set of characters. A "^" as the first character indicates a complementing set. "|" A|B, creates an RE that will match either A or B. (...) Matches the RE inside the parentheses. The contents can be retrieved or matched later in the string. (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below). (?:...) Non-grouping version of regular parentheses. (?P<name>...) The substring matched by the group is accessible by name. (?P=name) Matches the text matched earlier by the group named name. (?#...) A comment; ignored. (?=...) Matches if ... matches next, but doesn't consume the string. (?!...) Matches if ... doesn't match next. (?<=...) Matches if preceded by ... (must be fixed length). (?<!...) Matches if not preceded by ... (must be fixed length). (?(id/name)yes|no) Matches yes pattern if the group with id/name matched, the (optional) no pattern otherwise.
總結以下:ide
符號 | 含義 |
---|---|
. | 匹配除了換行符的任意字符 |
^ | 字符串開頭或 用在[]中表示不匹配某些字符 |
$ | 匹配模式的末尾 |
* | 匹配任意多的字符 |
+ | 至少匹配一個字符 |
? | 匹配0或一個字符 |
{m,n} | 表示重複前面模式m到n次,可表達爲{m},{m,},{m,n} |
[] | 字符集合 |
() | 子模式 |
| | 模式或 |
?字符在正則表達式中功能特別豐富,看淺析python正則表達式二:用好問號。函數
查看re幫助文檔,以下:學習
\number Matches the contents of the group of the same number. \A Matches only at the start of the string. \Z Matches only at the end of the string. \b Matches the empty string, but only at the start or end of a word. \B Matches the empty string, but not at the start or end of a word. \d Matches any decimal digit; equivalent to the set [0-9]. \D Matches any non-digit character; equivalent to the set [^0-9]. \s Matches any whitespace character; equivalent to [ \t\n\r\f\v]. \S Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v]. \w Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale. \W Matches the complement of \w. \\ Matches a literal backslash.
總結以下:ui
簡寫 | 含義 |
---|---|
\number | 匹配模式組number |
\A | 等價於^ |
\Z | 等價於$ |
\b | 匹配單詞開頭結尾的空格 |
\B | 匹配的空格不在單詞的開頭結尾 |
\d | 等價於[0-9] |
\D | 等價於[^0-9] |
\s | 等價於[ \t\n\r\f\v] |
\S | 等價於[^ \t\n\r\f\v] |
\w | 等價於[a-zA-Z0-9_] |
\W | 等價於[^a-zA-Z0-9_] |
\ | 轉義匹配\ |
在集合簡寫這一塊中,比較難的的是\number的正確使用spa
In [39]: bool(re.search(r'(.).*\1','asdfasjkdlk')) Out[39]: True
In [40]: bool(re.search(r'(.)\1{3}','asdfaaaasjkdlk')) Out[40]: True
上面兩個例子中\1表示該處出現的字符或字符串與匹配組1的字符或字符串相同,這裏是(.)code
幫助文檔以下:
match Match a regular expression pattern to the beginning of a string. search Search a string for the presence of a pattern. sub Substitute occurrences of a pattern found in a string. subn Same as sub, but also return the number of substitutions made. split Split a string by the occurrences of a pattern. findall Find all occurrences of a pattern in a string. finditer Return an iterator yielding a match object for each match. compile Compile a pattern into a RegexObject. purge Clear the regular expression cache. escape Backslash all non-alphanumerics in a string.
總結經常使用函數以下:
函數名 | 用法 |
---|---|
match | 從字符串開頭開始匹配 |
search | 搜索與模式匹配的字符串 |
sub | 替代匹配的字符串 |
split | 更具模式分割字符串 |
findall | 返回全部的模式匹配字符串 |
finditer | 迭代返回match對象 |
complie | 編譯模式 |
escape | 轉義字符串中全部特殊字符,這樣不用在字符串中寫入討厭的\符號 |
正則表達式函數都有個參數flag,其中令flage=re.I,表示對大小寫不敏感。
python正則表達式的基本概念也就這些,學習這些的最好資料是自帶的幫助文檔。
該文來自個人我的博客, http://zpeng.gitcafe.io/2016/01/19/python/re-python/, 轉載請註明,謝謝。