19 Python regular expressions and related functions

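1. Regular expressions
########################################
1. Metacharacters
   . ^ $ * + ? {} [] \ | ()

"."  any single character (except a newline, unless re.S is set)
"^"  start of the string; '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$"  end of the string; works the same way as "^"
"\"  escapes a special character or introduces a special sequence
[]   a character set; matches exactly one character from the set
        [0-9], [a-z], [A-Z], [^0]
     Most metacharacters lose their special meaning inside [] and are treated as ordinary characters
     (the exceptions are "^" in first position, "-" between two characters, "]" and "\").

"*"  0 or more repetitions of the preceding item (greedy); <.*> matches all of <title>chinaunix</title>
"+"  1 or more repetitions of the preceding item (greedy); same idea as above
"?"  0 or 1 repetition of the preceding item (greedy), i.e. the item is optional
{m}     repeats the preceding item exactly m times; a{6} matches 6 a's
{m,n}   repeats the preceding item m to n times; a{2,4} matches 2 to 4 a's
{m,n}?  repeats the preceding item m to n times, taking as few as possible (non-greedy)
        re.findall("a{2,4}?","aaaaaa") matches only 2 a's at a time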
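The difference between greedy and non-greedy quantifiers is easier to see when both are run on the same input. The following is a small sketch; the variable names are made up for illustration, and the sample strings are the ones used in the notes above.

```python
import re

text = "aaaaaa"

# Greedy: each match takes as many a's as allowed (up to 4).
greedy = re.findall(r"a{2,4}", text)
# Non-greedy: each match takes as few a's as allowed (2).
lazy = re.findall(r"a{2,4}?", text)

print(greedy)  # ['aaaa', 'aa']
print(lazy)    # ['aa', 'aa', 'aa']

html = "<title>chinaunix</title>"
# Greedy .* runs from the first '<' to the last '>'.
tag_greedy = re.findall(r"<.*>", html)
# Non-greedy .*? stops at the first '>' it can.
tag_lazy = re.findall(r"<.*?>", html)

print(tag_greedy)  # ['<title>chinaunix</title>']
print(tag_lazy)    # ['<title>', '</title>']
```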
"|"     alternation; A|B matches either A or B
(...)   a group; matches the expression inside the parentheses
(?#...) a comment; it is ignored
(?=...)    Matches if ... matches next, but doesn't consume the string. 'hello(?=test)' matches 'hello' in 'hellotest'
(?!...)    Matches if ... doesn't match next. 'hello(?!test)' matches 'hello' only when it is not followed by 'test'
(?<=...)   Matches if preceded by ... (must be fixed length). '(?<=hello)test' matches 'test' in 'hellotest'
(?<!...)   Matches if not preceded by ... (must be fixed length). '(?<!hello)test' does not match 'test' in 'hellotest'
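The four lookaround assertions can be checked side by side. This is only a sketch; the inputs 'helloworld' and 'mytest' are extra strings invented here to show the opposite case.

```python
import re

# Positive lookahead: match 'hello' only if 'test' follows; 'test' is not consumed.
ahead = re.search(r"hello(?=test)", "hellotest")
print(ahead.group())  # hello

# Negative lookahead: match 'hello' only if 'test' does NOT follow.
not_ahead_1 = re.search(r"hello(?!test)", "hellotest")
not_ahead_2 = re.search(r"hello(?!test)", "helloworld")
print(not_ahead_1)          # None
print(not_ahead_2.group())  # hello

# Positive lookbehind: match 'test' only if 'hello' comes right before it.
behind = re.search(r"(?<=hello)test", "hellotest")
print(behind.group())  # test

# Negative lookbehind: match 'test' only if 'hello' does NOT come right before it.
not_behind_1 = re.search(r"(?<!hello)test", "hellotest")
not_behind_2 = re.search(r"(?<!hello)test", "mytest")
print(not_behind_1)          # None
print(not_behind_2.group())  # test
```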
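#############################################
Special sequences in regular expressions:
\A  matches only at the start of the string
\Z  matches only at the end of the string
\b  matches the empty string at the start or end of a word
\B  matches the empty string when it is not at the start or end of a word
\d  equivalent to [0-9]
\D  equivalent to [^0-9]
\s  matches any whitespace character: [ \t\n\r\f\v]
\S  matches any non-whitespace character: [^ \t\n\r\f\v]
\w  matches any letter, digit or underscore: [a-zA-Z0-9_]
\W  matches any character that is not a letter, digit or underscore: [^a-zA-Z0-9_]
The bracketed equivalents describe ASCII matching; for str patterns, Python 3 also matches the corresponding Unicode characters unless re.ASCII is set.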
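A quick way to get a feel for the special sequences is to run them against one short string. The sample text and variable names below are invented for illustration.

```python
import re

text = "order 66 shipped to bay_7 at 10:45"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits and underscores
print(digits)  # ['66', '7', '10', '45']
print(words)   # ['order', '66', 'shipped', 'to', 'bay_7', 'at', '10', '45']

# \b only matches at a word boundary, \B only inside a word.
whole = re.findall(r"\bcat\b", "cat concat cats cat")
inside = re.findall(r"\Bcat", "cat concat")
print(whole)   # ['cat', 'cat']
print(inside)  # ['cat']  (the one inside 'concat')

# \A anchors at the start of the string, \Z at the end.
starts = re.search(r"\Aorder", text) is not None
tail = re.search(r"\d+\Z", text).group()
print(starts, tail)  # True 45
```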
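########################################################
########## Some regular expression functions ###########
########################################################
Using the re module in Python

Python's re module (Regular Expression)

Below I summarize the most commonly used parts of re.
1. Introduction to re
 With the re module, Python compiles a regular expression into bytecode and runs it on a matching engine written in C, which performs a depth-first (backtracking) match.

help('modules')    ---lists the modules installed for Python
import re
print(re.__doc__)   ##prints the description of the re module, i.e. the docstring at the top of the module
 A few examples follow.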
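############################################
        import re
        patt = re.compile(r"hello")    ##hypothetical pattern; the original snippet does not show how patt was created
        line = "hello world"           ##hypothetical input line
        i = patt.search(line)
        if i is not None:
            print(line)
        else:
            print("xxxx")
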
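2. Main functions of re
 The commonly used functions are: compile, search, match, split, findall (finditer), sub (subn)
    ##Note: square brackets in the signatures below mark optional parameters
compile
 re.compile(pattern[, flags])
 Purpose:
 Compiles a regular expression pattern into a regular expression object.
 The flags include:
 re.I: ignore case
 re.L: makes \w, \W, \b, \B, \s, \S depend on the current locale (this list follows the Python 2 documentation; in Python 3 the flag is only allowed with bytes patterns)
 re.M: multi-line mode; "^" and "$" also match at the start and end of each line, not only at the start and end of the whole string
 re.S: makes '.' match any character including a newline (by default '.' does not match a newline),
            i.e. it widens what '.' can match
 re.U: makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (already the default for str patterns in Python 3)
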
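The effect of the most common flags can be seen on a two-line string. This is a minimal sketch; the sample text and variable names are assumptions made for the demonstration.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching
ci = re.findall(r"hello", text, re.I)
print(ci)  # ['Hello']

# re.M: '^' also matches at the start of every line
no_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)
print(no_m)    # ['Hello']
print(with_m)  # ['Hello', 'world']

# re.S: '.' also matches the newline character
no_s = re.search(r"Hello.world", text)
with_s = re.search(r"Hello.world", text, re.S)
print(no_s)                 # None
print(with_s is not None)   # True

# Flags are combined with the | operator
pattern = re.compile(r"^WORLD$", re.I | re.M)
combined = pattern.search(text).group()
print(combined)  # world
```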
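search  ##matches anywhere in the string
    ##returns a match object, or None
 re.search(pattern, string[, flags])
 search(string[, pos[, endpos]])
 Purpose: scans the string for the first location where the pattern matches and returns a MatchObject instance; returns None if no position in the string matches.

match    ##matches only at the beginning
    ##returns a match object, or None
 re.match(pattern, string[, flags])
 match(string[, pos[, endpos]])
 Purpose:
 match() tries the pattern only at the beginning of the string, i.e. it only reports a match that starts at position 0,
 whereas search() scans the whole string for a match. To look for a match anywhere in the string, use search().
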
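Both functions return a match object, so it helps to see what can be read from it and how the optional pos argument of the compiled-pattern methods shifts the starting point. A small sketch with an invented sample string:

```python
import re

pattern = re.compile(r"\d+")
text = "abc123def456"

# match() only tries position 0, and the string starts with a letter.
at_start = pattern.match(text)
print(at_start)  # None

# search() scans forward and reports the first match.
first = pattern.search(text)
print(first.group(), first.span())  # 123 (3, 6)

# pos (only available on compiled patterns) moves the starting position.
from_3 = pattern.match(text, 3)
print(from_3.group())  # 123

later = pattern.search(text, 6)
print(later.group(), later.start(), later.end())  # 456 9 12
```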
split
      ##returns a list of strings
     re.split(pattern, string[, maxsplit=0, flags=0])
     split(string[, maxsplit=0])
 Purpose: splits the string at every part that matches the pattern and returns the pieces as a list

 Example:
        list1=re.split(r"ni","sssniaaanidddniiii")
        print (list1)
 Result:
        ["sss","aaa","ddd","iii"]      ##the last "ni" uses up one of the four trailing i's
 Example:
        list1=re.split(r"(ni)","sssniaaanidddniiii")    ###note the difference between using and not using parentheses in the pattern
        print (list1)
 Result:
         ["sss","ni","aaa","ni","ddd","ni","iii"]

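Two things that plain str.split cannot do are splitting on several delimiters at once and limiting the number of splits by pattern. A short sketch, with a made-up input line:

```python
import re

line = "a, b;c   d"

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r"[,;\s]+", line)
print(parts)  # ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first split; the rest is returned untouched.
head_tail = re.split(r"[,;\s]+", line, maxsplit=1)
print(head_tail)  # ['a', 'b;c   d']
```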
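findall
      ##returns a list of the matched strings, or an empty list
 re.findall(pattern, string[, flags])
 findall(string[, pos[, endpos]])
 Purpose: finds all substrings matched by the pattern and returns them as a list
 Example: finding the content enclosed in [] (greedy and non-greedy search)
 Example:
     a=re.findall(r"ni","woainidddni")
     print (a)
     Result: ["ni","ni"]     ##each matched substring is extracted and the results are returned as a list
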
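For the "content enclosed in []" case mentioned above, one possible version of the greedy and non-greedy search looks like this. The sample string is an assumption, since the notes do not give one.

```python
import re

text = "[a][bb][ccc]"

# Greedy: .* runs to the last ']' in the string.
greedy = re.findall(r"\[.*\]", text)
# Non-greedy: .*? stops at the nearest ']'.
lazy = re.findall(r"\[.*?\]", text)
# With a group, findall returns only the text inside the brackets.
inner = re.findall(r"\[(.*?)\]", text)

print(greedy)  # ['[a][bb][ccc]']
print(lazy)    # ['[a]', '[bb]', '[ccc]']
print(inner)   # ['a', 'bb', 'ccc']
```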
finditer()
     ##returns an iterator that yields one match object per match (it yields nothing if there is no match)

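Unlike findall, finditer gives access to the position of every match. A minimal sketch using the same string as the findall example:

```python
import re

text = "woainidddni"

# Each item produced by the iterator is a match object.
for m in re.finditer(r"ni", text):
    print(m.group(), m.start(), m.end())

spans = [m.span() for m in re.finditer(r"ni", text)]
print(spans)  # [(4, 6), (9, 11)]

# No match: the iterator is simply empty.
none_found = list(re.finditer(r"xyz", text))
print(none_found)  # []
```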
sub()/subn()
 sub(pattern, replacement, original string)
 subn() works like sub() but returns a tuple (new string, number of replacements)
 Example:
     r=re.compile (r"a..b" )
     a=r.sub("eee","ddaxybccc")
     print (a)
     >>ddeeeccc
      --------
     print ("#"*20)
     rr=r"a..b"
     a=re.sub(rr,"eeeeee","dddaxxbccc")
     print (a)
     >>dddeeeeeeccc
     -----------
     rr=r"a..b"
     a=re.sub(rr,"","dddaxxbccc")
     print (a)
     >>dddccc

 Note: str.replace can also do replacements, but its argument is a plain string, not a regular expression (replace does not recognize metacharacters)

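A few variations on sub are worth trying: subn, the count argument, a backreference in the replacement, and the contrast with str.replace. The input string below is invented for the demonstration.

```python
import re

text = "dddaxxbcccayybeee"

replaced = re.sub(r"a..b", "-", text)
print(replaced)  # ddd-ccc-eee

# subn also reports how many replacements were made.
replaced_n = re.subn(r"a..b", "-", text)
print(replaced_n)  # ('ddd-ccc-eee', 2)

# count limits the number of replacements.
only_first = re.sub(r"a..b", "-", text, count=1)
print(only_first)  # ddd-cccayybeee

# \1 in the replacement refers to the first group of the pattern.
bracketed = re.sub(r"a(..)b", r"[\1]", text)
print(bracketed)  # ddd[xx]ccc[yy]eee

# str.replace treats "a..b" literally, so nothing changes here.
literal = text.replace("a..b", "-")
print(literal == text)  # True
```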
####### Examples: ########################
Example: the most basic usage, calling methods on a compiled re.RegexObject
       #!/usr/bin/env python
       import re
       r1 = re.compile(r'world')       ##compile the string 'world' into a regular expression object, so that its methods can be called
       if r1.match('helloworld'):      ##call the match method
           print('match succeeds')
       else:
           print('match fails')         ##match starts at position 0, so it does not succeed

       if r1.search('helloworld'):
           print('search succeeds')      ##search scans the whole string, so it succeeds
       else:
           print('search fails')
 Notes:
 Running the code above prints:
 match fails
 search succeeds

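A note on the r prefix:
     r stands for raw. Ordinary string literals use escape sequences, e.g. '\n' for a newline, so a literal backslash has to be written as '\\'. To write a backslash followed by the letter n without the r prefix you need '\\n'; with the r prefix it is simply r'\n', which is much clearer.
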
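The raw-string rule is easy to verify by comparing string lengths. The following sketch assumes nothing beyond the standard library; the Windows-style path is just a convenient example of text that contains a backslash.

```python
import re

# '\n' is one character (newline); '\\n' and r'\n' are two (backslash + n).
lengths = (len("\n"), len("\\n"), len(r"\n"))
print(lengths)  # (1, 2, 2)

same = "\\n" == r"\n"
print(same)  # True

# To match one literal backslash, the regex itself needs '\\'.
path = r"C:\temp"
raw_hits = re.findall(r"\\", path)      # raw string: two characters in the source
plain_hits = re.findall("\\\\", path)   # ordinary string: four characters in the source
print(len(raw_hits), raw_hits == plain_hits)  # 1 True
```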
Example: setting flags

     #r2 = re.compile(r'n$', re.S)
     #r2 = re.compile('\n$', re.S)
     r2 = re.compile('World$', re.I)    ##ignore case
     if r2.search('helloworld\n'):      ##so the match succeeds ("$" also matches just before a trailing newline)
         print('search succeeds')
     else:
         print('search fails')

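Example: calling the module-level function directly

       if re.search(r'abc','helloaaabcdworldn'):
           print('search succeeds')
       else:
           print('search fails')

 Notes:
 There are two ways to use regular expressions in Python:
      1. create a regular expression object first,
      1.1. then call the matching methods on that object;

      2. call the functions of the re module directly, passing the pattern as the first argument and the string to match as the second.
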
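The two calling styles give the same result; a compiled object is mainly convenient when the same pattern is applied many times. A small sketch, where the list of lines is made up:

```python
import re

text = "helloaaabcdworld"

# Style 1: compile first, then call methods on the pattern object.
compiled = re.compile(r"abc")
m1 = compiled.search(text)

# Style 2: call the module-level function with the pattern as first argument.
m2 = re.search(r"abc", text)

same_span = m1.span() == m2.span()
print(m1.span(), same_span)  # (7, 10) True

# Reusing the compiled pattern over several strings.
lines = ["abc", "xyz", "zabcz"]
hits = [ln for ln in lines if compiled.search(ln)]
print(hits)  # ['abc', 'zabcz']
```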
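#################################################################
4. Groups
 Use ()
 Example:
 matching an email address
      email=r"\w{3}@\w+(\.com|\.cn)"
      re.findall(email,"hyy@sina.com")    ##returns ['.com']
      re.findall(email,"hyy@yahoo.cn")    ##returns ['.cn']

 Caution:
 If the pattern contains a group, findall returns only the text matched by the group (even though the whole pattern matched more),
 i.e. the group's text takes priority in the result.

 Notes:
 Sometimes the pattern matches a lot of text but we only want one part of the matched string;
       in that case, wrap that part in () to make it a group.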
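When the whole address is wanted rather than only the group, there are a couple of options: a non-capturing group (?:...), or finditer with group(0). The sketch below reuses the email pattern from the notes; the second address and the three-group pattern are assumptions added for illustration.

```python
import re

email = r"\w{3}@\w+(\.com|\.cn)"
text = "hyy@sina.com, abc@yahoo.cn"

# With a capturing group, findall returns only the group.
only_group = re.findall(email, text)
print(only_group)  # ['.com', '.cn']

# (?:...) groups without capturing, so findall returns the whole match.
email_nc = r"\w{3}@\w+(?:\.com|\.cn)"
whole = re.findall(email_nc, text)
print(whole)  # ['hyy@sina.com', 'abc@yahoo.cn']

# Match objects expose both: group(0) is the whole match, group(1) the first group.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(email, text)]
print(pairs)  # [('hyy@sina.com', '.com'), ('abc@yahoo.cn', '.cn')]

# With several groups, findall returns one tuple per match.
parts = re.findall(r"(\w{3})@(\w+)\.(com|cn)", text)
print(parts)  # [('hyy', 'sina', 'com'), ('abc', 'yahoo', 'cn')]
```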