1. Splitting strings
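Use re.split() with a regular expression to split on several delimiters at once:

>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

If the pattern contains a capture group (), the matched delimiters are also included in the result:

>>> re.split(r'(;|,|\s)\s*', line)
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']

If you do not want the delimiters in the result but still need parentheses to group parts of the pattern, use a non-capturing group, (?:...):

>>> re.split(r'(?:,|;|\s)\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']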
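Capturing the delimiters is useful when the string has to be put back together later. The following is a minimal sketch of that idea, reusing the sample line from above; the variable names (fields, values, delimiters, rebuilt) are chosen only for illustration.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# The capture group keeps each delimiter in the result, so values and
# delimiters alternate: value, delimiter, value, delimiter, ...
fields = re.split(r'(;|,|\s)\s*', line)

values = fields[::2]               # items at even indexes are the values
delimiters = fields[1::2] + ['']   # pad so both lists have the same length

# Rebuild the line with the delimiters that were actually matched.
# Whitespace matched by the trailing \s* is not captured, so it is lost.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt)  # asdf fjdk;afed,fjek,asdf,foo
```

Only the part of the delimiter inside the parentheses is kept, which is why the rebuilt line is more compact than the original.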
2. Matching text at the start or end of a string
The simplest approach is to use str.startswith() and str.endswith():

>>> filename = 'spam.txt'
>>> filename.endswith('.txt')
True
>>> filename.startswith('file:')
False

You can also pass several candidates to startswith() or endswith(), but they must be supplied as a tuple:

>>> url = 'http://www.python.org'
>>> choices = ['http:', 'ftp:']
>>> url.startswith(choices)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str, unicode, or tuple, not list
>>> choices = ('http:', 'ftp:')
>>> url.startswith(choices)
True

A regular expression works too, although it is overkill for such a simple match:

>>> import re
>>> re.match('http:|https:|ftp:', url)
<_sre.SRE_Match object at 0x101253098>
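When the candidates already live in a list or set, converting them with tuple() avoids the TypeError shown above. The sketch below illustrates this; starts_with_any is a hypothetical helper name, and the file names are made-up sample data.

```python
def starts_with_any(text, prefixes):
    """Return True if text starts with any of the given prefixes.

    prefixes may be any iterable of strings (list, set, tuple).
    """
    # str.startswith() rejects lists, so convert to a tuple first.
    return text.startswith(tuple(prefixes))


choices = ['http:', 'https:', 'ftp:']
print(starts_with_any('http://www.python.org', choices))   # True
print(starts_with_any('file:///tmp/spam.txt', choices))    # False

# endswith() accepts a tuple in the same way, which makes it easy to
# filter a list of file names by extension.
filenames = ['Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h']
c_files = [name for name in filenames if name.endswith(('.c', '.h'))]
print(c_files)  # ['foo.c', 'spam.c', 'spam.h']
```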
3. Matching strings with shell wildcard patterns
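>>> from fnmatch import fnmatch
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True

fnmatch() follows the case-sensitivity rules of the underlying operating system, so the result can differ between platforms:

>>> fnmatch('foo.txt', '*.TXT')  # Ubuntu 14.10
False

To get the same behavior everywhere, use fnmatchcase(), which is always case-sensitive:

>>> from fnmatch import fnmatchcase
>>> fnmatchcase('foo.txt', '*.TXT')
False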
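Wildcard matching is not limited to real file names; it can filter any list of strings. Here is a small sketch using fnmatchcase() so the result does not depend on the platform. The names list is invented sample data.

```python
from fnmatch import fnmatchcase

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py', 'DAT3.CSV']

# Case-sensitive match: 'DAT3.CSV' is rejected on every platform.
exact = [n for n in names if fnmatchcase(n, 'Dat*.csv')]
print(exact)  # ['Dat1.csv', 'Dat2.csv']

# For a deliberately case-insensitive match, lower-case the name and
# write the pattern in lower case as well.
relaxed = [n for n in names if fnmatchcase(n.lower(), 'dat*.csv')]
print(relaxed)  # ['Dat1.csv', 'Dat2.csv', 'DAT3.CSV']
```

Normalizing the case yourself keeps the choice explicit instead of leaving it to the operating system.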
4. Matching and searching text
For a literal substring, str.find() is enough:

>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.find('no')
10

For more complicated patterns, use the re module:

>>> import re
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> text1 = '11/27/2012'
>>> datepat.match(text1)
<_sre.SRE_Match object at 0x7f0bda8cc4a8>

match() only matches at the beginning of the string. To find every match, use findall():

>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']

findall() returns a list. If you would rather get the matches one at a time from an iterator, use finditer():

>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
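If you use a regular expression frequently, it is best to compile it first with re.compile(). The module-level functions cache recently compiled patterns, so the performance gain is not large, but using your own compiled pattern does save the extra lookup and processing.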
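One common way to apply this advice is to compile the pattern once at module level and reuse it inside a function. The sketch below assumes the month/day/year format used in the examples above; DATE_PATTERN and extract_dates are hypothetical names, and the input lines are sample data.

```python
import re

# Compiled once when the module is loaded, then reused on every call.
DATE_PATTERN = re.compile(r'(\d+)/(\d+)/(\d+)')


def extract_dates(lines):
    """Return (year, month, day) tuples for every date found in lines."""
    found = []
    for line in lines:
        # finditer() yields match objects lazily, one per date.
        for m in DATE_PATTERN.finditer(line):
            month, day, year = m.groups()
            found.append((int(year), int(month), int(day)))
    return found


lines = ['Today is 11/27/2012.', 'PyCon starts 3/13/2013.', 'No date here.']
dates = extract_dates(lines)
print(dates)  # [(2012, 11, 27), (2013, 3, 13)]
```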
5. Searching and replacing text
For simple replacements, use str.replace(). It returns a new string and leaves the original text unchanged:

>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'
>>> text
'yeah, but no, but yeah, but no, but yeah'

For pattern-based replacements, use re.sub(). In the replacement string, \1, \2 and \3 refer to the capture groups of the pattern:

>>> import re
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'

If you need to reuse the regular expression, it is best to compile it first:

>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> datepat.sub(r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'

To find out how many substitutions were made, use subn():

>>> newtext, n = datepat.subn(r'\3-\1-\2', text)
>>> newtext
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> n
2
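A replacement string can only rearrange the captured text. When the replacement needs some computation, sub() also accepts a function that receives each match object and returns the new text. The following sketch zero-pads the month and day, which the \3-\1-\2 template cannot do; to_iso is a hypothetical helper name.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')


def to_iso(m):
    """Turn one month/day/year match into a zero-padded YYYY-MM-DD string."""
    month, day, year = (int(g) for g in m.groups())
    return '{:04d}-{:02d}-{:02d}'.format(year, month, day)


text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# sub() calls to_iso once per match and inserts its return value.
newtext = datepat.sub(to_iso, text)
print(newtext)  # Today is 2012-11-27. PyCon starts 2013-03-13.
```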