In Python 3, bytes contains sequences of 8-bit values, str contains sequences of
Unicode characters. bytes and str instances can’t be used together with operators
(like > or +).
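Since Python 3, strings and the bytes type have been completely separate. Strings are processed in units of characters; bytes objects are processed in units of bytes.
Creating bytes objects and converting to and from strings works as follows:
# (1)
b = b''        # create an empty bytes object
b = bytes()    # create an empty bytes object
# (2)
b = b'hello'   # the literal prefix marks hello as bytes
# (3)
b = bytes('string', encoding='utf-8')   # built-in bytes(): convert a str to bytes in the given encoding
b = 'string'.encode('utf-8')            # str.encode(): encode a str into bytes; the encoding defaults to utf-8
s = b.decode('utf-8')                   # bytes.decode(): decode a bytes object into a str; the encoding defaults to utf-8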
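A minimal sketch of the separation described above; the sample string `'héllo'` is an assumption chosen only because it contains a non-ASCII character, so that the character count and the byte count differ.

```python
text = 'héllo'                  # str: a sequence of Unicode characters
data = text.encode('utf-8')     # bytes: a sequence of 8-bit values

print(len(text), len(data))     # 5 characters, 6 bytes ('é' takes two bytes in UTF-8)
round_trip = data.decode('utf-8')
print(round_trip == text)       # decoding gives back the original str

try:
    data + text                 # mixing bytes and str with + is rejected
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

The explicit encode/decode step is the only bridge between the two types, which is why the `+` attempt ends in a `TypeError`.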
If you need to change a string: string --> list --> string
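Because str objects are immutable, an in-place change has to go through a mutable list. A small sketch of that round trip (the value `'spam'` is just an illustrative input):

```python
s = 'spam'
chars = list(s)            # string --> list: ['s', 'p', 'a', 'm']
chars[0] = 'S'             # lists are mutable, so item assignment works
result = ''.join(chars)    # list --> string
print(result)              # Spam
print(s)                   # the original string is unchanged: spam
```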
S = 'Spam'
S.find('pa')
S.replace('pa', 'XYZ')
S.isalpha(), S.isdigit()
In [5]: dir(S)
Out[5]: ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
View the documentation for a method:
help(S.replace)
Strip the leading and trailing whitespace first, then split:
>>> s.strip().split(',')
['hello', ' world', ' hao', '', '123']
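The value of `s` is not shown in the notes; the sketch below assumes an input that would produce the output above, and adds one possible follow-up step (stripping each field and dropping empty ones) that is not part of the original.

```python
s = ' hello, world, hao,,123 \n'    # assumed input, not from the original notes

fields = s.strip().split(',')       # strip the ends first, then split on commas
print(fields)                       # ['hello', ' world', ' hao', '', '123']

# Optional clean-up: strip each field and discard the empty ones.
cleaned = [field.strip() for field in fields if field.strip()]
print(cleaned)                      # ['hello', 'world', 'hao', '123']
```

Note that `strip()` only touches the two ends of the whole string, so the spaces after each comma survive until the per-field clean-up.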
Extract the content inside the brackets, as follows.
text = "hello boy<[www.baidu.com]>byebye"
print(text.split("[")[1].split("]")[0])
www.baidu.com
The os.path.split() function
import os
print(os.path.split('/dodo/soft/python/'))   # path + filename
('/dodo/soft/python', '')
print(os.path.split('/dodo/soft/python'))
('/dodo/soft', 'python')
filepath, tmpfilename = os.path.split(fileUrl)
shotname, extension = os.path.splitext(tmpfilename)
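A runnable version of the two lines above. The path in `file_url` is a made-up example (the notes do not say what `fileUrl` holds); it has a double extension to show that `os.path.splitext` only splits off the last one.

```python
import os.path

file_url = '/dodo/soft/python/report.tar.gz'    # hypothetical path

filepath, tmpfilename = os.path.split(file_url)         # directory part, file name
shotname, extension = os.path.splitext(tmpfilename)     # base name, last extension

print(filepath)       # /dodo/soft/python
print(tmpfilename)    # report.tar.gz
print(shotname)       # report.tar
print(extension)      # .gz
```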
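The os module exposes os.sys (the same module object as sys) and the os.path submodule, which are dedicated to the system and to path handling, respectively.
import os
import sys       # os.sys is this same module; "import os.sys" itself fails because os is not a package
import os.path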
with open('somefile', 'r') as f:
    for line in f:
        print(line, end='')
"""
Hello
World
Python
"""
with open('somefile', 'r') as f:
    content = list(f)
    print(content)
"""
['Hello\n', 'World\n', 'Python']
"""
The list(f) call above is equivalent to f.readlines():
with open('somefile', 'r') as f:
    content = f.readlines()
    print(content)
"""
['Hello\n', 'World\n', 'Python']
"""
with open('somefile', 'r') as f:
    content = f.read().splitlines()
    print(content)
"""
['Hello', 'World', 'Python']
"""
Alternatively, remove the trailing newline yourself with rstrip(); to also remove leading whitespace, use strip() instead:
with open('somefile', 'r') as f:
    content = [line.rstrip('\n') for line in f]
    print(content)
"""
['Hello', 'World', 'Python']
"""
>>> seq = ['one', 'two', 'three']
>>> for i, element in enumerate(seq):
...     print(i, element)
...
0 one
1 two
2 three
with open('somefile', 'r') as f:
    for number, line in enumerate(f, start=1):
        print(number, line, end='')
"""
1 Hello
2 World
3 Python
"""
(1) Choose the destination --> (2) then print
>>> import sys                            # Printing the hard way
>>> sys.stdout.write('hello world\n')     # goes to the screen by default
hello world
C:\code> c:\python33\python
>>> import sys
>>> temp = sys.stdout                   # Save for restoring later
>>> sys.stdout = open('log.txt', 'a')   # Redirect prints to a file
>>> print('spam')                       # Prints go to file, not here
>>> print(1, 2, 3)
>>> sys.stdout.close()                  # Flush output to disk
>>> sys.stdout = temp                   # Restore original stream
>>> print('back here')                  # Prints show up here again
back here
>>> print(open('log.txt').read())       # Result of earlier prints
spam
1 2 3
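The save/redirect/restore steps in the session above can also be handled by the standard library's `contextlib.redirect_stdout` (available since Python 3.4). This is an alternative sketch, not what the session above does; the `io.StringIO` buffer stands in for `open('log.txt', 'a')` so that the example leaves no file behind.

```python
import contextlib
import io

buffer = io.StringIO()                  # stand-in for open('log.txt', 'a')

with contextlib.redirect_stdout(buffer):
    print('spam')                       # goes to the buffer, not the screen
    print(1, 2, 3)

print('back here')                      # sys.stdout is restored on leaving the with block
captured = buffer.getvalue()
print(captured, end='')                 # result of the earlier prints
```

The context manager restores the original stream even if the code inside the block raises, which the manual `temp = sys.stdout` approach does not do by itself.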
# 3.X
log = open('log.txt', 'a')
print(x, y, z, file=log)     # Print to a file-like object
print(a, b, c)               # Print to original stdout
# Older version (2.X)
log = open('log.txt', 'a')
print >> log, x, y, z        # Print to a file-like object
print a, b, c                # Print to original stdout
What if the log should be both displayed and saved?
For now, write a function that does both kinds of printing.
from __future__ import print_function
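One possible shape for such a function, as a sketch: `tee_print` and its `log` parameter are hypothetical names, and an `io.StringIO` object stands in for the real log file.

```python
import io

def tee_print(*args, log, **kwargs):
    """Print to the screen and to the file-like object `log`.

    Do not pass file= in kwargs; the destination is fixed by this helper.
    """
    print(*args, **kwargs)              # display on stdout
    print(*args, file=log, **kwargs)    # save the same text to the log

log = io.StringIO()                     # stand-in for open('log.txt', 'a')
tee_print('spam', 1, 2, 3, log=log)
tee_print('a', 'b', sep='-', log=log)
saved = log.getvalue()
```

For anything beyond quick scripts, the logging module covered later in these notes handles the same display-and-save requirement with handlers.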
(1) C-style format; (2) explicit index; (3) automatic index; (4) dict-based
# Dictionary-Based Formatting Expressions
>>> '%(qty)d more %(food)s' % {'qty': 1, 'food': 'spam'}
'1 more spam'
String Formatting Expressions --> for details, see page 268/1594
(a) Number of decimal places to keep
(b) Field width of a number
(c) Alignment tricks
print('%2d-%02d' % (3, 1))
 3-01
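The four styles and the three details listed above can be sketched side by side. The values (`qty`, `food`, 3.14159, 42) are illustrative only.

```python
qty, food = 1, 'spam'

c_style = '%d more %s' % (qty, food)                              # (1) C-style
by_index = '{0} more {1}'.format(qty, food)                       # (2) explicit index
auto_index = '{} more {}'.format(qty, food)                       # (3) automatic index
by_dict = '%(qty)d more %(food)s' % {'qty': qty, 'food': food}    # (4) dict-based
print(c_style, by_index, auto_index, by_dict, sep='\n')           # four times '1 more spam'

decimals = '%.2f' % 3.14159      # (a) keep two decimal places: '3.14'
width = '%6d|' % 42              # (b) width 6, right-aligned: '    42|'
left = '%-6d|' % 42              # (c) same width, left-aligned: '42    |'
padded = '%2d-%02d' % (3, 1)     # space-padded, then zero-padded: ' 3-01'
print(decimals, width, left, padded, sep='\n')
```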
len(S)
ord('\n')    # look up the character's code (ASCII value)
chr(115)     # look up the character for a code
>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc
"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc\n'
In [40]: r"C:\new\test.spm"
Out[40]: 'C:\\new\\test.spm'
From: http://blog.csdn.net/u013961718/article/details/51100464
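Logging can be thought of as a more advanced way of printing, since it is what real projects use.
Ref: Python logging as a replacement for print: output to the console and redirect to a file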
logging.DEBUG
logging.INFO
logging.WARNING
logging.ERROR
logging.CRITICAL
import logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='myapp.log',
                    filemode='w')
# The logging.config module can load a configuration file to set up the logging attributes instead
logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')
The log is written to the file ./myapp.log
Contents of ./myapp.log:
Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message
Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message
Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message
import logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='myapp.log',
                    filemode='w')
#################################################################################################
# Define a StreamHandler that prints messages of level INFO or higher to standard error,
# and add it to the root logger
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)
#################################################################################################
logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')
Result:
Printed on the screen:
root        : INFO     This is info message
root        : WARNING  This is warning message
Contents of ./myapp.log:
Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message
Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message
Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message
For the rest, see: 6. Unicode Strings, page 160/1594 (content omitted here)
How regex engines work: [IR] XPath for Search Query
Tutorial: 正則表達式30分鐘入門教程 (a 30-minute introduction to regular expressions)
Typical uses: extracting information from strings and extracting paths; it can replace split().
In [8]: import re
   ...: match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
   ...: match.group(1)
Out[8]: 'Python '
--------------------------------------------------------------------------------------
In [9]: match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
   ...: match.groups()
Out[9]: ('usr', 'home', 'lumberjack')
---------------------------------------------------------------------------------------
In [10]: re.split('[/:]', '/usr/home/lumberjack')
Out[10]: ['', 'usr', 'home', 'lumberjack']
A simple skeleton:
def filter_mail(emails):
    return list(filter(fun, emails))    # 2. fun is a user-defined function that returns True/False; it uses re

if __name__ == '__main__':
    n = int(input())
    emails = []
    for _ in range(n):
        emails.append(input())          # 1. read the mail list

    filtered_emails = filter_mail(emails)
    filtered_emails.sort()              # 3. sort
    print(filtered_emails)
Valid email addresses must follow these rules:
* It must have the username@websitename.extension format type.
* The username can only contain letters, digits, dashes and underscores.
* The website name can only have letters and digits.
* The maximum length of the extension is .
import re
re.search(r'^[A-Za-z0-9-_]+@[A-Za-z0-9]+\.\w?\w?\w$',s)
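The skeleton above leaves `fun` undefined. A minimal sketch that plugs the pattern above into it follows; the sample addresses are made up, and the pattern is used as given (with the hyphen moved to the end of the character class, which does not change what it matches), so it accepts an extension of one to three `\w` characters.

```python
import re

# Same pattern as above, compiled once.
EMAIL_RE = re.compile(r'^[A-Za-z0-9_-]+@[A-Za-z0-9]+\.\w?\w?\w$')

def fun(s):
    """Return True if s matches the e-mail pattern, else False."""
    return EMAIL_RE.search(s) is not None

def filter_mail(emails):
    return list(filter(fun, emails))

# Hypothetical sample data instead of reading from input().
emails = [
    'lara@example.com',
    'brian-23@example.com',
    'britts_54@example.com',
    'no.dots@example.com',     # rejected: '.' is not allowed in the username
    'user@example.comm',       # rejected: extension longer than three characters
]
filtered_emails = filter_mail(emails)
filtered_emails.sort()
print(filtered_emails)
```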
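Common string-matching patterns
# The word hi, then any number of arbitrary characters (except newline), and finally the word Lucy
\bhi\b.*\bLucy\b
# Matches words that start with the letter a: a word boundary (\b), then the letter a, then any number of word characters (\w*), and finally the end of the word (\b).
\ba\w*\b
# Matches names that end with .tif
re.search(r".*\.tif$", f)
# Matches one or more consecutive digits. Here + is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly zero), while + matches one or more.
\d+
# Matches words of exactly 6 characters.
\b\w{6}\b
# A QQ number must be 5 to 12 digits: start --> ^ ... $ <-- end
^\d{5,12}$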
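The patterns above can be tried out directly with the `re` module. The test strings in this sketch are invented for illustration.

```python
import re

hi_lucy = bool(re.search(r'\bhi\b.*\bLucy\b', 'hi, my friend Lucy'))    # True
him_lucy = bool(re.search(r'\bhi\b.*\bLucy\b', 'him and Lucy'))         # False: "him" is not the word "hi"

a_words = re.findall(r'\ba\w*\b', 'an apple a day')              # ['an', 'apple', 'a']
digits = re.findall(r'\d+', 'tel 010 ext 42')                    # ['010', '42']
six_chars = re.findall(r'\b\w{6}\b', 'python regex engine')      # ['python', 'engine']

qq_ok = bool(re.search(r'^\d{5,12}$', '12345'))                  # True: 5 digits
qq_short = bool(re.search(r'^\d{5,12}$', '1234'))                # False: only 4 digits
qq_long = bool(re.search(r'^\d{5,12}$', '1234567890123'))        # False: 13 digits

print(hi_lucy, him_lucy, a_words, digits, six_chars, qq_ok, qq_short, qq_long)
```

Raw strings (`r'...'`) keep `\b` from being read as a backspace character before the pattern reaches the regex engine.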
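Phone numbers
# Chinese phone numbers, simple version
0\d\d-\d\d\d\d\d\d\d\d    improved as follows
0\d{2}-\d{8}
# Matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678.
\(?0\d{2}[) -]?\d{8}
However, it also matches "incorrect" formats such as 010)12345678 or (022-87654321.
So what can be done? Use alternation (branch conditions).
# Matches two kinds of hyphen-separated phone numbers: a 3-digit area code with an 8-digit local number (e.g. 010-12345678), or a 4-digit area code with a 7-digit local number (e.g. 0376-2233445).
0\d{2}-\d{8}|0\d{3}-\d{7}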
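A short check of the two phone-number patterns above, using the sample numbers from the text plus one extra invalid number (`0376-12345678`, added here as an assumption to show the alternation rejecting a mismatched length). `re.fullmatch` is used so that the whole string has to match.

```python
import re

loose = re.compile(r'\(?0\d{2}[) -]?\d{8}')
strict = re.compile(r'0\d{2}-\d{8}|0\d{3}-\d{7}')

# The loose pattern accepts the intended formats and the malformed ones alike.
loose_results = {
    number: bool(loose.fullmatch(number))
    for number in ['(010)88886666', '022-22334455', '02912345678',
                   '010)12345678', '(022-87654321']
}
print(loose_results)

# The alternation accepts only the two hyphenated layouts.
strict_results = {
    number: bool(strict.fullmatch(number))
    for number in ['010-12345678', '0376-2233445',
                   '010)12345678', '0376-12345678']
}
print(strict_results)
```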
To be continued; more will be added as the need arises.