This article collects some Pythonic code from my day-to-day work, for later reference.
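When a loop has to try several candidates and fail only after all of them have failed, the usual approach is to set a flag or add some other extra handling: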
def get_file_content(fpath):
    """Get file content by the right encoding."""
    G_ENCODING_LIST = ['utf-8', 'gbk', 'latin1']
    for encode in G_ENCODING_LIST:
        try:
            content = open(fpath, encoding=encode).read()
            return content
        except UnicodeDecodeError:
            # Extra check: re-raise only when the last encoding has failed
            if encode == G_ENCODING_LIST[-1]:
                raise
        except FileNotFoundError:
            raise
The example below instead takes advantage of a feature of Python's own syntax, the else clause of a for loop:
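def get_file_content(fpath):
    """Get file content by the right encoding."""
    G_ENCODING_LIST = ['utf-8', 'gbk', 'latin1']
    for encode in G_ENCODING_LIST:
        try:
            content = open(fpath, encoding=encode).read()
            return content
        except UnicodeDecodeError:
            pass
        except FileNotFoundError:
            raise
    else:
        # Reached only when every encoding in the list has failed.
        # UnicodeDecodeError requires five arguments:
        # (encoding, object, start, end, reason).
        raise UnicodeDecodeError(
            encode, b'', 0, 0, 'no encoding in G_ENCODING_LIST could decode the file')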
One more thing to note:
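A point worth keeping in mind with for ... else is when the else block actually runs: only when the loop runs to completion, and never when the loop is left through break (or return). The following is a minimal sketch of that behaviour; the function name find_first_even is made up for this illustration.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break  # leaving via break skips the else block
    else:
        # Runs only when the loop was exhausted without break
        found = None
    return found


print(find_first_even([1, 3, 4, 5]))  # 4
print(find_first_even([1, 3, 5]))     # None
```

An empty sequence also ends up in the else block, because the loop finishes without ever reaching break.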
To build a list, the usual approach looks like this:
def add_patterns(self, ptn_docs):
    """Add pattern set info."""
    ltypes = []
    for ltype, doc in ptn_docs:
        ltypes.append(ltype)
        # Build the list without a list comprehension
        doc_list = []
        for word in jieba.cut(doc):
            doc_list.append(word)
        doc_list = list(set(doc_list) - set(G_STOP_WORDS))
        self._ptn_simtest_dbs[ltype]['all_doc_list'].append(doc_list)
        self._ptn_simtest_dbs[ltype]['dict'].add_documents([doc_list])
    ......
The Pythonic approach looks like this:
def add_patterns(self, ptn_docs):
    """Add pattern set info."""
    ltypes = []
    for ltype, doc in ptn_docs:
        ltypes.append(ltype)
        # Build the list with a list comprehension
        doc_list = [word for word in jieba.cut(doc)]
        doc_list = list(set(doc_list) - set(G_STOP_WORDS))
        self._ptn_simtest_dbs[ltype]['all_doc_list'].append(doc_list)
        self._ptn_simtest_dbs[ltype]['dict'].add_documents([doc_list])
    ......
Note:
doc_list = list(set(doc_list) - set(G_STOP_WORDS))
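The line above drops stop words with a set difference. The following is a minimal sketch of what that does, using a made-up word list and a made-up value for G_STOP_WORDS (the real contents are not shown in this article). The set difference also removes duplicates and does not keep the original order, so a filtering list comprehension is shown next to it for cases where either of those matters.

```python
# Hypothetical sample data, for illustration only
G_STOP_WORDS = ['the', 'a', 'of']
words = ['the', 'quick', 'fox', 'of', 'the', 'quick', 'river']

# Set difference: removes stop words, but also duplicates and ordering
unique_words = list(set(words) - set(G_STOP_WORDS))
print(sorted(unique_words))  # ['fox', 'quick', 'river']

# List comprehension with a condition: keeps order and duplicates
stop_words = set(G_STOP_WORDS)
kept_words = [word for word in words if word not in stop_words]
print(kept_words)  # ['quick', 'fox', 'quick', 'river']
```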
To return True or False depending on whether an object is truthy, the following shorthand can be used:
def _check_fingerprint(self, suspect):
    """Check whether fingerprint exists."""
    with open(suspect, 'rb') as fp:
        content = fp.read()
    md5sum = hashlib.md5(content).hexdigest()
    wsp = self.ws_data.filter(fingerprint=md5sum)
    return True if wsp else False
There is an even more concise way to write it:
def _check_fingerprint(self, suspect):
    """Check whether fingerprint exists."""
    with open(suspect, 'rb') as fp:
        content = fp.read()
    md5sum = hashlib.md5(content).hexdigest()
    wsp = self.ws_data.filter(fingerprint=md5sum)
    return bool(wsp)
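As a quick check that the two spellings agree, bool(x) yields exactly the value that the conditional expression would. The helper names in this sketch are hypothetical and exist only for the comparison.

```python
def is_present_verbose(obj):
    """Conditional-expression form."""
    return True if obj else False


def is_present(obj):
    """Shorter form using bool()."""
    return bool(obj)


# Empty containers, empty strings, 0 and None are falsy; the rest are truthy
samples = [[], [1], '', 'abc', 0, 42, None, {}, {'k': 'v'}]
for sample in samples:
    assert is_present(sample) == is_present_verbose(sample)

print(is_present([]))     # False
print(is_present(['x']))  # True
```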
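I remember that when we were learning C, teachers usually advised against using goto, to avoid unexpected results.
In real work, however, a construct like goto can be hard to give up in certain situations.
Consider this scenario first. The code below checks whether the patterns in a zip archive satisfy the required format: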
import csv
import hashlib
import os
import zipfile

# The REST_ERR_400_* constants are assumed to be defined elsewhere.


def check_pattern_package(fpath):
    """Check pattern package correctness."""
    base_dir = os.path.dirname(fpath)
    ret, reason, extract_dir = True, None, None
    with zipfile.ZipFile(fpath) as zf:
        infolist = zf.infolist()
        if not infolist[0].is_dir():
            return False, REST_ERR_400_ZIP_BADFILE
        zf_base_dir = infolist[0].filename
        md5sum_file = os.path.join(zf_base_dir, 'md5sum.txt')
        if md5sum_file not in zf.namelist():
            return False, REST_ERR_400_ZIP_BADFORMAT
        zf.extractall(base_dir)
        extract_dir = os.path.join(base_dir, zf_base_dir)
    try:
        with open(os.path.join(base_dir, md5sum_file)) as md5_fp:
            # Read all rows while the file is still open
            rows = list(csv.reader(md5_fp, delimiter=' '))
    except FileNotFoundError:
        return False, REST_ERR_400_ZIP_BADFORMAT
    else:
        for row in rows:
            if len(row) < 2:
                return False, REST_ERR_400_ZIP_BADFORMAT
            pzf = os.path.join(extract_dir, row[1])
            with open(pzf, 'rb') as fpzf:
                fdata = fpzf.read()
            md5sum = hashlib.md5(fdata).hexdigest()
            if md5sum != row[0]:
                return False, REST_ERR_400_ZIP_BADFILE
    return True, None
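In the code above, for efficiency, the function returns as soon as it finds a format mismatch.
Now there is a new requirement for this function: depending on an input parameter, delete the archive and everything in the extraction directory. If the structure above were kept, the cleanup would have to be repeated at every return statement, which means a lot of duplicated code. In C, goto handles this kind of task well; unfortunately, Python does not support goto.
Use try/except/finally to control the execution path and emulate goto: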
import csv
import hashlib
import os
import shutil
import zipfile

# The REST_ERR_400_* constants are assumed to be defined elsewhere.


class PtnPackageParseError(Exception):
    """Exception for pattern package parse."""

    def __init__(self, reason, message=''):
        self.reason = reason
        self.message = message
        super().__init__(message)


def check_pattern_package(fpath, cleanup=False):
    """Check pattern package correctness."""
    base_dir = os.path.dirname(fpath)
    ret, reason, extract_dir = True, None, None
    try:
        with zipfile.ZipFile(fpath) as zf:
            infolist = zf.infolist()
            if not infolist[0].is_dir():
                raise PtnPackageParseError(REST_ERR_400_ZIP_BADFILE)
            zf_base_dir = infolist[0].filename
            md5sum_file = os.path.join(zf_base_dir, 'md5sum.txt')
            if md5sum_file not in zf.namelist():
                raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
            zf.extractall(base_dir)
            extract_dir = os.path.join(base_dir, zf_base_dir)
        try:
            with open(os.path.join(base_dir, md5sum_file)) as md5_fp:
                # Read all rows while the file is still open
                rows = list(csv.reader(md5_fp, delimiter=' '))
        except FileNotFoundError:
            raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
        else:
            for row in rows:
                if len(row) < 2:
                    raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
                pzf = os.path.join(extract_dir, row[1])
                with open(pzf, 'rb') as fpzf:
                    fdata = fpzf.read()
                md5sum = hashlib.md5(fdata).hexdigest()
                if md5sum != row[0]:
                    raise PtnPackageParseError(REST_ERR_400_ZIP_BADFILE)
    except PtnPackageParseError as e:
        ret, reason = False, e.reason
    finally:
        # Single cleanup point, reached on every path through the function
        if cleanup:
            os.unlink(fpath)
            # extract_dir is still None if the check failed before extraction
            if extract_dir and os.path.exists(extract_dir):
                # os.removedirs only removes empty directories
                shutil.rmtree(extract_dir)
    return ret, reason, extract_dir
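The same control flow can be tried out in isolation. The following is a minimal, self-contained sketch and not the original checker: ValidationError, validate_lines, the 'BAD_FORMAT' reason code and the file layout are all made up for illustration. Every failure raises one exception type, and the finally block is the single place where cleanup happens.

```python
import os
import shutil
import tempfile


class ValidationError(Exception):
    """Hypothetical error carrying a reason code."""

    def __init__(self, reason):
        self.reason = reason
        super().__init__(reason)


def validate_lines(lines, cleanup=False):
    """Write lines to a scratch directory and check that each has two fields."""
    ret, reason = True, None
    work_dir = tempfile.mkdtemp()
    try:
        path = os.path.join(work_dir, 'data.txt')
        with open(path, 'w') as fp:
            fp.write('\n'.join(lines))
        with open(path) as fp:
            for line in fp:
                if len(line.split()) < 2:
                    # Jump straight to the handler, like a goto to an error label
                    raise ValidationError('BAD_FORMAT')
    except ValidationError as e:
        ret, reason = False, e.reason
    finally:
        # Single cleanup point, reached on success and on failure alike
        if cleanup:
            shutil.rmtree(work_dir)
    return ret, reason, work_dir


print(validate_lines(['a 1', 'b 2'], cleanup=True)[:2])  # (True, None)
print(validate_lines(['a 1', 'b'], cleanup=True)[:2])    # (False, 'BAD_FORMAT')
```

With cleanup=False the scratch directory is left in place, so the caller can inspect it and remove it later.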