Pythonic Code in Practice

This article collects some Pythonic code snippets from day-to-day work, for later reference.

Handling loop completion

The usual approach is to set a flag, or to add some other extra handling, for the case where the loop runs out:

def get_file_content(fpath):
    """Get file content by the right encoding."""
    G_ENCODING_LIST = ['utf-8', 'gbk', 'latin1']
    for encode in G_ENCODING_LIST:
        try:
            content = open(fpath, encoding=encode).read()
            return content
        except UnicodeDecodeError:
            if encode == G_ENCODING_LIST[-1]:
                raise
        except FileNotFoundError:
            raise

The next example instead relies on a feature of Python's own syntax, the else clause of a for loop:

def get_file_content(fpath):
    """Get file content by the right encoding."""
    G_ENCODING_LIST = ['utf-8', 'gbk', 'latin1']
    for encode in G_ENCODING_LIST:
        try:
            content = open(fpath, encoding=encode).read()
            return content
        except UnicodeDecodeError:
            pass
        except FileNotFoundError:
            raise
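    else:
        # Reached only when no encoding succeeded. A bare
        # `raise UnicodeDecodeError` fails with TypeError because that
        # exception requires five constructor arguments.
        raise ValueError('cannot decode %s with any of %r' % (fpath, G_ENCODING_LIST))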

Also note:

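  • In an except branch, re-raising the exception that was just caught needs only a bare raise; the exception does not have to be named.
  • To avoid deeply nested try/except blocks, a for loop is used here to keep the code flatter.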
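
The for/else behaviour described above can be sketched in isolation. This is a minimal illustration, not the author's code: the function name decode_with_fallback and its default encoding list are assumptions made for the example. The else branch runs only when the loop finishes without hitting break.

```python
def decode_with_fallback(data, encodings=('utf-8', 'gbk')):
    """Return (text, encoding) for the first encoding that decodes data.

    Hypothetical helper for illustration; the names are not from the article.
    """
    for encoding in encodings:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            # Try the next candidate encoding.
            continue
        # Decoding worked: leave the loop, which skips the else branch.
        break
    else:
        # The loop ran out of candidates without reaching break.
        raise ValueError('none of %r could decode the data' % (encodings,))
    return text, encoding
```

No flag variable is needed: whether break was reached is exactly the information the else clause encodes.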

List comprehensions

A list is commonly built like this:

def add_patterns(self, ptn_docs):
    """Add pattern set info."""
    ltypes = []
    for ltype, doc in ptn_docs:
        ltypes.append(ltype)
        # Build the list without a list comprehension
        doc_list = []
        for word in jieba.cut(doc):
            doc_list.append(word)
        doc_list = list(set(doc_list) - set(G_STOP_WORDS))
        self._ptn_simtest_dbs[ltype]['all_doc_list'].append(doc_list)
        self._ptn_simtest_dbs[ltype]['dict'].add_documents([doc_list])
    ......

The Pythonic version looks like this:

def add_patterns(self, ptn_docs):
    """Add pattern set info."""
    ltypes = []
    for ltype, doc in ptn_docs:
        ltypes.append(ltype)
        # Build the list with a list comprehension
        doc_list = [word for word in jieba.cut(doc)]
        doc_list = list(set(doc_list) - set(G_STOP_WORDS))
        self._ptn_simtest_dbs[ltype]['all_doc_list'].append(doc_list)
        self._ptn_simtest_dbs[ltype]['dict'].add_documents([doc_list])
    ......

Note:

  • The difference between the two lists is computed here as a set difference:
doc_list = list(set(doc_list) - set(G_STOP_WORDS))
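
One thing to keep in mind about the set-difference form is that it discards duplicates and does not preserve the original word order. The sketch below contrasts it with a filtering list comprehension; STOP_WORDS and the sample words are made-up values for illustration, not the article's G_STOP_WORDS.

```python
# Hypothetical sample data, for illustration only.
STOP_WORDS = {'the', 'a', 'of'}
words = ['the', 'king', 'of', 'the', 'north', 'king']

# Set difference: concise, but duplicates and ordering are lost.
unique_words = list(set(words) - STOP_WORDS)

# List comprehension with a filter: keeps order and duplicates.
filtered_words = [word for word in words if word not in STOP_WORDS]
```

Which form fits depends on whether later steps need word counts or word order.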

Boolean checks

To return True or False depending on whether an object is truthy, the code can be shortened like this:

def _check_fingerprint(self, suspect):
    """Check whether fingerprint exist."""
    content = open(suspect, 'rb').read()
    md5sum = hashlib.md5(content).hexdigest()
    wsp = self.ws_data.filter(fingerprint=md5sum)
    return True if wsp else False

Is there an even more concise way to write it? Yes:

def _check_fingerprint(self, suspect):
    """Check whether fingerprint exist."""
    content = open(suspect, 'rb').read()
    md5sum = hashlib.md5(content).hexdigest()
    wsp = self.ws_data.filter(fingerprint=md5sum)
    return bool(wsp)
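
bool() applies Python's standard truth-testing rules: an object is false when its __bool__ returns False or, if __bool__ is not defined, when its __len__ returns 0. Whether the object returned by self.ws_data.filter(...) follows these rules depends on the library behind it, which the article does not name. The ResultSet class below is a hypothetical stand-in used only to show the mechanism.

```python
class ResultSet:
    """Hypothetical stand-in for a query result; not from any real library."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        # bool() falls back to __len__ when __bool__ is not defined.
        return len(self._rows)


def has_match(result):
    """Return True when the result is truthy (here: holds at least one row)."""
    return bool(result)
```

Here has_match(ResultSet([])) is False and has_match(ResultSet(['row'])) is True; None is also falsy, so has_match(None) is False.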

goto in Python

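I remember that when we were learning C, teachers would usually advise us not to use "goto", to avoid unexpected results.
In real work, however, a construct like "goto" is hard to give up in some scenarios.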

Consider this scenario first. The code below checks whether the patterns in a zip package satisfy the required format:

def check_pattern_package(fpath):
    """Check pattern package correctness."""
    base_dir = os.path.dirname(fpath)
    ret, reason, extract_dir = True, None, None
    with zipfile.ZipFile(fpath) as zf:
        infolist = zf.infolist()
        if not infolist[0].is_dir():
            return False, REST_ERR_400_ZIP_BADFILE
        zf_base_dir = infolist[0].filename
        md5sum_file = os.path.join(zf_base_dir, 'md5sum.txt')
        if md5sum_file not in zf.namelist():
            return False, REST_ERR_400_ZIP_BADFORMAT
        zf.extractall(base_dir)

    extract_dir = os.path.join(base_dir, zf_base_dir)
    try:
        with open(os.path.join(base_dir, md5sum_file)) as md5_fp:
            # Read all rows while the file is still open; csv.reader is lazy.
            rows = list(csv.reader(md5_fp, delimiter=' '))
    except FileNotFoundError:
        return False, REST_ERR_400_ZIP_BADFORMAT
    else:
        for row in rows:
            if len(row) < 2:
                return False, REST_ERR_400_ZIP_BADFORMAT
            pzf = os.path.join(extract_dir, row[1])
            with open(pzf, 'rb') as fpzf:
                fdata = fpzf.read()
            md5sum = hashlib.md5(fdata).hexdigest()
            if md5sum != row[0]:
                return False, REST_ERR_400_ZIP_BADFILE
    return True, None

In the code above, for efficiency, the function returns as soon as it finds a format violation.

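Now the function gets a new requirement: depending on an input parameter, delete the zip package and everything in the extraction directory. If we keep the structure above, the cleanup has to be repeated at every "return", which produces a lot of duplicated code. In C, "goto" handles this nicely; unfortunately Python does not support "goto".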

Use try/except/finally to control the execution path and emulate "goto":

class PtnPackageParseError(Exception):

    """Exception for pattern package parse."""

    def __init__(self, reason, message=''):
        self.reason = reason
        self.message = message
        super().__init__()
        
        
def check_pattern_package(fpath, cleanup=False):
    """Check pattern package correctness."""
    base_dir = os.path.dirname(fpath)
    ret, reason, extract_dir = True, None, None
    try:
        with zipfile.ZipFile(fpath) as zf:
            infolist = zf.infolist()
            if not infolist[0].is_dir():
                raise PtnPackageParseError(REST_ERR_400_ZIP_BADFILE)
            zf_base_dir = infolist[0].filename
            md5sum_file = os.path.join(zf_base_dir, 'md5sum.txt')
            if md5sum_file not in zf.namelist():
                raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
            zf.extractall(base_dir)

        extract_dir = os.path.join(base_dir, zf_base_dir)
        try:
            with open(os.path.join(base_dir, md5sum_file)) as md5_fp:
                # Read all rows while the file is still open; csv.reader is lazy.
                rows = list(csv.reader(md5_fp, delimiter=' '))
        except FileNotFoundError:
            raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
        else:
            for row in rows:
                if len(row) < 2:
                    raise PtnPackageParseError(REST_ERR_400_ZIP_BADFORMAT)
                pzf = os.path.join(extract_dir, row[1])
                with open(pzf, 'rb') as fpzf:
                    fdata = fpzf.read()
                md5sum = hashlib.md5(fdata).hexdigest()
                if md5sum != row[0]:
                    raise PtnPackageParseError(REST_ERR_400_ZIP_BADFILE)
    except PtnPackageParseError as e:
        ret, reason = False, e.reason
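    finally:
        if cleanup:
            os.unlink(fpath)
            # extract_dir stays None when validation fails before extraction.
            if extract_dir and os.path.exists(extract_dir):
                # shutil.rmtree (needs `import shutil`) removes a non-empty
                # tree; os.removedirs only removes empty directories.
                shutil.rmtree(extract_dir)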
    return ret, reason, extract_dir
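
The same pattern can be reduced to a small, self-contained sketch: every failure raises one exception type, a single except turns it into the return value, and finally acts as the one "cleanup label" that every path passes through. The names ValidationError and validate_workdir and the reason strings are hypothetical, chosen only for this illustration.

```python
import os
import shutil
import tempfile


class ValidationError(Exception):
    """Hypothetical error type carrying a reason code."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def validate_workdir(workdir, cleanup=False):
    """Return (ok, reason); optionally delete workdir on the way out."""
    ok, reason = True, None
    try:
        names = os.listdir(workdir)
        if not names:
            raise ValidationError('empty')
        if 'md5sum.txt' not in names:
            raise ValidationError('no-manifest')
    except ValidationError as err:
        # Every failure path lands here instead of returning early.
        ok, reason = False, err.reason
    finally:
        # Single cleanup point: runs on success, failure, or unexpected error.
        if cleanup:
            shutil.rmtree(workdir, ignore_errors=True)
    return ok, reason
```

A possible usage: create a directory with tempfile.mkdtemp(), call validate_workdir(path, cleanup=True), and the directory is gone afterwards regardless of the result.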

Original source: github: jasonTu/python-material-collection
