Advanced Python highlights: the pitfalls the books won't tell you about

Date: 2019-12-10


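Recursively listing a directory with a generator. The explicit loop over tmp in the else branch cannot be left out: when a recursive algorithm is written as a generator, the calling generator must explicitly pull every value out of the recursive call and yield it again.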

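import os

def get_file_recur(path):
    """Yield the path of every file under `path`, recursing into subdirectories."""
    children = os.listdir(path)
    for child in children:
        qualified_child = os.path.join(path, child)
        if os.path.isfile(qualified_child):
            yield qualified_child
        else:
            # The recursive call only creates a generator object; its values
            # must be re-yielded explicitly or they are silently dropped.
            tmp = get_file_recur(qualified_child)
            for item in tmp:
                yield item

for file in get_file_recur('/home/xxx/xxx/Frank Li'):
    print(file)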


>>> import os
>>> def recur_dir(path):
...     children = os.listdir(path)
...     for child in children:
...         qualified_child = os.path.join(path,child)
...         if os.path.isdir(qualified_child):
...             tmp = recur_dir(qualified_child)
...             for t in tmp:
...                 yield t
...         else:
...             yield qualified_child
...
>>> for file in recur_dir('./'):
...     print(file)

Following the reference material, the same approach flattens a nested list:

def flattern(lst):
    for item in lst:
        if isinstance(item,list):
            inner_list = flattern(item)
            for i in inner_list:
                yield i
        else:
            yield item

l=[1,2,3,4,5,[6],[7,8,[9,[10]]]]
lst=flattern(l)
print(list(lst))
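
On Python 3.3 and later, the inner for loop can be replaced by yield from, which delegates to the recursive generator. This is a minimal sketch of the same flattening; the name flatten and the sample data are illustrative and not part of the original snippet.

```python
def flatten(items):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for item in items:
        if isinstance(item, list):
            # Delegate to the recursive generator: every value it produces
            # is passed straight through to our own caller.
            yield from flatten(item)
        else:
            yield item


nested = [1, 2, 3, 4, 5, [6], [7, 8, [9, [10]]]]
flat = list(flatten(nested))
print(flat)
```

Writing flatten(item) on its own line, without yield from or a loop, would create the inner generator and throw it away, which is exactly the mistake the note above warns about.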

x = get('key','default value')[0]  or 0

Try a different flavor of coroutine and compare it with the original fib:

import random
import time

def lazy_fib(n):
    a = 0
    b =1
    i = 0
    while i<n:
        sleep_cnt = yield b
        print('uh...let me think {0} seconds...'.format(sleep_cnt))
        time.sleep(sleep_cnt)
        a, b = b, a+b
        i+=1
        
print('-'*10 + 'test yield send' + '-'*10)

N = 20
lazyFib = lazy_fib(N)
fib_res = next(lazyFib)
while True:
    print(fib_res)
    try:
        fib_res = lazyFib.send(random.uniform(0,0.5))
    except StopIteration:
        break

import random
import time

def stupid_fib(n):
    a, b = 0, 1
    i = 0
    while i<n:
        sleep_cnt = yield a
        print('sleep {:.3f} secs'.format(sleep_cnt))
        time.sleep(sleep_cnt)
        a, b = b, a+b
        i+=1
    
def copy_fib(n):
    print('I am copy from stupid fib')
    yield from stupid_fib(n)
    print('Copy end')
    
print('-'*10+'test yield from and send'+'-'*10)
N = 20
cp_fib = copy_fib(N)
fib_res = next(cp_fib)
while True:
    print(fib_res)
    try:
        fib_res = cp_fib.send(random.uniform(0,0.5))
    except StopIteration:
        break
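
Besides forwarding send() to the inner generator, yield from also hands the inner generator's return value back to the delegating one. The following sketch illustrates that under assumed names (averager, delegator, and the results list are hypothetical and not from the original):

```python
def averager():
    """Receive numbers via send(); return the running average when sent None."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average
        if value is None:
            break
        total += value
        count += 1
        average = total / count
    return average  # becomes the value of the `yield from` expression


def delegator(results):
    # send() calls made on the delegator are forwarded to averager();
    # when averager() returns, its return value lands in `final`.
    final = yield from averager()
    results.append(final)


results = []
gen = delegator(results)
next(gen)            # prime: run to the first yield inside averager()
gen.send(10)
gen.send(20)
try:
    gen.send(None)   # ends averager(), then delegator() finishes too
except StopIteration:
    pass
print(results)
```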

Study this one closely:

>>> def set_priority(data,group):
...     found = False
...     def helper(x):
...         nonlocal found
...         if x in group:
...             found = True
...             return (0,x)
...         return (1,x)
...     data.sort(key=helper)
...     return found
...
>>> data = [8,3,1,2,5,4,7,6]
>>> group = {2,3,5,7}
>>> set_priority(data,group)
True
>>> print(data)
[2, 3, 5, 7, 1, 4, 6, 8]

# Improved version
>>> class Sorter(object):
...     def __init__(self,group):
...         self.found = False
...         self.group = group
...     def __call__(self,x):
...         if x in self.group:
...             self.found = True
...             return (0,x)
...         return (1,x)
...
>>> data = [8,3,1,2,5,4,7,6]
>>> group = {2,3,5,7}
>>> sorter = Sorter(group)
>>> data.sort(key=sorter)
>>> data
[2, 3, 5, 7, 1, 4, 6, 8]
>>> assert sorter.found is True


Differences between sys.argv, optparse, argparse, and getopt

optparse vs. argparse:
Deprecated since version 3.2: The optparse module is deprecated and will not be developed further; development will continue with the argparse module.

Deprecated since version 2.7: The optparse module is deprecated and will not be developed further; development will continue with the argparse module.

argparse vs. sys.argv:
The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.

argparse vs. getopt:
The getopt module is a parser for command line options whose API is designed to be familiar to users of the C getopt() function. Users who are unfamiliar with the C getopt() function or who would like to write less code and get better help and error messages should consider using the argparse module instead.
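
To make the comparison concrete, here is a small argparse sketch. The tool description, the argument names, and the sample argument list are all hypothetical; only the argparse calls themselves are standard.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Show the first lines of a file (hypothetical demo tool).")
    parser.add_argument("path", help="file to read")
    parser.add_argument("-n", "--count", type=int, default=10,
                        help="number of lines to show (default: 10)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra information")
    return parser


# Passing a list parses it instead of sys.argv[1:], which is handy for testing.
args = build_parser().parse_args(["notes.txt", "--count", "3"])
print(args.path, args.count, args.verbose)
```

With sys.argv alone, the type conversion, the default value, and the help text would all have to be written by hand.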

Worth studying further:

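from collections.abc import Iterator, Iterable  # the old collections.Iterator aliases were removed in Python 3.10
from collections import defaultdict
from collections import Counter, ChainMap, OrderedDict, namedtuple, deque
from itertools import islice  # slicing for iterators; indices must be non-negative
from itertools import zip_longest # like zip, but runs until the longest iterable is exhausted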

from concurrent.futures import ThreadPoolExecutor as Pool
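
A few of the imports above in action. This is only an illustrative sketch; the helper naturals and the sample inputs are made up for the example.

```python
from collections import Counter, deque
from itertools import islice, zip_longest


def naturals():
    """An endless generator: ordinary slicing cannot be applied to it."""
    n = 0
    while True:
        yield n
        n += 1


# islice takes a "slice" of any iterator (start and stop must be non-negative).
window = list(islice(naturals(), 2, 7))

# zip_longest pads the shorter input instead of stopping early like zip.
pairs = list(zip_longest("abc", [1, 2], fillvalue=None))

# A bounded deque keeps only the most recent items.
last_three = deque(range(10), maxlen=3)

# Counter tallies hashable items.
most_common = Counter("banana").most_common(1)

print(window, pairs, list(last_three), most_common)
```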

Reading a large file with a generator in Python

def read_block(file_path, block_size=1024*1024):
    # The with block closes the file once the generator finishes or is closed.
    with open(file_path) as fr:
        while True:
            read_blk = fr.read(block_size)
            if not read_blk:
                break
            yield read_blk

for blk in read_block('./test.py'):
    print(blk)

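Reading a large file line by line with a with block. Opening the file in 'rb' mode is usually the most efficient, because it skips text decoding; the snippet here opens it in text mode ('r').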

with open('./test.py','r') as f:
    for line in f:
        print(line)

Pretty-printing JSON

import json

# json_str is assumed to already hold a JSON document as a string
with open('./param.json','w',encoding='utf-8') as f:
    f.write(json.dumps(json.loads(json_str),ensure_ascii=False,indent=4))

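The following is reposted from Jianshu:

This week I attended three sessions of an advanced Python course. Having a veteran programmer with more than ten years of experience teach you the advanced parts of a language is perhaps a perk you only get at a big company. Although I have been using Python for three or four years, I still learned a lot from the course: I picked up new techniques and came to understand the reasons behind things I already knew.
I downloaded the slides and took notes, but I still want to present them in narrative form, for my future self and for any reader who needs them. Overall this follows the course by Daxiong (大雄), combined with my own experience and thoughts from development work.

1. How write operations affect namespaces

First, look at this piece of code (the snippets in this part use Python 2 syntax):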

import math

def foo(processed):
    value = math.pi

    # The other programmer add logic here.
    if processed:
        import math
        value = math.sin(value)
    
    print value
    
foo(True)

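Think about it: is there anything wrong with this code, and what does it print when run?

First, I personally dislike running import math inside a function body; I usually recommend moving it to the top of the file. This is mainly for performance: an already-imported module is not loaded again, but each import statement still performs a lookup in sys.modules. Avoid it especially in logic that runs at high frequency, such as a per-tick update.

But that is not the main problem with this code. When you try to run it, you get the following error:

Traceback (most recent call last):
  File "C:\Users\David-PC\Desktop\Advanced Course on Python 2016\t019.py", line 13, in <module>
    foo(True)
  File "C:\Users\David-PC\Desktop\Advanced Course on Python 2016\t019.py", line 4, in foo
    value = math.pi
UnboundLocalError: local variable 'math' referenced before assignment

The name was referenced before assignment, which makes it look as if the import at the top of the file were to blame. This example is slightly complicated, so let's write a similar but simpler one; I have run into this kind of situation in my own code before: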

value = 0
def foo():
    if value > 0:
        value = 1
        print value
foo()

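This also reports that value was used before assignment. Making the code work is simple: put global value on the first line of the body of foo.

Think about it: why can't the outer variable value be accessed inside foo?

If you comment out the line value = 1, the code runs normally. So it looks as though the assignment to value is what prevents us from reading the outer variable, regardless of whether the assignment comes before or after the read.

A write operation shadows the name lookup outside the current namespace, and this is determined at compile time.

Put simply, if a namespace contains a write to a variable, that variable is treated as local throughout that namespace, and your code cannot use it before assigning it. The decision that the name is local is made at compile time; the error itself is raised at run time, when the unassigned local is read. The global keyword changes this behavior.
Back to the first snippet: why can't an imported module be used normally either?
Once you understand how import works, the answer is simple: import is really just an assignment.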
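
The rule can be checked directly on Python 3 by inspecting which names the compiler classified as local. This is a sketch with hypothetical function names (broken, fixed, import_inside); co_varnames is the tuple of local variable names stored on a function's code object.

```python
import math

value = 0


def broken():
    if value > 0:   # raises UnboundLocalError: `value` is local to broken()
        value = 1   # this assignment is what makes it local
    return value


def fixed():
    global value    # opt out: `value` now refers to the module-level name
    if value > 0:
        value = 1
    return value


def import_inside(flag):
    result = math.pi   # fails: the import statement below binds a local `math`
    if flag:
        import math
    return result


errors = []
for func, call_args in ((broken, ()), (import_inside, (False,))):
    try:
        func(*call_args)
    except UnboundLocalError as exc:
        errors.append(type(exc).__name__)

print(errors)
print("value" in broken.__code__.co_varnames)        # decided at compile time
print("value" in fixed.__code__.co_varnames)
print("math" in import_inside.__code__.co_varnames)  # import is an assignment
print(fixed())
```

Note that import_inside fails even when flag is False: the import statement never runs, but its mere presence in the body is enough to make math a local name.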

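Summary: I used to think Python's namespaces were easy to understand, yet I rarely paid attention to how global variables (or upvalues) are accessed. Sometimes I could read them without declaring global, sometimes I got an error, and I never really worked out which rule produced which result.
The effect of write operations on namespaces answers that question, and it showed me how foolish and lazy my earlier habit of "programming by error message" had been.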

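2. Circular references

Python's garbage collection (GC) combines several techniques: reference counting, object pools, mark and sweep, and generational collection. The GC implementation itself is left for later; first let's look at what happens when code contains circular references.
Creating a circular reference in game development is very easy. Take the entity structure commonly used in games: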

class EntityManager(object):
    def __init__(self):
        self.__entities = {}

    def add_entity(self, eid):
        #Some process code.
        self.__entities[eid] = Entity(eid, self)

    def get_entity(self, eid):
        return self.__entities.get(eid, None)

class Entity(object):
    def __init__(self, eid, mgr):
        self.eid = eid
        self.mgr = mgr  # strong reference back to the manager: this closes the cycle

    def attack(self, skill_id, target_id):
        target = self.mgr.get_entity(target_id)
        #attack the target
        #...

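Clearly, the __entities attribute of EntityManager references every object it manages. A game entity sometimes needs to look up other entities, and the simplest way to allow that is to pass the EntityManager itself to each entity it creates and let the entity keep a reference. Then a function such as attack can easily get at the data it needs.
The __entities attribute of EntityManager references the Entity objects, and the mgr attribute on each Entity references the EntityManager, so there is a circular reference.
Some people may say: there's a circular reference, so what? First, I can guarantee in my logic that the cycle is broken on release, so memory is freed normally. Second, Python provides garbage collection anyway, and it will clean up for me.
As a game developer, my response to that line of thinking is: heh.
Let's look at a circular reference that is common in game development, one that people sometimes write without realizing it (the sample code is taken directly from Daxiong's course).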

class Animation(object):
    def __init__(self, callback):
        self._callback = callback
        
class Entity(object):
    def __init__(self):
        self._animation = Animation(self._complete)
        
    def _complete(self):
        pass
        
e = Entity()
print(e._animation._callback.__self__ is e)  # __self__ works on Python 2.6+ and 3; im_self is the Python 2-only name

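The print outputs True, which shows where the circular reference in this logic lies.
In a large project built by many people, guaranteeing by logic alone that the code contains no cycles is almost impossible. Besides, even if your logic releases things correctly, an occasional traceback can keep the cycle-breaking code from running, so the objects in the cycle cannot be freed immediately.

How Python deals with circular references: when an object's reference count drops to 0, the object is freed immediately. Objects in a cycle keep each other's counts above 0, so they stay in memory until the cyclic garbage collector finds them.

Python's GC is also an expensive process that can cause momentary CPU spikes, among other problems. Some NetEase projects implemented their own sliced, multithreaded GC mechanism to replace Python's native GC entirely.
A large number of circular references leads to slower and more frequent GC runs, and also to fluctuating memory usage.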
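
The difference between reference counting and cyclic collection can be observed with the standard gc and weakref modules. This is a sketch for CPython on Python 3; the Node class is a hypothetical stand-in for any pair of objects that reference each other.

```python
import gc
import weakref


class Node:
    pass


gc.disable()                   # switch off automatic cyclic collection

a, b = Node(), Node()
a.other, b.other = b, a        # a -> b -> a: a reference cycle
probe = weakref.ref(a)         # lets us observe `a` without keeping it alive

del a, b
# Each object is still referenced by the other, so neither count reaches 0.
alive_before = probe() is not None

gc.collect()                   # an explicit pass of the cyclic collector
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)
```

The explicit gc.collect() call stands in for the automatic collection that would otherwise run at a moment the program does not control, which is where the CPU spikes mentioned above come from.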

Solution: for the EntityManager example, use weakref; for the callback example, avoid using an object's bound method as a callback where possible.

self._animation = Animation(lambda obj = weakref.proxy(self): obj._complete())
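
For the EntityManager case, a weak back-reference might look like the sketch below. It is a Python 3 illustration, not the course's code: the attribute is renamed to _entities for brevity, and the probe variable exists only to observe when the manager is freed (the immediate release is CPython behavior).

```python
import gc
import weakref


class Entity:
    def __init__(self, eid, mgr):
        self.eid = eid
        # weakref.proxy behaves like mgr but does not raise its reference
        # count, so manager -> entity -> manager is no longer a cycle.
        self.mgr = weakref.proxy(mgr)

    def find(self, target_id):
        return self.mgr.get_entity(target_id)


class EntityManager:
    def __init__(self):
        self._entities = {}

    def add_entity(self, eid):
        self._entities[eid] = Entity(eid, self)

    def get_entity(self, eid):
        return self._entities.get(eid)


gc.disable()                    # show that reference counting alone suffices
mgr = EntityManager()
mgr.add_entity(1)
mgr.add_entity(2)
found = mgr.get_entity(1).find(2).eid

probe = weakref.ref(mgr)
del mgr                         # last strong reference gone: freed right away
freed_without_gc = probe() is None
gc.enable()
print(found, freed_without_gc)
```

The trade-off is that an entity that outlives its manager gets a ReferenceError when it touches self.mgr, so the lifetime rules still need to be clear.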

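Summary: in a simple system you don't need to worry about circular references; leaving them to Python's GC is enough. But in a system that runs for a long time and is sensitive to CPU fluctuations, you need to watch the impact of circular references and avoid them as far as possible.

An aside: in our current project, the EntityManager example uses the singleton pattern to remove the circular reference. This is a common approach, but the singleton pattern is no silver bullet. While restricting instantiation, the pattern also provides a global access point, which means the singleton becomes a global object, and the code ends up full of uses that ignore coupling. In client code this is not a problem, because the client needs only one EntityManager to manage all game entities and has no other parallel environments. But when we moved to server-side development, the same code became a disaster: a server may have many EntityManager instances managing entities in different contexts, the singleton pattern no longer works, and every place that freely accessed EntityManager had to be reworked and cleaned up before it could run correctly.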

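Pitfalls of closures and passing by reference

This part is about callables in Python. There is a Stack Overflow question titled "What is a "callable" in Python", and the top-voted answer says:

A callable is anything that can be called.

That answer is very abstract. Daxiong explained the concept from a more concrete angle: which things in Python are callable?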

function
closure
bound method
unbound method
class method
static method
functor
operator
class
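The answer first: obviously, everything listed is callable. I have used most of these concepts at work, and while helping newly hired colleagues debug I have watched them fall into the closure pitfall themselves, but I was still not clear about concepts such as bound method and unbound method. Let's go through them one by one.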
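
The list can be checked with the built-in callable(). The sketch below targets Python 3, where the separate "unbound method" type no longer exists: accessing a method on the class just returns a plain function. All class and function names are made up for the illustration.

```python
import operator


def plain_function():
    return "function"


def make_closure():
    captured = "closure"

    def inner():
        return captured   # refers to a variable of the enclosing function
    return inner


class Functor:
    def __call__(self):   # instances with __call__ are callable ("functors")
        return "functor"

    def method(self):
        return "method"

    @classmethod
    def class_method(cls):
        return "class method"

    @staticmethod
    def static_method():
        return "static method"


candidates = {
    "function": plain_function,
    "closure": make_closure(),
    "bound method": Functor().method,
    "unbound method": Functor.method,   # a plain function on Python 3
    "class method": Functor.class_method,
    "static method": Functor.static_method,
    "functor": Functor(),
    "operator": operator.add,
    "class": Functor,                   # calling a class creates an instance
}
results = {name: callable(obj) for name, obj in candidates.items()}
print(results)
print(callable(42))
```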

3. Closure
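Closure: in Python a closure is essentially a function. More specifically, what distinguishes it from a plain function is that it consists of code plus an environment, and in Python the environment can be divided into three parts: globals, locals, and cells.
globals and locals are easy to understand; they are simply two dicts that hold the global and local variables respectively. So what are cells? Let's start with a classic example: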

def foo():
    logout_lst = []

    for i in xrange(5):
        def logout():
            print i
        logout_lst.append(logout)

    for l in logout_lst:
        l()

foo()
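Think about it: what does this code print?

Let's analyze it. For ease of demonstration the logic here is just a print, and you may question its usefulness, but in our own development a colleague did define a similar closure inside a loop, for use as an engine callback, and it referenced an outer variable much like i. In the example, def logout() inside foo defines a closure (writing this reminds me of the inner classes from my distant Java days), and we want to use the value of the outer variable i, here simply by printing it. What we would normally expect is the numbers 0, 1, 2, 3, 4, one per line. But what is the actual output?
Five 4s!
Why? Let's add some logging to find out. To keep the output readable we loop only twice; the modified code is: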

def foo():
    logout_lst = []

    for i in xrange(2):
        def logout():
            print "i:", i, id(i)
            print "globals:", globals()
            print "locals:", locals()
        logout_lst.append(logout)

    for l in logout_lst:
        l()
        print "Cells:", l.__closure__, id(l.__closure__[0].cell_contents)
        print ''

foo()
The output is:

i: 1 35882616
globals: {'__builtins__': <module '__builtin__' (built-in)>, '__file__': 'F:\\David\\narrator.py', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x022C72B0>, '__doc__': None}
locals: {'i': 1}
Cells: (<cell at 0x02354570: int object at 0x02238678>,) 35882616

i: 1 35882616
globals: {'__builtins__': <module '__builtin__' (built-in)>, '__file__': 'F:\\David\\narrator.py', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x022C72B0>, '__doc__': None}
locals: {'i': 1}
Cells: (<cell at 0x02354570: int object at 0x02238678>,) 35882616
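First we print the value of i and the id of the variable i, which you can think of as its unique identifier inside the Python virtual machine. Both times the value is 1 and the id is 35882616. Then we print globals and locals; these two are simple, so I won't analyze them. Finally we inspect the closure's contents through the __closure__ attribute: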

Cells: (<cell at 0x02354570: int object at 0x02238678>,)
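These are the cells mentioned earlier. It is a cell object that contains an int object, and the cell_contents attribute shows that its id is 35882616, the same as that of i.
So cells are references to up-values. Note: references!
That makes the earlier output easy to understand. Because the cell is a reference, by the time the closures are called the variable i has already become 4, so printing i naturally gives 4 every time.
Finally, how do you change the code so that it behaves as originally intended? It's simple: don't read the value through the cell; pass it in as a parameter instead, and it becomes the value at the moment the closure is defined.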

def foo():
    logout_lst = []

    for i in xrange(2):
        def logout(x = i):
            print "x:", x, id(x)
            print "globals:", globals()
            print "locals:", locals()
        logout_lst.append(logout)

    for l in logout_lst:
        l()
        print "Cells:", l.__closure__
        print ''

foo()
Output:

x: 0 37062276
globals: {'__builtins__': <module '__builtin__' (built-in)>, '__file__': 'F:\\David\\narrator.py', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x023E72B0>, '__doc__': None}
locals: {'x': 0}
Cells: None

x: 1 37062264
globals: {'__builtins__': <module '__builtin__' (built-in)>, '__file__': 'F:\\David\\narrator.py', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x023E72B0>, '__doc__': None}
locals: {'x': 1}
Cells: None
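Here the cells have become None and the output is 0 and 1, whose ids naturally differ. The parameter could also be written as def logout(i = i): so that i can be used inside, but that causes some confusion and I personally don't recommend it.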
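
The same experiment on Python 3, with return values instead of print so the results can be compared directly. This is a sketch with hypothetical function names; functools.partial is shown as an alternative to the default-argument trick and is not mentioned in the course notes.

```python
from functools import partial


def build_late_binding():
    callbacks = []
    for i in range(5):
        def report():
            return i          # looked up through the cell when called
        callbacks.append(report)
    return callbacks


def build_default_arg():
    callbacks = []
    for i in range(5):
        def report(x=i):      # default evaluated now, at definition time
            return x
        callbacks.append(report)
    return callbacks


def build_partial():
    def report(x):
        return x
    # partial stores the current value of i inside each callable
    return [partial(report, i) for i in range(5)]


late = [f() for f in build_late_binding()]
by_default = [f() for f in build_default_arg()]
by_partial = [f() for f in build_partial()]
shared_cell = build_late_binding()[0].__closure__[0].cell_contents

print(late)
print(by_default)
print(by_partial)
print(shared_cell)
```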

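Think about it: do you think you are done with this pitfall? Is there anywhere a problem might still lurk?

def logout(x = i): is used on a closure here, but it is really just a function default argument. So what happens if the default is a mutable value such as a list, a dict, or an arbitrary Python object? That is, of course, another beginner-level pitfall:

Background: using mutable values as function default arguments is discouraged; stick to immutable values.

But sometimes, in getting out of one pit, you may carelessly step into another. If a list object, say, is used as the default here, then x in every closure created this way refers to the same object, and across repeated calls of any one closure x is also a reference to that same object. If the logic is read-only, as in the example, there may be no problem; if someone later adds logic that modifies it, well, heh. Things can turn into a complete mess with all kinds of bizarre behavior, and whoever wrote that logic had better pray for luck.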
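
A minimal illustration of the mutable default pitfall and the usual None-sentinel workaround. The function names are hypothetical.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs,
    # and is then reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is immutable; a new list is built inside each call instead.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = list(append_shared(1))    # copy to snapshot the state
shared_second = list(append_shared(2))   # the earlier 1 is still in there
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)

print(shared_first, shared_second)
print(fresh_first, fresh_second)
```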

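Summary: understanding closures, understanding references, keeping a clear head while coding, and knowing exactly where the variables you use live are all very, very important, and they go a long way toward avoiding baffling, maddening bugs in team development!

This part covers only closures. There is much more to know about them, and if you are interested you can look up the relevant material yourself. Part three covers bound method and unbound method, my favorite part of this course.

PS: There are many pitfalls that you have read about in articles or heard about from colleagues, yet when writing code you sometimes still get tangled up in the moment and fall in all over again. This is often hard to avoid; your mind is simply not that alert to pitfalls you haven't experienced first-hand. What learning from experience and accumulating knowledge do is help you climb out faster, and leave a deeper impression when you look back at the pit.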

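Author: 董夕
Link: https://www.jianshu.com/p/39460eff2d9d
Source: Jianshu (簡書)
Copyright on Jianshu belongs to the author. For reproduction in any form, please contact the author for authorization and credit the source.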