Python Talk Notes 1

Date: 2019-11-13

Tags: python, talk notes

References:

1. The Clean Architecture in Python (Brandon Rhodes)

2. Python Best Practice Patterns (Vladimir Keleshev)

3. Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger)

4. How to Write Reusable Code (Greg Ward)

5. How to write actually object-oriented python (Per Fagrell)

I recently watched a few Python talks and found them quite instructive.

1. The Clean Architecture in Python (Brandon Rhodes)

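We habitually use subroutines to hide complicated I/O instead of truly decoupling it from the logic. It is better to lift I/O from the bottom of the program to the top.

Listing 1 calls an API, tries to read the Definition field, and returns the result.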

# Listing 1

import requests
from urllib.parse import urlencode  # Python 3 location of urlencode

def find_definition(word):
    q = 'define ' + word
    url = 'http://api.duckduckgo.com/?'
    url += urlencode({'q': q, 'format': 'json'})
    response = requests.get(url)    # I/O
    data = response.json()          # I/O
    definition = data[u'Definition']
    if definition == u'':
        raise ValueError('that is not a word')
    return definition

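Listing 2: realizing that I/O should be separated from logic, we get call_json_api. On the surface the I/O is hidden, but it is not separated from the logic.

If we now want to test find_definition, can we bypass the I/O? No: I/O and logic are still tightly coupled.

Look again at what find_definition actually does: build the url, perform I/O, check the result. Decouple along those lines.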

# Listing 2

def find_definition(word):
    q = 'define ' + word
    url = 'http://api.duckduckgo.com/?'
    url += urlencode({'q': q, 'format': 'json'})
    data = call_json_api(url)
    definition = data[u'Definition']
    if definition == u'':
        raise ValueError('that is not a word')
    return definition

def call_json_api(url):
    response = requests.get(url)
    data = response.json()
    return data

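Listing 3: the code itself is unchanged but recombined. Building the url and checking the result are split out and made independent of I/O.

I think keeping the I/O in call_json_api would also work here; the speaker probably inlined it to emphasize lifting I/O from the bottom of the program to the very top.

The key point is that build_url and pluck_definition are fully decoupled from I/O, can be tested freely, and are fast functions.

Testing find_definition itself is also easier than with the versions in Listing 1 and 2.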

# Listing 3

def find_definition(word):
    url = build_url(word)
    data = requests.get(url).json()    # I/O
    return pluck_definition(data)

def build_url(word):
    q = 'define ' + word
    url = 'http://api.duckduckgo.com/?'
    url += urlencode({'q': q, 'format': 'json'})
    return url

def pluck_definition(data):
    definition = data[u'Definition']
    if definition == u'':
        raise ValueError('that is not a word')
    return definition

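Functions without side effects are called pure functions, and pure functions are easier to test. Dependency injection and monkey patching compensate for a wrong program structure, and in Python they can largely be avoided.

build_url and pluck_definition are pure functions, whereas find_definition in Listing 1 and Listing 2 can only be tested in Python by monkey patching.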

def test_build_url():
    assert build_url('word') == (
        'http://api.duckduckgo.com/'
        '?q=define+word&format=json'
    )

def test_build_url_with_punctuation():
    assert build_url('what?!') == (
        'http://api.duckduckgo.com/'
        '?q=define+what%3F%21&format=json'
    )
    
def test_build_url_with_hyphen():
    assert build_url('hyphen-ate') == (
        'http://api.duckduckgo.com/'
        '?q=define+hyphen-ate&format=json'
    )
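
The tests above cover build_url only. As a rough sketch, pluck_definition can be tested the same way without any network access; the sample dictionaries below are made up for illustration and are not real API responses.

```python
def pluck_definition(data):
    # Same logic as Listing 3: pure, so no network access is needed.
    definition = data['Definition']
    if definition == '':
        raise ValueError('that is not a word')
    return definition


def test_pluck_definition():
    # A hand-written dict stands in for the decoded JSON response.
    assert pluck_definition({'Definition': 'abc'}) == 'abc'


def test_pluck_definition_missing():
    # An empty Definition field must be reported as an error.
    try:
        pluck_definition({'Definition': ''})
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')


test_pluck_definition()
test_pluck_definition_missing()
```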

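The greatest advantage of functional programming may not be immutable data structures but the fact that it works on data we can picture; the speaker illustrates this with Shell commands.

Look at the two versions of find_definition again. Compared with Listing 1, the find_definition in Listing 3 is clearly cleaner. Where does that clarity come from?

word -> url -> data: each of these is real data. As in a Shell pipeline, data flows from one stage to the next, and we can clearly picture its form at every step.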

# Listing 1
def find_definition(word):
    q = 'define ' + word
    url = 'http://api.duckduckgo.com/?'
    url += urlencode({'q': q, 'format': 'json'})
    response = requests.get(url)    # I/O
    data = response.json()          # I/O
    definition = data[u'Definition']
    if definition == u'':
        raise ValueError('that is not a word')
    return definition

# Listing 3
def find_definition(word):
    url = build_url(word)
    data = requests.get(url).json()    # I/O
    return pluck_definition(data)

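Summary

The talk is titled The Clean Architecture in Python, and what makes an architecture unclean is I/O. I/O is usually tedious to handle, so we wrap it in a subroutine, which looks like it solves the problem. Under this out-of-sight strategy, however, I/O and logic remain tightly coupled, and no part of the logic can be tested without the I/O. To work around this, static languages introduced dependency injection and dynamic languages use monkey patching, but neither is as good as facing I/O from the start and decoupling it from the logic. Functions that contain only logic are called pure functions. Data flows through them as through a Shell pipeline, with a concrete form at every step. Together these make up a clean architecture.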

2. Python Best Practice Patterns (Vladimir Keleshev)

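Each function has one definite purpose, and the operations inside it should all sit at the same level of abstraction. Following this principle, a program inevitably becomes a collection of many small functions, each perhaps only a few lines long.

The example is a boiler safety check: when temperature and pressure reach their critical values the boiler shuts down automatically, and a failed shutdown triggers an alarm.

safety_check covers reading and computing temperature and pressure, the threshold check, the shutdown, and the alarm. These steps are at the same level, but their internal logic is not, so that is where to decouple.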

class Boiler(object):
    # ...
    def safety_check(self):
        # Convert fixed-point to floating-point:
        temperature = self.modbus.read_holding()
        pressure_psi = self.abb_f100.register / F100_FACTOR
        if psi_to_pascal(pressure_psi) > MAX_PRESSURE:
            if temperature > MAX_TEMPERATURE:
                # Shutdown!
                self.pnoz.relay[15] &= MASK_POWER_COIL
                self.pnoz.port.write("$PL,15\0")
                sleep(RELAY_RESPONSE_DELAY)
                # Successful shutdown?
                if self.pnoz.relay[16] & MASK_POWER_OFF:
                    # Play alarm:
                    with open(BUZZER_MP3_FILE) as f:
                        play_sound(f.read())

Worth noting is the use of @property and all; besides all there is also any.

class Boiler(object):
    # ...
    def alarm(self):
        with open(BUZZER_MP3_FILE) as f:
            play_sound(f.read())

    def shutdown(self):
        self.pnoz.relay[15] &= MASK_POWER_COIL
        self.pnoz.port.write("$PL,15\0")
        sleep(RELAY_RESPONSE_DELAY)
        return not (self.pnoz.relay[16] & MASK_POWER_OFF)

    def safety_check(self):
        if all((self.pressure > MAX_PRESSURE,
                self.temperature > MAX_TEMPERATURE)):
            if not self.shutdown():
                self.alarm()

    @property
    def temperature(self):
        return self.modbus.read_holding()

    @property
    def pressure(self):
        pressure_psi = self.abb_f100.register / F100_FACTOR
        return psi_to_pascal(pressure_psi)
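
To make the difference between all and any concrete, here is a small sketch. The threshold values and the two helper names are invented for illustration and do not come from the talk.

```python
MAX_PRESSURE = 100.0       # hypothetical limit, illustration only
MAX_TEMPERATURE = 350.0    # hypothetical limit, illustration only


def needs_shutdown(pressure, temperature):
    # all(): every condition must hold.
    return all((pressure > MAX_PRESSURE,
                temperature > MAX_TEMPERATURE))


def needs_attention(pressure, temperature):
    # any(): at least one condition must hold.
    return any((pressure > MAX_PRESSURE,
                temperature > MAX_TEMPERATURE))
```

With only the pressure above its limit, needs_shutdown is false while needs_attention is true.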

All parameters needed to initialize a Python class should be passed to its initializer.

# wrong
point = Point()
point.x = 12
point.y = 5

# better
point = Point(x=12, y=5)
point = Point.polar(r=13, theta=22.6)

from math import cos, sin

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    
    @classmethod
    def polar(cls, r, theta):
        # math.cos and math.sin expect theta in radians
        return cls(r * cos(theta),
                   r * sin(theta))
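
The call Point.polar(r=13, theta=22.6) only lands near (12, 5) if theta is read as degrees. A possible variant, assuming that reading, converts explicitly; the parameter name theta_degrees is my own choice, not the speaker's.

```python
from math import cos, radians, sin


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def polar(cls, r, theta_degrees):
        # Alternative constructor: convert degrees to radians first,
        # because math.cos and math.sin work in radians.
        theta = radians(theta_degrees)
        return cls(r * cos(theta), r * sin(theta))


point = Point.polar(r=13, theta_degrees=22.6)
```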

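A function takes many parameters and has many temporary variables inside. How can it be improved?

Faced with a complex task, the later code depends on the temporaries processed, copied, and executed, and those temporaries depend on the parameters task, job, and obligation.

Suppose send_task can be decoupled into prepare, process, and execute.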

def send_task(task, job, obligation):
    ...
    processed = ...
    ...
    copied = ...
    ...
    executed = ...
    ...
    # ... 100 more lines

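The first decoupling pulls out the preparation that produces processed, copied, and executed. This is not much of an improvement, and if process and execute also depend on task, job, and obligation, the number of function parameters becomes a problem.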

def prepare(task, job, obligation):
    ...
    return processed, copied, executed

def process(processed, copied, executed):
    ...
    return processed, copied, executed
    
def execute(processed, copied, executed):
    ...

def send_task(task, job, obligation):
    execute(*process(*prepare(task, job, obligation)))

If several functions share some data, they should be a class, because a class is itself a collection of data and functions.

class TaskSender(object):
    def __init__(self, task, job, obligation):
        self.task = task
        self.job = job
        self.obligation = obligation
        self.processed = []
        self.copied = []
        self.executed = []

    def __call__(self):
        self.prepare()
        self.process()
        self.execute()
    
    ...
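
The notes elide the three step methods. A minimal runnable sketch could look like the following; the bodies of prepare, process, and execute are placeholders I made up only to show how the steps share state through self instead of passing long argument lists.

```python
class TaskSender:
    def __init__(self, task, job, obligation):
        self.task = task
        self.job = job
        self.obligation = obligation
        self.processed = []
        self.copied = []
        self.executed = []

    def __call__(self):
        self.prepare()
        self.process()
        self.execute()

    def prepare(self):
        # Hypothetical step: collect the inputs into a work list.
        self.copied = [self.task, self.job, self.obligation]

    def process(self):
        # Hypothetical step: transform each collected item.
        self.processed = [item.upper() for item in self.copied]

    def execute(self):
        # Hypothetical step: record what would have been sent.
        self.executed = ['sent ' + item for item in self.processed]


sender = TaskSender('a', 'b', 'c')
sender()
```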

Some actions must always happen together. How do we ensure that?

Use a context manager.

# not good
f = open('file.txt', 'w')
f.write('hi')
f.close()

# better
with open('file.txt', 'w') as f:
    f.write('hi')

with SomeProtocol(host, port) as protocol:
    protocol.send(['get', signal])
    result = protocol.receive()

from socket import socket

class SomeProtocol(object):
    def __init__(self, host, port):
        self.host, self.port = host, port

    def __enter__(self):
        self._client = socket()
        self._client.connect((self.host, self.port))
    
        return self
    
    def __exit__(self, exception, value, traceback):
        self._client.close()
        
    def send(self, payload): ...
    
    def receive(self): ...
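
The point of pairing the actions is that the cleanup runs even when the body fails. The sketch below shows this with a made-up Resource class that only records events, so it runs without a network connection.

```python
class Resource:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('open')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and when the body raises.
        self.events.append('close')
        return False  # do not swallow the exception


resource = Resource()
try:
    with resource:
        raise RuntimeError('boom')
except RuntimeError:
    pass
```

After this runs, resource.events holds both 'open' and 'close', although the body raised.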

str and repr

When debugging, you can print the instance directly instead of building a string from its attributes.

# default
>>> Point(12, 5)
<__main__.Point instance at 0x100b4a758>

# __repr__
>>> Point(12, 5)
Point(x=12, y=5)

# __str__
>>> print(Point(12, 5))
(12, 5)

class Point(object):
    ...
    def __str__(self):
        return '({}, {})'.format(self.x, self.y)
    
    def __repr__(self):
        return '{}(x={}, y={})'.format(self.__class__.__name__,
                                       self.x, self.y)
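
A self-contained version of the same idea, written for Python 3, for anyone who wants to try it. One detail worth knowing is that containers display their elements with repr, not str.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Used by print() and str().
        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):
        # Used by the interactive prompt and inside containers.
        return '{}(x={}, y={})'.format(self.__class__.__name__,
                                       self.x, self.y)


p = Point(12, 5)
as_str = str(p)
as_repr = repr(p)
in_list = repr([p])
```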

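A comment is information that should have appeared in the code but is missing; from this angle a comment is no different from a bug.

This echoes an Extreme Programming view mentioned in talk 1. The point here is likewise that some comments are better expressed in the program itself.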

# not good
if self.flags & 0b1000:    # Am I visible?
    ...

# better
@property
def is_visible(self):
    return self.flags & 0b1000

if self.is_visible:
    ...

# Tell my station to process me
self.station.process(self)

Avoid unnecessary conditionals

To some extent, if/else can be replaced by good design.

# normal
if type(entry) is Film:
    responsible = entry.producer
else:
    responsible = entry.author
    
# better
class Film(object):
    ...
    @property
    def responsible(self):
        return self.producer
    
entry.responsible
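
The listing shows only the Film side. For the type check to disappear, every other entry type needs the same property. A sketch, with Book as a hypothetical second entry type:

```python
class Film:
    def __init__(self, producer):
        self.producer = producer

    @property
    def responsible(self):
        return self.producer


class Book:
    # Hypothetical second entry type, not part of the talk's example.
    def __init__(self, author):
        self.author = author

    @property
    def responsible(self):
        return self.author


entries = [Film(producer='Ann'), Book(author='Bob')]
# No type check: each entry knows who is responsible for it.
names = [entry.responsible for entry in entries]
```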

Class variables

Where a class variable will do, avoid a global variable. Inside instance methods a class variable is accessed as Classname.variable.
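
A short sketch of the pattern; the Connection class and its DEFAULT_TIMEOUT are invented for illustration.

```python
class Connection:
    DEFAULT_TIMEOUT = 30  # class variable instead of a module-level global

    def __init__(self, timeout=None):
        # Accessed through the class name inside an instance method.
        if timeout is None:
            timeout = Connection.DEFAULT_TIMEOUT
        self.timeout = timeout
```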

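Iterators

Implement an iterator through __iter__. I do not yet fully understand the benefits of iterators, but using them feels better: beyond tidier code, exposing fewer of Department's attributes is probably also an advantage.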

# normal
class Department(object):
    def __init__(self, *employees):
        self.employees = employees
        
for employee in department.employees:
    ...


# use __iter__
class Department(object):
    def __init__(self, *employees):
        self._employees = employees

    def __iter__(self):
        return iter(self._employees)

for employee in department:
    ...
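
One way to see the benefit of hiding the attribute: the internal storage can change without touching the loops that consume it. In this sketch the switch to a dict is my own assumption, chosen only to make that point.

```python
class Department:
    def __init__(self, *employees):
        # Storage changed from a tuple to a dict keyed by name;
        # code that iterates over the department is unaffected.
        self._employees = {name: name.title() for name in employees}

    def __iter__(self):
        return iter(self._employees.values())


department = Department('ann', 'bob')
names = [employee for employee in department]
```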

Set and Concatenating Streams

Using set, and the superb itertools library.

item in a_set
item not in a_set

a_set <= other
a_set.issubset(other)

a_set | other
a_set.union(other)

a_set & other
a_set.intersection(other)

a_set - other
a_set.difference(other)


# not good
for each in big_list + another_big_list:
    ...

# better
for each in itertools.chain(big_list,
                            another_big_list):
    ...
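
The operations above, run on small sample values chosen for illustration:

```python
import itertools

a_set = {1, 2, 3}
other = {2, 3, 4}

is_subset = {2, 3} <= a_set    # same as {2, 3}.issubset(a_set)
union = a_set | other          # same as a_set.union(other)
common = a_set & other         # same as a_set.intersection(other)
only_a = a_set - other         # same as a_set.difference(other)

# chain() walks both lists lazily; no combined list is built.
chained = list(itertools.chain([1, 2], [3, 4]))
```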

總結：

演講1提到的 IO 與邏輯分離，在這裏給出了更普適的實踐方法，每一個函數都應該有肯定的功能，函數內的代碼應該在同一抽象層次上，依此進行解耦。又提到對於一個大型的方法，方法中代碼共享多個參數和臨時變量，解耦後函數參數過多，這時候就應該使用類，類是數據和方法的集合。這兩點其實是回答了代碼應該怎麼組織的問題，剩下的就是一些技巧，好比上下文管理器，迭代器，set，itertools等。還值得一提的是，註釋是本應該出現程序中卻沒有出現的 bug，這不是在宣揚不寫註釋，而是講代碼應該更明確。

3. Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger)

reversed, sorted, enumerate, izip, iter, partial, .iteritems(), dict(izip(list1, list2)), dict(enumerate(list1)), defaultdict, .setdefault, popitem is atomic, namedtuple, deque
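
A few of these idioms in Python 3 form, as a rough sketch with made-up sample data. Note that izip and .iteritems() are Python 2 names; in Python 3 the built-in zip is already lazy and dicts use .items().

```python
from collections import defaultdict, deque, namedtuple

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

# Build a dict from two parallel lists.
name_to_color = dict(zip(names, colors))

# enumerate() instead of indexing with range(len(...)).
indexed = [(i, name) for i, name in enumerate(names)]

# Group by key with defaultdict instead of .setdefault.
by_length = defaultdict(list)
for name in names:
    by_length[len(name)].append(name)

# deque supports fast appends and pops at both ends.
queue = deque(names)
first = queue.popleft()

# namedtuple gives tuple fields readable names.
Color = namedtuple('Color', ['red', 'green', 'blue'])
teal = Color(red=0, green=128, blue=128)
```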

4. How to Write Reusable Code (Greg Ward)

OOP was not a silver bullet

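Object orientation, functional programming, and coroutines each solve particular problems; there is no silver bullet.

OOP is one method among many. Neither worshipping nor rejecting it is a good attitude.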

Fewer Classes More Functions

If a function can do it elegantly, do not use a class.

If many functions share some variables, that is a class, consistent with talk 2.

Functions ≠ Procedures

Pascal's best idea: functions compute stuff, procedures do stuff

rule of thumb: every function should either return a value or have a side effect: never both!

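In other words, pick one: a function either computes and returns a value or performs a side effect.

During Q&A someone asked: if a function performs a side effect and then returns a boolean, this rule leaves raising an exception as the only option, yet in many cases the situation does not feel like an exception. How should that be handled?

The speaker's answer was that this rule is no silver bullet either; the choice still depends on the use case.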
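
Python's own built-ins follow this rule of thumb: sorted returns a new list and leaves its input alone, while list.sort changes the list in place and returns None. The Counter class below is a made-up example of writing your own code the same way.

```python
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self):
        # Procedure: changes state, returns nothing.
        self._count += 1

    def value(self):
        # Function: reports a result, changes nothing.
        return self._count


numbers = [3, 1, 2]
ordered = sorted(numbers)   # function: new list, input untouched
result = numbers.sort()     # procedure: sorts in place, returns None
```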

Extensibility ≠ Reusability

A class Foo alone does not make code extensible or reusable.

Do not be too attached to extensibility; it may well be just a story.

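Summary:

Python offers many temptations: functional programming, object orientation, design patterns, the claim that dynamic languages do not need design patterns... Remember, there is no silver bullet and no free lunch.

A function should choose between a side effect and a return value, and functions that share variables should be grouped into a class. This further supplements the points of talks 1 and 2.

Implicitly there is a second principle as well: do not obsess over extensibility.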

5. How to write actually object-oriented python (Per Fagrell)

Single Responsibility Principle

Code should have one and only one reason to change.

Managing the connection and transferring data are different responsibilities and should be split apart.

# not good
class Modem(object):
    def call(self, number):
        pass

    def disconnect(self):
        pass

    def send_data(self, data):
        pass

    def recv_data(self):
        pass

# better
class ConnectionManager(object):
    def call(self, number):
        pass

    def disconnect(self):
        pass

class DataTransceiver(object):
    def send_data(self, data):
        pass

    def recv_data(self):
        pass

Business logic and data persistence should be split as well.

class Person(object):
    def calculate_pay(self):
        ...

    def save(self):
        ...


class Person(object):
    def calculate_pay(self):
        ...

class DbPersistMixin(object):
    def save(self):
        ...
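
Besides a mixin, the same split can be done by composition: a separate object receives the person and stores it. This is a sketch of that alternative, not the speaker's code; InMemoryPersonStore and the pay formula are hypothetical stand-ins.

```python
class Person:
    def __init__(self, name, hours, rate):
        self.name = name
        self.hours = hours
        self.rate = rate

    def calculate_pay(self):
        # Business logic only; knows nothing about storage.
        return self.hours * self.rate


class InMemoryPersonStore:
    # Hypothetical stand-in for a database layer.
    def __init__(self):
        self.rows = {}

    def save(self, person):
        self.rows[person.name] = (person.hours, person.rate)


store = InMemoryPersonStore()
ann = Person('Ann', hours=10, rate=20)
store.save(ann)
```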

Open/Closed Principle

Code should be open to extension but closed to modification.

# normal
def validate_link(self, links):
    for link in links:
        track = Track(link)
        self.validate(track)


# after modification
def validate_link(self, links):
    for link in links:
        if link.startswith("spotify:album:"):
            uri = Album(link)
        else:
            uri = Track(link)
        self.validate(uri)

# better
def validate_link(self, links):
    for link in links:
        self.validate(uri_factory(link))
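
The notes do not show uri_factory. One plausible shape, assuming a prefix lookup table, is sketched below; the Track and Album classes here are minimal placeholders. Supporting a new link type then means adding a table entry, while validate_link stays unchanged.

```python
class Track:
    def __init__(self, link):
        self.link = link


class Album:
    def __init__(self, link):
        self.link = link


# Hypothetical registry: link prefix -> class to build.
URI_TYPES = {
    'spotify:album:': Album,
}


def uri_factory(link):
    for prefix, cls in URI_TYPES.items():
        if link.startswith(prefix):
            return cls(link)
    # Fall back to Track, as in the original if/else.
    return Track(link)
```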

Liskov Substitution Principle

Anywhere you use a base class, you should be able to use a subclass and not know it.

Python duck typing

Interface Segregation Principle

Don't force clients to use interfaces they don't need.

Dependency Inversion Principle

High-level modules shouldn't rely on low-level modules. Both should rely on abstractions.

These five principles make up SOLID.

Tell, Don't Ask

Tell objects to do the work, don't ask them for their data.

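I understand the idea, but the speaker's example is a bit strained, unless calculate does more than compute the cost, in which case the finer split makes sense.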

def calculate(self):
    cost = 0
    for line_item in self.bill.items:
        cost += line_item.cost


def calculate(self):
    cost = self.bill.total_cost()
    ...
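
For completeness, a sketch of what the Bill side might look like once the summation moves into it. LineItem and Bill are minimal assumed classes; only the method name total_cost comes from the listing.

```python
class LineItem:
    def __init__(self, cost):
        self.cost = cost


class Bill:
    def __init__(self, items):
        self._items = list(items)

    def total_cost(self):
        # The bill does the work itself instead of handing out its items.
        return sum(item.cost for item in self._items)


bill = Bill([LineItem(3), LineItem(4)])
total = bill.total_cost()
```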

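Summary:

The talk mainly introduces the SOLID principles of OOP, with emphasis on SRP. In effect, the ideas of separating I/O from logic and of keeping a function's code at one level of abstraction are extended to classes. The difference is that classes stress a single responsibility, and a responsibility is conceptually somewhat broader than a function's purpose. When separating application logic from persistence, the persistence part actually uses multiple inheritance, which in Python is the classic diamond inheritance. Diamond inheritance requires care with super, and one view holds that it should not exist at all and can be replaced by composition. The last principle, which the speaker added himself, is close in meaning to this: a person drives a car, but the drive method lives in the car class.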