Inside Python Classes

Date: 2020-12-01

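This article looks at some of the concepts and implementation details behind classes and objects in Python 3.8. It tries to explain how class and instance attributes are stored, how functions become methods, what descriptors do, the memory optimization available for objects, and how inheritance and attribute lookup work.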

Let's start with a simple example:

class Employee:

    outsource = False

    def __init__(self, department, name):
        self.department = department
        self.name = name

    @property
    def inservice(self):
        return self.department is not None

    def __repr__(self):
        return f"<Employee: {self.department}-{self.name}>"

employee = Employee('IT', 'bobo')

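The employee object is an instance of the Employee class. It has two attributes, department and name, whose values belong to that instance. outsource is a class attribute: it is owned by the class and shared by every instance of the class, just as in other object-oriented languages.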

Changing a class variable affects every instance of the class:

>>> e1 = Employee('IT', 'bobo')
>>> e2 = Employee('HR', 'cici')
>>> e1.outsource, e2.outsource
(False, False)
>>> Employee.outsource = True
>>> e1.outsource, e2.outsource
(True, True)

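This holds only when the change is made through the class. When we assign to the class variable through an instance instead: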

>>> e1 = Employee('IT', 'bobo')
>>> e2 = Employee('HR', 'cici')
>>> e1.outsource, e2.outsource
(False, False)
>>> e1.outsource = True
>>> e1.outsource, e2.outsource
(True, False)

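That's right: when you try to modify a class variable through an instance, Python does not change the value on the class. It creates an instance attribute with the same name instead, which is the correct and safe behavior. When an attribute is looked up, instance variables take precedence over class variables; the section on inheritance and attribute lookup explains this in detail.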
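
The shadowing can be observed directly in the instance's own dictionary. The following is a minimal sketch that uses a hypothetical stand-in class, Worker, so that it is self-contained; it is not part of the original example.

```python
class Worker:
    outsource = False  # class attribute, shared by all instances


w1, w2 = Worker(), Worker()

# Assigning through the instance creates an entry in w1's own __dict__ ...
w1.outsource = True
shadowed = dict(vars(w1))        # snapshot of w1's instance dictionary
class_value = Worker.outsource   # the class attribute is untouched
other_value = w2.outsource       # w2 still reads the class attribute

# ... and deleting that entry makes the class attribute visible again.
del w1.outsource
restored = w1.outsource
```

After the del, w1 has no entry of its own, so the lookup falls back to the value stored on the class.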

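Note in particular that when a class variable holds a mutable object, mutating it in place through an instance does change the shared object: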

>>> class S:
...     L = [1, 2]
...
>>> s1, s2 = S(), S()
>>> s1.L, s2.L
([1, 2], [1, 2])
>>> s1.L.append(3)
>>> s1.L, s2.L
([1, 2, 3], [1, 2, 3])

Good practice is to avoid this kind of design wherever possible.
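
One common way to avoid the shared-state surprise is to create the mutable object per instance in __init__. This is a minimal sketch; the class name Basket and its items attribute are made up for illustration.

```python
class Basket:
    def __init__(self):
        # Each instance gets its own list instead of sharing a class-level one.
        self.items = [1, 2]


b1, b2 = Basket(), Basket()
b1.items.append(3)   # only b1's list changes
```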

Attribute storage

In this section we look at how class attributes, methods, and instance attributes are stored and tied together in Python.

Instance attributes

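In Python, instance attributes are by default all stored in the __dict__ dictionary, which is just a regular dict. Maintaining instance attributes amounts to reading and modifying this dictionary, and it is fully open to developers.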

>>> e = Employee('IT', 'bobo')
>>> e.__dict__
{'department': 'IT', 'name': 'bobo'}
>>> type(e.__dict__)
dict
>>> e.name is e.__dict__['name']
True
>>> e.__dict__['department'] = 'HR'
>>> e.department
'HR'

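Because instance attributes are stored in a dictionary, we can conveniently add or remove fields on an object at any time:

>>> e.age = 30 # age was never defined on the class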
>>> e.age
30
>>> e.__dict__
{'department': 'IT', 'name': 'bobo', 'age': 30}
>>> del e.age
>>> e.__dict__
{'department': 'IT', 'name': 'bobo'}

We can also instantiate an object from a dictionary, or restore an instance from a saved copy of its __dict__.

>>> def new_employee_from(d):
...     instance = object.__new__(Employee)
...     instance.__dict__.update(d)
...     return instance
...
>>> e1 = new_employee_from({'department': 'IT', 'name': 'bobo'})
>>> e1
<Employee: IT-bobo>
>>> state = e1.__dict__.copy()
>>> del e1
>>> e2 = new_employee_from(state)
>>> e2
<Employee: IT-bobo>

Because __dict__ is fully open, we can add any hashable key to it, such as a number:

>>> e.__dict__[1] = 1
>>> e.__dict__
{'department': 'IT', 'name': 'bobo', 1: 1}

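Such non-string keys cannot be reached through the instance with attribute syntax. To avoid ending up in this situation, it is best not to write to __dict__ directly unless you have to, and ideally not to manipulate __dict__ directly at all.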

    Hence the saying that Python is a "consenting adults language".

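This dynamic implementation makes our code very flexible and is often very convenient, but it comes at a cost in storage and performance. Python therefore offers another mechanism, __slots__, which gives up __dict__ to save memory and improve performance; see the __slots__ section for details.

Class attributes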

Likewise, class attributes are stored in the class's __dict__ dictionary:

>>> Employee.__dict__
mappingproxy({'__module__': '__main__',
              'outsource': True,
              '__init__': <function __main__.Employee.__init__(self, department, name)>,
              'inservice': <property at 0x108419ea0>,
              '__repr__': <function __main__.Employee.__repr__(self)>,
              '__dict__': <attribute '__dict__' of 'Employee' objects>,
              '__weakref__': <attribute '__weakref__' of 'Employee' objects>,
              '__doc__': None})

>>> type(Employee.__dict__)
mappingproxy

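Unlike the "open" instance dictionary, the dictionary exposed for class attributes is a MappingProxyType object, a view that does not support item assignment. It is therefore read-only to developers, although class attributes can still be changed by ordinary assignment on the class or with setattr. The purpose is to guarantee that the keys of class attributes are all strings, which simplifies and speeds up attribute lookup on new-style classes and the search along __mro__.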

>>> Employee.__dict__['outsource'] = False
TypeError: 'mappingproxy' object does not support item assignment
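
"Read-only" here applies only to the proxy object itself. As a small sketch (the Config class is a made-up example), ordinary assignment on the class still works, and the proxy reflects the change because it is a live view of the class namespace:

```python
import types


class Config:
    debug = False


proxy = vars(Config)  # a mappingproxy view of the class namespace
is_proxy = isinstance(proxy, types.MappingProxyType)

try:
    proxy['debug'] = True          # item assignment on the proxy is rejected
    write_error = None
except TypeError as exc:
    write_error = exc

Config.debug = True                # ordinary class attribute assignment works
seen_through_proxy = proxy['debug']
```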

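Because every method belongs to a class, methods are also stored in the class's dictionary. The example above already shows the existing __init__ and __repr__ methods. We can add a few more to verify: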

class Employee:
    # ...
    @staticmethod
    def soo():
        pass

    @classmethod
    def coo(cls):
        pass

    def foo(self):
        pass

>>> Employee.__dict__
mappingproxy({'__module__': '__main__',
              'outsource': False,
              '__init__': <function __main__.Employee.__init__(self, department, name)>,
              '__repr__': <function __main__.Employee.__repr__(self)>,
              'inservice': <property at 0x108419ea0>,
              'soo': <staticmethod at 0x1066ce588>,
              'coo': <classmethod at 0x1066ce828>,
              'foo': <function __main__.Employee.foo(self)>,
              '__dict__': <attribute '__dict__' of 'Employee' objects>,
              '__weakref__': <attribute '__weakref__' of 'Employee' objects>,
              '__doc__': None})

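Inheritance and attribute lookup

So far we have seen that all attributes and methods are stored in two __dict__ dictionaries, the instance's and the class's. Now let's look at how Python performs attribute lookup.

In Python 3 every class implicitly inherits from object, so there is always an inheritance relationship, and Python also supports multiple inheritance: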

>>> class A:
...     pass
...
>>> class B:
...     pass
...
>>> class C(B):
...     pass
...
>>> class D(A, C):
...     pass
...
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.C'>, <class '__main__.B'>, <class 'object'>]

mro() is a special method that returns the class's method resolution order, a linearization of the class and its base classes.
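
To make the effect of the resolution order concrete, the sketch below gives two of the bases the same attribute and checks which one wins. The who attribute is made up for illustration; the class layout mirrors the example above.

```python
class A:
    who = 'A'


class B:
    who = 'B'


class C(B):
    pass


class D(A, C):
    pass


order = [cls.__name__ for cls in D.__mro__]

# Walk the MRO by hand: the first class whose own __dict__ has the name wins.
first_owner = next(cls for cls in D.__mro__ if 'who' in vars(cls))
```

Here A comes before C and B in the order, so D.who resolves to the value defined on A.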

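The default behavior for attribute access is to get, set, or delete the attribute from the object's dictionary. For example, the lookup of e.f can be described briefly as follows:

    The lookup for e.f starts with e.__dict__['f'], then type(e).__dict__['f'], and continues through the base classes of type(e) (in __mro__ order, excluding metaclasses). If the value found is an object that defines one of the descriptor methods, Python may override the default behavior and invoke the descriptor method instead. Where this happens in the precedence chain depends on which descriptor methods are defined and how they are called.

So to understand the lookup order, you first need to understand the descriptor protocol.

In short, there are two kinds of descriptors: data descriptors and non-data descriptors.

    If an object defines __set__() or __delete__() in addition to __get__(), it is considered a data descriptor. Descriptors that define only __get__() are called non-data descriptors (they are typically used for methods, but other uses are possible).

Because functions implement only __get__, they are non-data descriptors.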

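Python looks up an attribute on an object in the following order:

    Data descriptors in the dictionaries of the class and its parent classes
    The instance dictionary
    Non-data descriptors (and plain class attributes) in the dictionaries of the class and its parent classes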
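
The three steps above can be observed with two small descriptor classes. This is only a sketch: the names DataDesc, NonDataDesc and Demo are hypothetical and exist purely for this illustration.

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from non-data descriptor'


class Demo:
    d = DataDesc()
    n = NonDataDesc()


demo = Demo()
# Write straight into the instance dictionary, bypassing __set__.
vars(demo).update(d='from instance', n='from instance')

d_value = demo.d   # the data descriptor takes precedence over the instance
n_value = demo.n   # the instance dictionary takes precedence here
```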

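Keep in mind that no matter how many levels of inheritance your class has, an object's instance dictionary always stores all of its instance variables. This is part of what makes super meaningful.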
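
A minimal sketch of this point, with hypothetical classes Base and Child: both initializers run on the same instance, so everything they set ends up in one instance dictionary.

```python
class Base:
    def __init__(self):
        self.a = 1


class Child(Base):
    def __init__(self):
        super().__init__()   # runs Base.__init__ on the same instance
        self.b = 2


child = Child()
state = dict(vars(child))    # snapshot of the single instance dictionary
```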

Below we try to describe the lookup order in pseudocode:

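def get_attribute(obj, name):
    """Report where attribute `name` of `obj` is found (simplified)."""
    class_definition = obj.__class__
    missing = object()  # sentinel, so a class attribute set to None still counts

    # Find the first class along the MRO that defines the name.
    descriptor = missing
    for cls in class_definition.mro():
        if name in cls.__dict__:
            descriptor = cls.__dict__[name]
            break

    # 1. Data descriptors (defining __set__ or __delete__) come first.
    kind = type(descriptor)
    if hasattr(kind, '__set__') or hasattr(kind, '__delete__'):
        return descriptor, 'data descriptor'

    # 2. Then the instance dictionary, if the object has one.
    if name in getattr(obj, '__dict__', {}):
        return obj.__dict__[name], 'instance attribute'

    # 3. Finally non-data descriptors and plain class attributes.
    if descriptor is not missing:
        return descriptor, 'non-data descriptor'
    raise AttributeError(name)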

>>> e = Employee('IT', 'bobo')
>>> get_attribute(e, 'outsource')
(False, 'non-data descriptor')
>>> e.outsource = True
>>> get_attribute(e, 'outsource')
(True, 'instance attribute')
>>> get_attribute(e, 'name')
('bobo', 'instance attribute')
>>> get_attribute(e, 'inservice')
(<property at 0x10c966d10>, 'data descriptor')
>>> get_attribute(e, 'foo')
(<function __main__.Employee.foo(self)>, 'non-data descriptor')

Because of this precedence order, an instance cannot override a data-descriptor attribute of its class, such as a property:

>>> class Manager(Employee):
...     def __init__(self, *arg):
...         self.inservice = True
...         super().__init__(*arg)
...
>>> m = Manager("HR", "cici")
AttributeError: can't set attribute
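
If a subclass really needs a different value, one option is to override the property at class level rather than per instance. The following is a sketch only, using simplified stand-ins (Staff and Boss) for the classes above:

```python
class Staff:
    def __init__(self, department):
        self.department = department

    @property
    def inservice(self):
        return self.department is not None


class Boss(Staff):
    @property
    def inservice(self):
        # Overrides the parent's property in the subclass dictionary.
        return True


try:
    Staff(None).inservice = True    # no setter: assignment is rejected
    error = None
except AttributeError as exc:
    error = exc

boss_value = Boss(None).inservice
staff_value = Staff(None).inservice
```

Boss comes before Staff in the resolution order, so its property is the one that gets found.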

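Invoking descriptors

As mentioned above, if the value found during attribute lookup is an object that defines one of the descriptor methods, Python may override the default behavior and call the descriptor method instead.

The job of a descriptor is to bind object attributes. Suppose a is an object that implements the descriptor protocol. A descriptor call for e.a can be triggered in the following ways: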

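    Direct call: user code invokes the descriptor method directly, a.__get__(e). This is uncommon.
    Instance binding: when bound to an instance, e.a is transformed into the call type(e).__dict__['a'].__get__(e, type(e))
    Class binding: when bound to a class, E.a is transformed into the call E.__dict__['a'].__get__(None, E)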

When binding happens within an inheritance hierarchy, the calls are chained according to the cases above and the __mro__ order.
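
The instance-binding and class-binding cases can be checked by hand. The sketch below uses a hypothetical descriptor, Echo, that simply reports the arguments its __get__ receives:

```python
class Echo:
    """Non-data descriptor that returns the arguments passed to __get__."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class E:
    a = Echo()


e = E()

via_instance = e.a
by_hand_instance = type(e).__dict__['a'].__get__(e, type(e))

via_class = E.a
by_hand_class = E.__dict__['a'].__get__(None, E)
```

In both cases the attribute access and the hand-written call produce the same result.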

Functions and methods

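As we know, a method is a function that belongs to a particular class. The only difference (if it counts as one) is that a method's first parameter is reserved for the class or the instance. In Python the convention is to name it cls or self, although you may use any name, such as this (it is just better not to).

As we saw in the previous section, functions are objects that implement __get__(), so they are non-data descriptors. Python's support for accessing (calling) methods works precisely by calling __get__() to bind the function being accessed into a method.

In pure Python it works like this (the example comes from the Descriptor HowTo Guide):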

import types

class Function:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed through the class: return the plain function
        return types.MethodType(self, obj)  # bind the function to obj as a method

Python 2 had two kinds of methods, unbound methods and bound methods. Python 3 has only the latter.

A bound method is associated with the class or instance it is bound to:

>>> Employee.coo
<bound method Employee.coo of <class '__main__.Employee'>>
>>> Employee.foo
<function __main__.Employee.foo(self)>
>>> e = Employee('IT', 'bobo')
>>> e.foo
<bound method Employee.foo of <Employee: IT-bobo>>

From a method we can reach the instance and its class:

>>> e.foo.__self__
<Employee: IT-bobo>
>>> e.foo.__self__.__class__
__main__.Employee

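With the descriptor protocol we can manually bind a function into a method from outside the class, giving it access to the data of a class or an instance. I will use this example to explain how a function stored in the class dictionary is bound into a method when your object accesses (calls) it.

Given the following function: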

>>> def f1(self):
...     if isinstance(self, type):
...         return self.outsource
...     return self.name
...
>>> bound_f1 = f1.__get__(e, Employee) # or bound_f1 = f1.__get__(e)
>>> bound_f1
<bound method f1 of <Employee: IT-bobo>>
>>> bound_f1.__self__
<Employee: IT-bobo>
>>> bound_f1()
'bobo'

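To summarize: when we call e.foo(), Python first obtains the foo function from Employee.__dict__['foo'], then calls that function's __get__ method, foo.__get__(e), to turn it into a bound method, and finally calls the bound method to get the result. This completes the path from e.foo() to foo(e).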
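
The same chain can be walked step by step. This is a minimal sketch with a hypothetical class, Greeter, standing in for Employee:

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return 'hello ' + self.name


g = Greeter('bobo')

func = Greeter.__dict__['hello']      # plain function stored on the class
bound = func.__get__(g, Greeter)      # what attribute access does for us

# Three spellings of the same call.
results = (g.hello(), bound(), func(g))
```

The bound method keeps references to both halves: bound.__self__ is the instance and bound.__func__ is the original function.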

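If my explanation leaves you confused, I suggest reading the official Descriptor HowTo Guide to learn more about the descriptor protocol. Its "Functions and Methods" and "Static Methods and Class Methods" sections describe in detail how functions are bound into methods. The "Method Objects" section of the Python tutorial chapter on classes covers this as well.

__slots__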

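By default, the attribute values of a Python object are stored in a dictionary. When we deal with tens of thousands of instances or more, memory consumption can become a problem, because the hash-table implementation of dict allocates a sizable amount of memory for every instance. Python therefore provides __slots__ as a way to stop instances from using __dict__ and reduce this overhead.

Once attributes are declared through __slots__, their values are no longer kept in the instance's __dict__. They are stored in fixed slots inside the instance, and the class's __dict__ gains a member descriptor for each declared name: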

class Test:
    __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a = a
        self.b = b

>>> t = Test(1, 2)
>>> t.__dict__
AttributeError: 'Test' object has no attribute '__dict__'
>>> Test.__dict__
mappingproxy({'__module__': '__main__',
              '__slots__': ('a', 'b'),
              '__init__': <function __main__.Test.__init__(self, a, b)>,
              'a': <member 'a' of 'Test' objects>,
              'b': <member 'b' of 'Test' objects>,
              '__doc__': None})
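
The member objects shown above are descriptors that read and write the per-instance slots. A short sketch, using a hypothetical Point class, shows the consequences:

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

has_dict = hasattr(p, '__dict__')            # no per-instance dictionary
slot_kind = type(Point.__dict__['x']).__name__

# The member descriptor on the class reads the value stored in the instance.
x_via_descriptor = Point.__dict__['x'].__get__(p, Point)

try:
    p.z = 3                                   # 'z' is not listed in __slots__
    error = None
except AttributeError as exc:
    error = exc
```

The trade-off is visible in the last step: without a __dict__, attributes that are not declared in __slots__ can no longer be added at runtime.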

I wrote a separate article about __slots__ earlier; if you are interested, see "Understanding the Python class attribute __slots__".

Addendum

__getattribute__ and __getattr__

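Perhaps you are still wondering how a function's __get__ method actually gets called, and what happens in between.

In Python everything is an object, and every object has a default method __getattribute__(self, name).

This method is called automatically whenever we access an attribute of obj with the dot operator. To avoid infinite recursion, an overriding implementation should always delegate to the base class, object.__getattribute__(self, name). In most cases that method looks up name in self's __dict__ by default (the lookup of special methods is an exception).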

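    Aside: if the class also implements __getattr__, it is called only when __getattribute__ calls it explicitly or raises AttributeError. __getattr__ is written by the developer and should return the attribute value or raise AttributeError.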

Descriptors are invoked by the __getattribute__() method. Its rough logic is:

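def __getattribute__(self, key):
    # Rough sketch only: the real lookup walks type(self).__mro__ and applies
    # the data / non-data descriptor precedence described earlier.
    v = object.__getattribute__(self, key)  # stands in for the raw lookup
    if hasattr(v, '__get__'):
        return v.__get__(self, type(self))  # let the descriptor produce the value
    return v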

    Note: overriding __getattribute__() prevents descriptors from being called automatically.

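Function attributes

A function is also a Python function object, so it too can carry arbitrary attributes. This is sometimes useful, for example to implement a simple call-tracking decorator: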

from functools import wraps

def calltracker(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # count each call on the wrapper itself
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@calltracker
def f():
    return 'f called'

>>> f.calls
0
>>> f()
'f called'
>>> f.calls
1
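
Function attributes are stored the same way as instance attributes, in the function's own __dict__. A small sketch (the greet function and its counter attribute are made up for illustration):

```python
def greet():
    greet.counter += 1       # look the attribute up on the function object
    return 'hi'


greet.counter = 0            # stored in greet.__dict__
greet()
greet()

namespace = dict(vars(greet))
```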
