[Question 1] Python Memory Management and Garbage Collection

Date: 2019-11-06

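Memory Management

The Python interpreter is written in C, and every operation in Python is ultimately carried out by the underlying C code. Understanding how memory is managed therefore means reading the explanation alongside the Python source.

1. Two important structs

include/object.h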

#define _PyObject_HEAD_EXTRA            \
    struct _object *_ob_next;           \
    struct _object *_ob_prev;
    
#define PyObject_HEAD       PyObject ob_base;

#define PyObject_VAR_HEAD      PyVarObject ob_base;


typedef struct _object {
    _PyObject_HEAD_EXTRA          // pointers used to build a doubly linked list
    Py_ssize_t ob_refcnt;         // reference count
    struct _typeobject *ob_type;  // type of the object
} PyObject;


typedef struct {
    PyObject ob_base;    // embedded PyObject header
    Py_ssize_t ob_size;  /* Number of items in variable part */
} PyVarObject;

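The source above is the foundation of Python's memory management. It contains:

2 structs
- PyObject, which has 3 members.
  - _PyObject_HEAD_EXTRA, used to build a doubly linked list (present only in a Py_TRACE_REFS build, as the source comment quoted later states).
  - ob_refcnt, the reference count.
  - *ob_type, the type of the object.
- PyVarObject, which has 4 members (3 of them come from ob_base).
  - ob_base, an embedded PyObject, i.e. the three members listed above.
  - ob_size, the number of items the object holds.
3 macros
- PyObject_HEAD, which stands for a PyObject header.
- PyObject_VAR_HEAD, which stands for a PyVarObject header.
- _PyObject_HEAD_EXTRA, which stands for the previous/next pointers used to build the doubly linked list.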

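Every object of every Python type is built on the PyObject or PyVarObject header. Fixed-size objects such as float start with PyObject_HEAD. Objects whose size varies with the number of items they hold (int, bytes, list, tuple and type objects, including user-defined classes) start with PyObject_VAR_HEAD, because they need ob_size to record that number. As the definitions below show, dict and set also start with PyObject_HEAD and keep their own counters (ma_used and used) instead of ob_size.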

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;

include/floatobject.h

// longintrepr.h

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

// longobject.h

/* Long (arbitrary precision) integer object interface */
typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */

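/*
1. Python 3 has no separate long type, only int, but the Python 3 int is implemented on top of the old long.
2. A Python 3 int has no fixed size limit: the value is not stored in a C long but in the
   variable-length ob_digit array, one "digit" per slot, much like the characters of a string.
*/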

include/longobject.h
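
The variable-length layout of PyLongObject can be observed from Python itself. The sketch below is only an illustration and assumes CPython, where sys.int_info reports the width of one ob_digit slot and sys.getsizeof reports the size of an object in bytes. The names small and big are chosen for this example.

```python
import sys

# Width in bits of one ob_digit slot (a CPython implementation detail).
bits = sys.int_info.bits_per_digit

small = 1                 # fits in a single digit
big = 1 << (bits * 4)     # needs five digits

# More digits means a larger object, because ob_size grows with the value.
print(sys.getsizeof(small), sys.getsizeof(big))
print(big.bit_length())
```

The exact byte counts depend on the platform and the Python version, so only the relative growth is meaningful here.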

typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     */
} PyBytesObject;

include/bytesobject.h

typedef struct {
    PyObject_VAR_HEAD
      
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;
  
    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;

include/listobject.h

typedef struct {
    PyObject_VAR_HEAD
    PyObject *ob_item[1];

    /* ob_item contains space for 'ob_size' elements.
     * Items must normally not be NULL, except during construction when
     * the tuple is not yet visible outside the function that builds it.
     */
} PyTupleObject;

include/tupleobject.h

typedef struct {
    PyObject_HEAD
    Py_ssize_t ma_used;
    PyDictKeysObject *ma_keys;
    PyObject **ma_values;
} PyDictObject;

include/dictobject.h

typedef struct {
    PyObject_HEAD

    Py_ssize_t fill;            /* Number active and dummy entries*/
    Py_ssize_t used;            /* Number active entries */

    /* The table contains mask + 1 slots, and that's a power of 2.
     * We store the mask instead of the size because the mask is more
     * frequently needed.
     */
    Py_ssize_t mask;

    /* The table points to a fixed-size smalltable for small tables
     * or to additional malloc'ed memory for bigger tables.
     * The table pointer is never NULL which saves us from repeated
     * runtime null-tests.
     */
    setentry *table;
    Py_hash_t hash;             /* Only used by frozenset objects */
    Py_ssize_t finger;          /* Search finger for pop() */

    setentry smalltable[PySet_MINSIZE];
    PyObject *weakreflist;      /* List of weak references */
} PySetObject;

include/setobject.h

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    ...
    
} PyTypeObject;

Custom classes: include/object.h
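
The tp_basicsize and tp_itemsize fields of PyTypeObject are visible from Python as the __basicsize__ and __itemsize__ attributes of a type, which gives a quick way to see which objects carry a variable part. This is a small sketch assuming CPython; the variable names one, two and growth are made up for the example.

```python
import sys

# __basicsize__ and __itemsize__ expose tp_basicsize and tp_itemsize.
for tp in (float, int, bytes, tuple, list, dict, set):
    print(f"{tp.__name__:6} basicsize={tp.__basicsize__:4} itemsize={tp.__itemsize__}")

# A tuple stores its items inline, so each extra item adds one itemsize.
one, two = (None,), (None, None)
growth = sys.getsizeof(two) - sys.getsizeof(one)
print(growth == tuple.__itemsize__)
```

An itemsize of 0 means the object itself has a fixed size; containers such as list then keep their items in a separately allocated array (the ob_item pointer shown above).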

Note: Python 3 keeps only the int type, and that int is the long type of Python 2. The official wording is: PEP 0237: Essentially, long renamed to int. That is, there is only one built-in integral type, named int; but it behaves mostly like the old long type.

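2. Memory management

Taking the float and list types as examples, this part walks through the Python source to show how memory is managed.

2.1 The float type

Scenario 1: creating a float object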

val = 3.14

When a float object is created this way, the interpreter runs the following code in order.

/* Special free list
   free_list is a singly-linked list of available PyFloatObjects, linked
   via abuse of their ob_type members.
*/
static PyFloatObject *free_list = NULL;
static int numfree = 0;
 
PyObject *
PyFloat_FromDouble(double fval)
{
    PyFloatObject *op = free_list;
    if (op != NULL) {
        free_list = (PyFloatObject *) Py_TYPE(op);
        numfree--;
    } else {
         
        // Step 1: allocate memory sized for a float object.
        op = (PyFloatObject*) PyObject_MALLOC(sizeof(PyFloatObject));
        if (!op)
            return PyErr_NoMemory();
    }
 
    // Step 2: initialise the newly allocated memory.
    /* Inline PyObject_New */
    (void)PyObject_INIT(op, &PyFloat_Type);
     
    // Step 3: store the value in the float object.
    op->ob_fval = fval;
 
    // Step 4: return the address (reference/pointer) of the new float object.
    return (PyObject *) op;
}

Step 1: allocate memory according to the size a float object needs.

static PyMemAllocatorEx _PyObject = {
#ifdef PYMALLOC_DEBUG
    &_PyMem_Debug.obj, PYDBG_FUNCS
#else
    NULL, PYOBJ_FUNCS
#endif
    };



void *
PyObject_Malloc(size_t size)
{
    /* see PyMem_RawMalloc() */
    if (size > (size_t)PY_SSIZE_T_MAX)
        return NULL;

    // allocate the memory
    return _PyObject.malloc(_PyObject.ctx, size);
}

Objects/obmalloc.c
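
The effect of these allocations can be watched from Python. The following sketch assumes a standard CPython build, where sys.getallocatedblocks() reports how many memory blocks the interpreter currently has allocated; on builds or configurations that bypass the default allocator the numbers may not be meaningful. The names before, after and values are chosen for the example.

```python
import sys

before = sys.getallocatedblocks()

# Each distinct float below needs its own PyFloatObject.
values = [i + 0.5 for i in range(10_000)]

after = sys.getallocatedblocks()
print(after - before)
```

The difference is roughly the number of new float objects plus the list's own storage; the exact figure is an implementation detail.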

Customize Memory Allocators
===========================

.. versionadded:: 3.4

.. c:type:: PyMemAllocatorEx

   Structure used to describe a memory block allocator. The structure has
   four fields:

   +----------------------------------------------------------+---------------------------------------+
   | Field                                                    | Meaning                               |
   +==========================================================+=======================================+
   | ``void *ctx``                                            | user context passed as first argument |
   +----------------------------------------------------------+---------------------------------------+
   | ``void* malloc(void *ctx, size_t size)``                 | allocate a memory block               |
   +----------------------------------------------------------+---------------------------------------+
   | ``void* calloc(void *ctx, size_t nelem, size_t elsize)`` | allocate a memory block initialized   |
   |                                                          | with zeros                            |
   +----------------------------------------------------------+---------------------------------------+
   | ``void* realloc(void *ctx, void *ptr, size_t new_size)`` | allocate or resize a memory block     |
   +----------------------------------------------------------+---------------------------------------+
   | ``void free(void *ctx, void *ptr)``                      | free a memory block                   |
   +----------------------------------------------------------+---------------------------------------+

   .. versionchanged:: 3.5
      The :c:type:`PyMemAllocator` structure was renamed to
      :c:type:`PyMemAllocatorEx` and a new ``calloc`` field was added.

Description of the PyMemAllocatorEx fields

Step 2: initialise the type and the reference count in the newly allocated memory

/* Macros trading binary compatibility for speed. See also pymem.h.
   Note that these macros expect non-NULL object pointers.*/
#define PyObject_INIT(op, typeobj) \
    ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )

include/objimpl.h

/* Head of circular doubly-linked list of all objects.  These are linked
 * together via the _ob_prev and _ob_next members of a PyObject, which
 * exist only in a Py_TRACE_REFS build.
 */
static PyObject refchain = {&refchain, &refchain};

/* Insert op at the front of the list of all objects.  If force is true,
 * op is added even if _ob_prev and _ob_next are non-NULL already.  If
 * force is false amd _ob_prev or _ob_next are non-NULL, do nothing.
 * force should be true if and only if op points to freshly allocated,
 * uninitialized memory, or you've unlinked op from the list and are
 * relinking it into the front.
 * Note that objects are normally added to the list via _Py_NewReference,
 * which is called by PyObject_Init.  Not all objects are initialized that
 * way, though; exceptions include statically allocated type objects, and
 * statically allocated singletons (like Py_True and Py_None).
 */
void
_Py_AddToAllObjects(PyObject *op, int force)
{

    if (force || op->_ob_prev == NULL) {
        op->_ob_next = refchain._ob_next;
        op->_ob_prev = &refchain;
        refchain._ob_next->_ob_prev = op;
        refchain._ob_next = op;
    }
}

void
_Py_NewReference(PyObject *op)
{
    _Py_INC_REFTOTAL;

    // Initialise the reference count of the newly allocated object to 1.
    op->ob_refcnt = 1;

    // Add the new object to the doubly linked list refchain.
    _Py_AddToAllObjects(op, 1);

    _Py_INC_TPALLOCS(op);
}

Objects/object.c

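So every newly created float object is inserted into the doubly linked list refchain (the _ob_prev and _ob_next members exist only in a Py_TRACE_REFS build, as the source comment above states).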

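Scenario 2: referencing a float object

val = 7.8
data = val

This case is simple: whenever a new reference to an object is created, its reference count is incremented by 1.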

/*
The macros Py_INCREF(op) and Py_DECREF(op) are used to increment or decrement
reference counts.  Py_DECREF calls the object's deallocator function when
the refcount falls to 0; for
objects that don't contain references to other objects or heap memory
this can be the standard function free().  Both macros can be used
wherever a void expression is allowed.  The argument must not be a
NULL pointer.  If it may be NULL, use Py_XINCREF/Py_XDECREF instead.
The macro _Py_NewReference(op) initialize reference counts to 1, and
in special builds (Py_REF_DEBUG, Py_TRACE_REFS) performs additional
bookkeeping appropriate to the special build.
*/


#define Py_INCREF(op) (                         \
    _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
    ((PyObject *)(op))->ob_refcnt++)

include/object.h
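
The increment performed by Py_INCREF can be measured from Python by comparing sys.getrefcount before and after a new name is bound. The sketch compares two readings rather than relying on absolute values, since the absolute number reported by getrefcount varies between CPython versions. The names target and alias are made up for the example.

```python
import sys

target = object()

before = sys.getrefcount(target)
alias = target               # a new reference to the same object
after = sys.getrefcount(target)

print(after - before)
print(alias is target)
```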

Scenario 3: destroying a float object

val = 3.14
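# explicitly delete the name
del val 

"""
An explicit del drops the reference, and the object is destroyed once no references remain.
Local variables are released the same way when a function returns, for example: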
def func():
    val = 2.22

func()
"""

When an object is destroyed, the following code runs in order:

#define Py_DECREF(op)                                   \
    do {                                                \
        PyObject *_py_decref_tmp = (PyObject *)(op);    \
        if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
        --(_py_decref_tmp)->ob_refcnt != 0)             \
            _Py_CHECK_REFCNT(_py_decref_tmp)            \
        else                                            \
        _Py_Dealloc(_py_decref_tmp);                    \
    } while (0)

include/object.h

void
_Py_Dealloc(PyObject *op)
{
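    // Step 1: fetch the type's tp_dealloc (float_dealloc for a float); it is invoked on the last line
    destructor dealloc = Py_TYPE(op)->tp_dealloc;

    // Step 2: remove the object from the refchain doubly linked list
    _Py_ForgetReference(op);

    // run the tp_dealloc fetched in step 1
    (*dealloc)(op);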
}

Objects/object.c
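
The fact that deallocation runs as soon as the count reaches zero can be seen from Python with a class that records its own destruction. This is a sketch assuming CPython's reference counting; the class Tracked and the list log are hypothetical names used only for this example (a float cannot carry a __del__ hook, so a custom class stands in for it).

```python
class Tracked:
    """Records when the instance is deallocated."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("deallocated")


log = []
first = Tracked(log)
second = first              # two references to one object

del first                   # one reference remains: the object survives
state_after_first_del = list(log)

del second                  # last reference gone: deallocation runs immediately
state_after_second_del = list(log)

print(state_after_first_del, state_after_second_del)
```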

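Step 1: call the float type's tp_dealloc to destroy the object.

One would expect this step to free the object's memory directly, but float keeps an internal cache (a free list), so what actually happens is:

- If the free list already holds 100 or more cached objects, `del val` really frees the object's memory.
- If it holds fewer than 100, `del val` does not free the memory; the object is pushed onto the singly linked free_list so that a later float can reuse it.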

/* Special free list
   free_list is a singly-linked list of available PyFloatObjects, linked
   via abuse of their ob_type members.
*/

#ifndef PyFloat_MAXFREELIST
#define PyFloat_MAXFREELIST    100
#endif
static int numfree = 0;
static PyFloatObject *free_list = NULL;



PyTypeObject PyFloat_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "float",
    sizeof(PyFloatObject),
    0,

    // tp_dealloc points to float_dealloc
    (destructor)float_dealloc,                  /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_reserved */
    ...
};



static void
float_dealloc(PyFloatObject *op)
{
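    // check that the object is exactly a float
    if (PyFloat_CheckExact(op)) {

        // check whether the free list already holds 100 objects
        if (numfree >= PyFloat_MAXFREELIST)  {
            // if so, free the object's memory
            PyObject_FREE(op);
            return;
        }
        // otherwise increment the free-list count
        // and push the object onto the singly linked free_list for reuse by a later float.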
        numfree++;
        Py_TYPE(op) = (struct _typeobject *)free_list;
        free_list = op;
    }
    else
        Py_TYPE(op)->tp_free((PyObject *)op);
}

Objects/floatobject.c
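
The interplay between PyFloat_FromDouble and float_dealloc can be modelled in a few lines of Python. This is a simplified model of the logic shown above, not CPython code: the class FloatPool, its methods, and the use of a dict as a stand-in for a PyFloatObject are all assumptions made for the illustration.

```python
MAX_FREE = 100  # mirrors PyFloat_MAXFREELIST


class FloatPool:
    """Toy model of the float free list."""

    def __init__(self):
        self.free_list = []   # stands in for the singly linked free_list
        self.created = 0      # number of real allocations performed

    def allocate(self, value):
        if self.free_list:
            slot = self.free_list.pop()   # reuse a cached slot
        else:
            slot = {}                     # "malloc" a new slot
            self.created += 1
        slot["ob_fval"] = value
        return slot

    def release(self, slot):
        if len(self.free_list) >= MAX_FREE:
            return False                  # cache full: the slot would be freed
        self.free_list.append(slot)       # cache the slot for later reuse
        return True


pool = FloatPool()
v1 = pool.allocate(3.8)
pool.release(v1)
v2 = pool.allocate(88.7)
print(v2 is v1, pool.created)
```

In this model the second allocation reuses the slot released by the first, which is the same mechanism that lets a new float reuse the memory of one that was just deleted.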

"""
Knowing how CPython caches float objects explains why the two memory addresses in the code below can turn out to be identical.
"""

v1 = 3.8
print(id(v1)) # memory address: 140454027861640
del v1

v2 = 88.7
print(id(v2)) # memory address: 140454027861640

Extension: reading the source to understand the behaviour

void
PyObject_Free(void *ptr)
{
    // analogous to the allocation shown above
    _PyObject.free(_PyObject.ctx, ptr);
}

Objects/obmalloc.c

Step 2: remove the object from the refchain doubly linked list

/* Head of circular doubly-linked list of all objects.  These are linked
 * together via the _ob_prev and _ob_next members of a PyObject, which
 * exist only in a Py_TRACE_REFS build.
 */
static PyObject refchain = {&refchain, &refchain};


void
_Py_ForgetReference(PyObject *op)
{
#ifdef SLOW_UNREF_CHECK
    PyObject *p;
#endif
    if (op->ob_refcnt < 0)
        Py_FatalError("UNREF negative refcnt");
    if (op == &refchain ||
        op->_ob_prev->_ob_next != op || op->_ob_next->_ob_prev != op) {
        fprintf(stderr, "* ob\n");
        _PyObject_Dump(op);
        fprintf(stderr, "* op->_ob_prev->_ob_next\n");
        _PyObject_Dump(op->_ob_prev->_ob_next);
        fprintf(stderr, "* op->_ob_next->_ob_prev\n");
        _PyObject_Dump(op->_ob_next->_ob_prev);
        Py_FatalError("UNREF invalid object");
    }
#ifdef SLOW_UNREF_CHECK
    for (p = refchain._ob_next; p != &refchain; p = p->_ob_next) {
        if (p == op)
            break;
    }
    if (p == &refchain) /* Not found */
        Py_FatalError("UNREF unknown object");
#endif
    op->_ob_next->_ob_prev = op->_ob_prev;
    op->_ob_prev->_ob_next = op->_ob_next;
    op->_ob_next = op->_ob_prev = NULL;
    _Py_INC_TPFREES(op);
}

Objects/object.c
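
The pointer updates in _Py_AddToAllObjects and _Py_ForgetReference are those of a circular doubly linked list with a sentinel head. The sketch below reproduces them in Python for illustration; Node, add_to_front, forget and names are hypothetical helpers, and the attributes prev and next stand in for _ob_prev and _ob_next.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.prev = self    # a lone node points at itself, like refchain
        self.next = self


def add_to_front(head, node):
    # Same four assignments as _Py_AddToAllObjects.
    node.next = head.next
    node.prev = head
    head.next.prev = node
    head.next = node


def forget(node):
    # Same three assignments as the end of _Py_ForgetReference.
    node.next.prev = node.prev
    node.prev.next = node.next
    node.next = node.prev = None


def names(head):
    # Walk the list from the head until it wraps around.
    out = []
    cur = head.next
    while cur is not head:
        out.append(cur.name)
        cur = cur.next
    return out


refchain = Node("refchain")
a, b = Node("a"), Node("b")
add_to_front(refchain, a)
add_to_front(refchain, b)
print(names(refchain))
```

Because new nodes are always inserted right after the head, the most recently created object comes first when the list is walked.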

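To sum up: when a float object is created, memory is allocated for it, its reference count is initialised to 1, and it is added to the doubly linked list refchain. When a new reference to it is created, Py_INCREF increments the reference count by 1. When the object is destroyed, it is first unlinked from refchain, and float_dealloc then checks how many objects the free_list already caches: if the number has reached 100 (PyFloat_MAXFREELIST) the memory is really freed, otherwise the object is not destroyed but kept on the singly linked free_list for later float objects to reuse.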

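2.2 The list type

Garbage Collection

Python's garbage collection relies mainly on reference counting, supplemented by mark-and-sweep and generational collection.

1. Reference counting

Every object maintains a value that records how many references point to it. When that value drops to 0, Python reclaims the object automatically. In the CPython source this value is the ob_refcnt member of PyObject.

Reading the reference count, with a code example: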

import sys

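# Create the string object "武沛齊" in memory; nick_name is one reference to it.
# (The printed numbers can be higher in practice, e.g. when the compiled code also holds the literal as a constant.)
nick_name = '武沛齊'

# Expected 1, but 2 is printed: passing nick_name as an argument to getrefcount adds a temporary reference.
# Note: that temporary reference is dropped when getrefcount returns, so the real count is still 1.
print(sys.getrefcount(nick_name))

# real_name now refers to the same string object, so the reference count goes up by 1 and becomes 2.
real_name = nick_name

# Expected 2, but 3 is printed, again because nick_name is passed as an argument to getrefcount.
# Note: the temporary reference is dropped when getrefcount returns, so the real count is still 2.
print(sys.getrefcount(nick_name))

# Delete the name real_name, which decrements the reference count of the object it referred to.
del real_name

# Expected 1, but 2 is printed, because nick_name is passed as an argument to getrefcount.
print(sys.getrefcount(nick_name))

# ############ getrefcount docstring ############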
'''
def getrefcount(p_object): # real signature unknown; restored from __doc__
    """
    getrefcount(object) -> integer
    
    Return the reference count of object.  The count returned is generally
    one higher than you might expect, because it includes the (temporary)
    reference as an argument to getrefcount().
    """
    return 0
'''

2. Circular references

Reference counting alone handles most of Python's garbage collection, but it has one obvious weakness: circular references.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import gc
import objgraph  # third-party package, not part of the standard library


class Foo(object):
    def __init__(self):
        self.data = None


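# Create two objects in memory; each has a reference count of 1.
obj1 = Foo()
obj2 = Foo()

# Make the two objects reference each other, which raises each reference count to 2.
obj1.data = obj2
obj2.data = obj1

# Delete the names, which lowers each reference count by 1.
del obj1
del obj2

# Disable the garbage collector. Python combines reference counting with mark-and-sweep and
# generational collection to resolve circular references; turning the collector off makes it
# possible to inspect the objects that have not been released.
gc.disable()

# At this point the circular reference keeps both reference counts above 0, so neither object is reclaimed.
# That is why two Foo instances are still reported in memory.
print(objgraph.count('Foo'))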

Note: gc.collect() triggers a garbage collection explicitly.
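
The same experiment can be run with the standard library alone, using a weak reference to check whether an object is still alive. This is a sketch assuming CPython; Node, probe and the other variable names are chosen for this example.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.data = None


gc.disable()                 # keep the cyclic collector from running on its own

a, b = Node(), Node()
a.data, b.data = b, a        # build the cycle
probe = weakref.ref(a)       # observes the object without keeping it alive

del a, b
alive_before = probe() is not None   # reference counting alone cannot free the pair

found = gc.collect()                 # run the cyclic collector by hand
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after, found >= 2)
```

gc.collect() returns the number of unreachable objects it found, which here includes at least the two Node instances.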

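Circular references that are never reclaimed keep objects alive indefinitely, so memory usage grows steadily and eventually becomes a memory leak.

To solve this problem, Python adds mark-and-sweep and generational collection on top of reference counting.

So there is no longer any need to worry about circular references.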

Reference cycles involving lists, tuples, instances, classes, dictionaries, and functions are found.

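Python GC source documentation: http://www.arctrix.com/nas/python/gc/

3. Mark-and-sweep & generational collection

To handle circular references, Python tracks objects of the types lists, tuples, instances, classes, dictionaries, and functions: every such object is placed in a doubly linked list when it is created. The structs below show the _ob_next and _ob_prev pointers that link objects into such a list. Note that in CPython these two members exist only in a Py_TRACE_REFS build, and the garbage collector keeps its own pair of links in a separate PyGC_Head header placed in front of each tracked object.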

/* Nothing is actually declared to be a PyObject, but every pointer to
 * a Python object can be cast to a PyObject*.  This is inheritance built
 * by hand.  Similarly every pointer to a variable-size Python object can,
 * in addition, be cast to PyVarObject*.
 */
typedef struct _object {
    _PyObject_HEAD_EXTRA /* doubly linked list pointers */
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;


/* Define pointers to support a doubly-linked list of all live heap objects. */
#define _PyObject_HEAD_EXTRA            \
    struct _object *_ob_next;           \
    struct _object *_ob_prev;

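As objects are created, this doubly linked list keeps growing.

- When the number of allocations minus the number of deallocations exceeds 700 (the default threshold0), the interpreter runs a garbage collection.
- When the code calls gc.collect() explicitly, the interpreter runs a garbage collection.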
```
import gc

gc.collect()
```

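During a collection the interpreter walks every object in the list. For objects involved in circular references, it subtracts the references that come from other objects in the list (CPython does this on a working copy of the reference count). It then splits the objects in two: those whose count ends at 0 and that cannot be reached from a surviving object are reclaimed, and those whose count is not 0 are moved to another doubly linked list, i.e. the next generation of the generational collector.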
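
The idea described above can be sketched in plain Python. This is a simplified model, not the CPython implementation: the function find_unreachable, the dictionary-based object graph, and the sample names a to d are all assumptions made for the illustration.

```python
def find_unreachable(refcounts, edges):
    """refcounts: name -> total reference count.
    edges: name -> list of names that object refers to."""
    # Step 1: work on a copy of the reference counts.
    gc_refs = dict(refcounts)

    # Step 2: subtract every reference that comes from inside the tracked set.
    for targets in edges.values():
        for dst in targets:
            gc_refs[dst] -= 1

    # Step 3: objects still above 0 are referenced from outside; they and
    # everything reachable from them must be kept.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges.get(name, []))

    # Whatever is left forms unreachable cycles and can be reclaimed.
    return set(refcounts) - reachable


# a <-> b is an isolated cycle; c <-> d is a cycle that c's extra
# (external) reference keeps alive.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

The reachability pass in step 3 matters: d ends with a working count of 0 just like a and b, but it survives because c, which is referenced from outside, still points to it.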

About generational collection (generations):

The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. Similarly, threshold2 controls the number of collections of generation 1 before collecting generation 2.

# By default the three thresholds are (700, 10, 10); they can also be changed explicitly.
# Signature: gc.set_threshold(threshold0[, threshold1[, threshold2]])
import gc

gc.set_threshold(700, 10, 10)
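
The current settings and counters can be inspected before changing anything. The sketch below uses gc.get_threshold() and gc.get_count() from the standard gc module, changes threshold0 and then restores the original values; the value 1000 and the names original and changed are arbitrary choices for the example.

```python
import gc

original = gc.get_threshold()     # (threshold0, threshold1, threshold2)
print(original)
print(gc.get_count())             # current counters for generations 0, 1 and 2

# Make generation 0 collections rarer, then restore the original settings.
gc.set_threshold(1000, original[1], original[2])
changed = gc.get_threshold()
gc.set_threshold(*original)
print(changed)
```

Raising threshold0 trades fewer collection pauses for more memory held by uncollected cycles, so it is worth measuring before changing it in a real program.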

Official documentation: https://docs.python.org/3/library/gc.html
