A Detailed Look at PEP 468: Preserving Keyword Argument Order

Date: 2020-05-06

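Python 3.6.0 optimized the dictionary: the new dict is faster and uses less memory, which is impressive. Most of the material I found online points to the post [Python-Dev] More compact dictionaries with faster iteration. It summarizes how the old and new dicts differ and where the optimizations come from, and the design is very creative.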

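Still, I was curious how such a structure is implemented in C, so I went to the source. I took the dict source code of Python 3.5.9 and Python 3.6.0 and compared them, and found the implementation full of clever tricks, which was exciting. So I had an idea: explain PEP 468's improvements to the dict from the source-code side, as a supplement to [Python-Dev] More compact dictionaries with faster iteration.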

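Diffing 3.5.9 against 3.6.0 right away would not make things clear, so I will be a bit long-winded: first a complete walkthrough of the dict data structure, and then we can happily compare the differences : )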

Unless stated otherwise, everything refers to Python 3.6.0.

New features

The Python changelog shows that starting with 3.6, dicts are ordered and use less memory.

Python 3.6.0 beta 1

Release date: 2016-09-12

Core and Builtins

...

bpo-27350: dict implementation is changed like PyPy. It is more compact and preserves insertion order. (Concept developed by Raymond Hettinger and patch by Inada Naoki.)

...

A brief overview of the dict data structure

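I originally wanted to use the Python 3.5.9 structures for comparison, but on second thought it is not necessary. The two differ little and there is no point in making the comparison complicated, so I describe the Python 3.6.0 dict structures directly and then compare source details, which is clear enough. Reading [Python-Dev] More compact dictionaries with faster iteration alongside makes it clearer still.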

Structures

Three structures make up a dict object:

- PyDictObject (./Include/dictobject.h)
- PyDictKeysObject (./Objects/dict-common.h), which is struct _dictkeysobject in that header
- PyDictKeyEntry (./Objects/dict-common.h)

Each field is described below:

PyDictObject

The dict object. Every dict in Python, whether you create it with dict() or it is a class's __dict__ attribute, is one of these.
- PyObject_HEAD
  
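  Everything in Python is an object, and at the C level all of them are handled through the same generic PyObject pointer. So consider two questions: C types are fixed and there is no runtime type checking, so how can calls be dispatched dynamically? And once dispatched, how is the type recognized?
  
  Exactly, by "looking at its face": if every object carries a PyObject_HEAD that includes its type information, you can call through a pointer and recognize the type at run time from that header. An analogy (in Chinese the word for "object" also means a date): suppose you have lots and lots of dates, each with a fixed height, weight and look, and you meet this one today and that one tomorrow. The "type" changes, so what? No problem: you "call dynamically" by phone and "identify the type" by voice or in person. Same idea, hahahaha...
  
  One more word: always do your "type checking". ~~If your date finds out you are thinking of someone else, that's a crash! The program errors out and dies!~~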
- ma_used
  
  The number of items currently in the dict.
- ma_version_tag
  
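  There is a 64-bit unsigned global variable, pydict_global_version. Creating a dict and every modification of it increments this global counter and copies its value (not a reference to it) into ma_version_tag. So to know whether a dict changed between two moments, you only need to check whether ma_version_tag changed, without inspecting the contents. This can be used for optimizations. See PEP 509 -- Add a private version to dict.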
- PyDictKeysObject *ma_keys
  
  Pointer to the dict's keys object. Despite the name KeysObject, it also holds values: in "combined" mode the value (me_value) is valid, while in "split" mode me_value is unused and PyObject **ma_values is used instead.
  - dk_refcnt
    
    In "split" mode one PyDictKeysObject is shared by many PyDictObjects, and this reference count comes into play.
  - dk_size
    
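    Size of the dict's hash table, i.e. the space actually allocated, similar to the capacity of a C++ vector.
    
    It is also the length of the dk_indices array. It must be a power of 2 and grows dynamically when it runs out of room.
    
    ~~Curiosity killed the cat, but~~ why must it be a power of 2? Here are a few lines from the source.
```c
#define DK_MASK(dk) (((dk)->dk_size)-1)
size_t mask = DK_MASK(k);
i = (size_t)hash & mask;    // compute the hash-table index from the hash
```
    That explains it. The hash is a full machine-word value (Py_hash_t, cast to size_t here), so using it directly as an index would run past the end of the table; it has to be reduced modulo the table length. To make that reduction a cheap bitwise AND instead of a division, dk_size is required to be a power of 2.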
  - dk_lookup
    
    The lookup function, which finds a given element in the hash table. There are 4 of them.
    
    /* Function to lookup in the hash table (dk_indices):
    - lookdict(): general-purpose, and may return DKIX_ERROR if (and
    only if) a comparison raises an exception.
    
    - lookdict_unicode(): specialized to Unicode string keys, comparison of
    which can never raise an exception; that function can never return
    DKIX_ERROR.
    
    - lookdict_unicode_nodummy(): similar to lookdict_unicode() but further
    specialized for Unicode string keys that cannot be the <dummy> value.
    
    - lookdict_split(): Version of lookdict() for split tables. */
    
    Python uses string-keyed dicts everywhere, so they get a dedicated optimization; use strings as keys whenever you can!
  - dk_usable
    
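    The number of entries (hash-key-value) that can still be added. To keep hash collisions down, a table may only be filled to about 2/3 of dk_size, as set by the USABLE_FRACTION macro; dk_usable starts at that value and is decremented on every insertion.
    
    At initialization this value is also the length of the dk_entries or ma_values array, which grows dynamically when it runs out.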
  - dk_nentries
    
    The number of used entries in the dk_entries or ma_values array.
  - dk_indices
    
    The hash index array. It is a hash table, but what it stores are indices into dk_entries.
    
    See PEP 468 -- Preserving the order of **kwargs in a function.
  - PyDictKeyEntry dk_entries[dk_usable]
    
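    Python calls a hash-key-value combination an entry; the term comes up often. Note how it differs from ma_values: dk_entries is an array whose storage directly follows dk_indices, whereas ma_values is a pointer to storage that is allocated separately rather than at the end of PyDictObject. The consequences show up in the analysis of dictresize().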
    - me_hash
    - me_key
    - me_value
  - ...(the next PyDictKeyEntry)
- PyObject **ma_values
  
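  Python 3.3 introduced a new dict implementation, the split dict, a structure aimed at instance attributes. Picture this: once a class is defined, its attribute names stay the same (assuming they are not changed dynamically), and it has many instances whose attribute values differ but whose attribute names are identical. If all these __dict__s share one set of keys, memory is saved. See PEP 412 -- Key-Sharing Dictionary.
  
  In this mode, the ma_keys pointers of several different PyDictObject objects point to the same PyDictKeysObject. In the original dict an entry (hash-key-value) is one unit, so touching one part affects the whole. Sharing the keys means sharing the entries, which would force the values to be shared as well. But the values must not be shared: in this mode the PyDictObjects have the same keys but different values. The only way out is a values array outside the entry structure that replaces the forcibly shared entry->value, and each PyDictObject gets its own such array. This separates keys from values, hence the name "split".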
  
  /* If ma_values is NULL, the table is "combined": keys and values
  are stored in ma_keys.
  
  If ma_values is not NULL, the table is splitted:
  keys are stored in ma_keys and values are stored in ma_values */
  
  PyObject **ma_values;
  
  In addition, "split" mode has 2 more conditions, which fit instance attributes well:
  
  - Only string (unicode) keys are allowed.
  - All dicts sharing same key must have same insertion order.
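
The two rules described for dk_size and dk_usable above (reduce the hash with a mask, fill at most about 2/3 of the table) can be checked with a few lines of C. This is a minimal sketch: slot_for(), usable_fraction() and is_power_of_2() are hypothetical helpers, not CPython functions, and the 2/3 rule is assumed to be (n << 1) / 3, which is how I read 3.6.0's USABLE_FRACTION macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a hash to one of `size` slots.
 * Because size is a power of two, `& (size - 1)` keeps exactly the
 * low bits, which equals `hash mod size` without a division. */
static size_t slot_for(int64_t hash, size_t size)
{
    size_t mask = size - 1;        /* like DK_MASK(dk) */
    return (size_t)hash & mask;    /* negative hashes wrap to unsigned first */
}

/* Hypothetical helper: how many entries a table of n slots may hold,
 * assuming USABLE_FRACTION(n) == (n << 1) / 3, i.e. about 2/3 of n. */
static size_t usable_fraction(size_t n)
{
    return (n << 1) / 3;
}

/* A power of two has exactly one bit set. */
static int is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

With 8 slots, hash 13 lands in slot 5 (the same as 13 % 8), the negative hash of 'timmy' from the email example quoted later also lands in slot 5, and at most 5 of the 8 slots may be filled.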

Source code

Let's put the Python 3.5.9 and 3.6.0 structs side by side. Python 3.6.0 adds many comments that 3.5.9 lacks, which is excellent! Here I have only kept the comments on the parts that differ.

```c
/* ./Objects/dict-common.h */
/* Python 3.5.9 */
struct _dictkeysobject {
    Py_ssize_t dk_refcnt;
    Py_ssize_t dk_size;
    dict_lookup_func dk_lookup;
    Py_ssize_t dk_usable;
    PyDictKeyEntry dk_entries[1];
};

/* Python 3.6.0 */
struct _dictkeysobject {
    Py_ssize_t dk_refcnt;
    Py_ssize_t dk_size;
    dict_lookup_func dk_lookup;
    Py_ssize_t dk_usable;

    /* Number of used entries in dk_entries. */
    Py_ssize_t dk_nentries;

    /* Actual hash table of dk_size entries. It holds indices in dk_entries,
       or DKIX_EMPTY(-1) or DKIX_DUMMY(-2).

       Indices must be: 0 <= indice < USABLE_FRACTION(dk_size).

       The size in bytes of an indice depends on dk_size:

       - 1 byte if dk_size <= 0xff (char*)
       - 2 bytes if dk_size <= 0xffff (int16_t*)
       - 4 bytes if dk_size <= 0xffffffff (int32_t*)
       - 8 bytes otherwise (int64_t*)

       Dynamically sized, 8 is minimum. */
    union {
        int8_t as_1[8];
        int16_t as_2[4];
        int32_t as_4[2];
#if SIZEOF_VOID_P > 4
        int64_t as_8[1];
#endif
    } dk_indices;

    /* "PyDictKeyEntry dk_entries[dk_usable];" array follows:
       see the DK_ENTRIES() macro */
};
```

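The comments explain what the new fields are for. But notice that in the struct, the definition of dk_entries is commented out. What is going on? Following the comment's hint, let's look at DK_ENTRIES.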

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
#define DK_SIZE(dk) ((dk)->dk_size)
#if SIZEOF_VOID_P > 4
#define DK_IXSIZE(dk)                          \
    (DK_SIZE(dk) <= 0xff ?                     \
        1 : DK_SIZE(dk) <= 0xffff ?            \
            2 : DK_SIZE(dk) <= 0xffffffff ?    \
                4 : sizeof(int64_t))
#else
#define DK_IXSIZE(dk)                          \
    (DK_SIZE(dk) <= 0xff ?                     \
        1 : DK_SIZE(dk) <= 0xffff ?            \
            2 : sizeof(int32_t))
#endif
#define DK_ENTRIES(dk) \
    ((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))
```

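- DK_SIZE gets dk_size, i.e. the number of elements in dk_indices
- DK_IXSIZE picks, based on dk_size, how many bytes each dk_indices element currently takes
- DK_ENTRIES gets the start address of the dk_entries array from dk (a PyDictKeysObject)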

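So dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)] lands on the first address after dk_indices, exactly one past its end. What, out of bounds? Yes, because the space for dk_entries follows immediately, and the address DK_ENTRIES computes is the start of the dk_entries array. What a fun trick : )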

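Why go to all this trouble? Wouldn't declaring dk_entries directly, as Python 3.5.9 does, be fine? I think the reason is that dk_indices is dynamic too. If dk_entries were a declared member, its offset from the start of the struct would be fixed at compile time, so once the length (and element width) of dk_indices varies, an expression like &dk->dk_entries[0] would give the wrong address. The exact memory layout can be seen in new_keys_object().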
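
To see the DK_ENTRIES() trick in isolation, here is a toy version in plain C. It is only a sketch under assumptions: ToyKeys, ToyEntry and the toy_* functions are hypothetical names, the index bytes live in a C99 flexible array member instead of CPython's union, and the header fields are 8 bytes wide so that the entries placed after the index bytes stay 8-byte aligned for table sizes of 8 and up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyDictKeyEntry: hash, key, value. */
typedef struct {
    int64_t hash;
    const void *key;
    const void *value;
} ToyEntry;

/* Hypothetical stand-in for PyDictKeysObject. */
typedef struct {
    int64_t size;      /* like dk_size: slots in the index table */
    int64_t usable;    /* like dk_usable */
    int64_t nentries;  /* like dk_nentries */
    int8_t indices[];  /* index bytes, immediately followed by the entries */
} ToyKeys;

/* Bytes per index slot, chosen from the table size (like DK_IXSIZE). */
static size_t toy_ixsize(int64_t size)
{
    return size <= 0xff ? 1 : size <= 0xffff ? 2
         : size <= 0xffffffffLL ? 4 : 8;
}

/* Like DK_ENTRIES: the entries start right where the indices end. */
static ToyEntry *toy_entries(ToyKeys *k)
{
    return (ToyEntry *)&k->indices[(size_t)k->size * toy_ixsize(k->size)];
}

/* Like new_keys_object(): one allocation for header + indices + entries. */
static ToyKeys *toy_new_keys(int64_t size)
{
    size_t es = toy_ixsize(size);
    int64_t usable = (size << 1) / 3;   /* assumed USABLE_FRACTION */
    ToyKeys *k = malloc(sizeof(ToyKeys) + es * (size_t)size
                        + sizeof(ToyEntry) * (size_t)usable);
    if (k == NULL)
        return NULL;
    k->size = size;
    k->usable = usable;
    k->nentries = 0;
    memset(k->indices, 0xff, es * (size_t)size);   /* every slot = -1 (empty) */
    memset(toy_entries(k), 0, sizeof(ToyEntry) * (size_t)usable);
    return k;
}
```

For an 8-slot table the entries start 8 bytes after the indices (1 byte per slot); for a 256-slot table they start 512 bytes after (2 bytes per slot). Since that offset depends on the table size, no fixed dk_entries member could describe it.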

Why it is smaller

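Continuing from above, let's analyze new_keys_object(), which shows the memory layout of a PyDictKeysObject. I added comments at the key points and omitted code that does not affect the flow.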

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
...
/* Get the size of a structure member in bytes */
#define Py_MEMBER_SIZE(type, member) sizeof(((type *)0)->member)
...
static PyDictKeysObject *new_keys_object(Py_ssize_t size)
{
    PyDictKeysObject *dk;
    Py_ssize_t es, usable;

    assert(size >= PyDict_MINSIZE);
    assert(IS_POWER_OF_2(size));

    // Only 2/3 of the dk_indices slots can ever be filled (usable); 1/3 stays empty
    // PyDictKeyEntry dk_entries[dk_usable] gets memory only for the usable part
    usable = USABLE_FRACTION(size);     // 2/3
    if (size <= 0xff) {
        es = 1;     // bytes per index
    }
    else if (size <= 0xffff) {
        ...
    }

    // Allocate memory for PyDictKeysObject *dk
    // Reuse an object from the free list when possible
    if (size == PyDict_MINSIZE && numfreekeys > 0) {
        dk = keys_free_list[--numfreekeys];
    }
    else {
        dk = PyObject_MALLOC(// subtracting Py_MEMBER_SIZE gives the size of everything before dk_indices
                             sizeof(PyDictKeysObject)
                             - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                             // bytes per index * number of dk_indices elements
                             + es * size
                             // PyDictKeyEntry dk_entries[dk_usable]
                             // not declared in struct _dictkeysobject, but the space is really allocated
                             // dk_indices has a variable length too, so dk_entries is reached via the DK_ENTRIES macro
                             // to save space only the usable part is allocated, so dk_indices is longer than dk_entries
                             + sizeof(PyDictKeyEntry) * usable);
        ...
    }
    DK_DEBUG_INCREF dk->dk_refcnt = 1;
    dk->dk_size = size;
    dk->dk_usable = usable;
    dk->dk_lookup = lookdict_unicode_nodummy;
    dk->dk_nentries = 0;
    // dk_indices is filled with 0xFF bytes, which corresponds to #define DKIX_EMPTY (-1)
    memset(&dk->dk_indices.as_1[0], 0xff, es * size);
    // dk_entries is zero-filled
    // DK_ENTRIES locates dk_entries, equivalent to &dk->dk_entries[0]
    memset(DK_ENTRIES(dk), 0, sizeof(PyDictKeyEntry) * usable);
    return dk;
}
```

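The memory returned by PyObject_MALLOC is this dict's PyDictKeysObject. It can be split into 3 parts:

- The part before dk_indices: sizeof(PyDictKeysObject) - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)

  The size of the struct as declared in the header, minus the size of dk_indices, gives the part before dk_indices, which holds dk_refcnt, dk_size, dk_lookup, dk_usable and dk_nentries.
- dk_indices: es * size

  Bytes per index times the number of dk_indices elements.
- dk_entries: sizeof(PyDictKeyEntry) * usable

  This shows that the length of dk_entries is not size: only the usable part is allocated.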

Now compare the dk = PyMem_MALLOC(...) allocation and the dk_entries addressing in Python 3.5.9, and the big difference between the two becomes clear.

```c
/* Python 3.5.9 */
/* ./Objects/dictobject.c */
static PyDictKeysObject *new_keys_object(Py_ssize_t size)
{
    PyDictKeysObject *dk;
    Py_ssize_t i;
    PyDictKeyEntry *ep0;
    ...
    dk = PyMem_MALLOC(sizeof(PyDictKeysObject) +
                      // dk_entries[1] in the struct plus size-1 here gives size entries in total
                      sizeof(PyDictKeyEntry) * (size-1));
    ...
    ep0 = &dk->dk_entries[0];
    /* Hash value of slot 0 is used by popitem, so it must be initialized */
    ep0->me_hash = 0;
    for (i = 0; i < size; i++) {
        ep0[i].me_key = NULL;
        ep0[i].me_value = NULL;
    }
    dk->dk_lookup = lookdict_unicode_nodummy;
    return dk;
}
```

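Compare the memory requested by these two allocation calls (PyMem_MALLOC in 3.5.9, PyObject_MALLOC in 3.6.0) and you can see why the new dict uses less memory: 3.5.9 allocates a full entry for every one of the size slots, while 3.6.0 allocates a small index per slot and full entries only for the usable 2/3.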
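
To put rough numbers on this, the sketch below simply adds up the two allocation formulas. It assumes a 64-bit build (8-byte pointers and Py_ssize_t, so one entry is the 24 bytes mentioned in the email quoted below), a 4-field header in 3.5.9 and a 5-field header in 3.6.0, and (n << 1) / 3 for USABLE_FRACTION. bytes_35() and bytes_36() are hypothetical names, and allocator overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_BYTES 24   /* hash + key pointer + value pointer (assumed 64-bit) */

/* 3.5.9: header (refcnt, size, lookup, usable) + one full entry per slot. */
static size_t bytes_35(size_t size)
{
    size_t header = 4 * 8;
    return header + ENTRY_BYTES * size;
}

/* Index width in bytes, like DK_IXSIZE on a 64-bit build. */
static size_t ixsize(size_t size)
{
    return size <= 0xff ? 1 : size <= 0xffff ? 2 : size <= 0xffffffffULL ? 4 : 8;
}

/* 3.6.0: header (adds dk_nentries) + index table + only the usable entries. */
static size_t bytes_36(size_t size)
{
    size_t header = 5 * 8;
    size_t usable = (size << 1) / 3;   /* assumed USABLE_FRACTION */
    return header + ixsize(size) * size + ENTRY_BYTES * usable;
}
```

Under these assumptions an 8-slot keys object shrinks from 224 to 168 bytes and a 64-slot one from 1568 to 1112 bytes. The saving comes from allocating full entries only for the usable 2/3 of the slots, at the price of a 1-byte index per slot.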

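With the memory layout analyzed, PEP 468's improvement is very clear. Now you can check it against the material PEP 468 links to. If everything suddenly clicks into place, you are on the right track; ~~if not, perhaps your thick hair is what stands between you and true power, and shaving it off is recommended~~. Following PEP 468's link to [Python-Dev] More compact dictionaries with faster iteration: the first entries array it describes corresponds to dk_entries in 3.5.9, and the later indices and entries correspond to the dk_indices and dk_entries arrays in 3.6.0, matching the code above.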

The current memory layout for dictionaries is unnecessarily inefficient. It has a sparse table of 24-byte entries containing the hash value, key pointer, and value pointer.

Instead, the 24-byte entries should be stored in a dense table referenced by a sparse table of indices.

For example, the dictionary:

```python
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
```

is currently stored as:

```python
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
```

Instead, the data should be organized as follows:

```python
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
```

Only the data layout needs to change. The hash table algorithms would stay the same. All of the current optimizations would be kept, including key-sharing dicts and custom lookup functions for string-only dicts. There is no change to the hash functions, the table search order, or collision statistics.

Having read the source, this description should now make a lot more sense, hehe : )
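
The email's example can be rebuilt with a small toy model. The sketch below is not CPython code: CompactDict, cd_init() and cd_insert_new() are hypothetical, it uses the three hash values quoted above with 8 index slots, and it probes with the recurrence shown in insertdict_clean() in the appendix (PERTURB_SHIFT assumed to be 5). Deletion and resizing are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 8
#define TOY_USABLE 5          /* (8 << 1) / 3 */
#define TOY_PERTURB_SHIFT 5   /* assumed value of PERTURB_SHIFT */
#define TOY_EMPTY (-1)        /* like DKIX_EMPTY */

typedef struct {
    int64_t hash;
    const char *key;
    const char *value;
} Entry;

typedef struct {
    int8_t indices[TOY_SIZE];   /* sparse: slot -> position in entries */
    Entry entries[TOY_USABLE];  /* dense: kept in insertion order */
    int nentries;
} CompactDict;

static void cd_init(CompactDict *d)
{
    memset(d->indices, 0xff, sizeof d->indices);   /* all slots empty (-1) */
    d->nentries = 0;
}

/* Insert a key known to be absent (like insertdict_clean()). */
static void cd_insert_new(CompactDict *d, int64_t hash,
                          const char *key, const char *value)
{
    size_t mask = TOY_SIZE - 1;
    size_t i = (size_t)hash & mask;
    size_t perturb = (size_t)hash;
    assert(d->nentries < TOY_USABLE);
    while (d->indices[i] != TOY_EMPTY) {           /* collision: keep probing */
        perturb >>= TOY_PERTURB_SHIFT;
        i = mask & ((i << 2) + i + perturb + 1);
    }
    d->indices[i] = (int8_t)d->nentries;           /* sparse table stores a position */
    d->entries[d->nentries].hash = hash;           /* dense table stores the data */
    d->entries[d->nentries].key = key;
    d->entries[d->nentries].value = value;
    d->nentries++;
}
```

Walking entries[0] to entries[nentries - 1] gives timmy, barry, guido, the insertion order, while indices comes out as the email's [None, 1, None, None, None, 0, None, 2], with -1 standing in for None.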

But that's not the end. new_keys_object() only creates the PyDictKeysObject; the final goal is a PyDictObject, which is created by PyDict_New().

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
PyObject *
PyDict_New(void)
{
    PyDictKeysObject *keys = new_keys_object(PyDict_MINSIZE);
    if (keys == NULL)
        return NULL;
    return new_dict(keys, NULL);    // in combined mode values is NULL
}
```

We have seen new_keys_object(); next comes new_dict(), again with some type and error checks omitted.

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
    PyDictObject *mp;
    assert(keys != NULL);
    if (numfree) {
        // free list
        mp = free_list[--numfree];
        ...
        _Py_NewReference((PyObject *)mp);
    }
    else {
        mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
        ...
    }
    mp->ma_keys = keys;            // the PyDictKeysObject created by new_keys_object
    mp->ma_values = values;        // in combined mode values is NULL
    mp->ma_used = 0;               // a new dict has no items
    mp->ma_version_tag = DICT_NEXT_VERSION();    // version tag, see the data-structure notes above
    assert(_PyDict_CheckConsistency(mp));
    return (PyObject *)mp;
}
```

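At this point a dict object is fully created, and we can start happily playing with it in Python. Note, though, that the dict created here is in "combined" mode. A "split" dict additionally initializes ma_values on top of that; I'm too lazy to cover it in detail here.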

Why it is ordered

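From the data structures analyzed above we know that dict items are stored in the dk_entries array. A data structure is ordered when the order of its elements matches their insertion order. The slot an element goes into is computed by the hash function and is essentially random, which is why dict items used to be unordered. Python 3.6.0 introduces the dk_indices array to hold the hash-table information on its own, so the insertion order can be preserved in dk_entries. To satisfy our curiosity, let's analyze the insertion function.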

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
/*
Internal routine to insert a new item into the table.
Used both by the internal resize routine and by the public insert routine.
Returns -1 if an error occurred, or 0 on success.
*/
static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
    PyObject *old_value;
    PyObject **value_addr;
    PyDictKeyEntry *ep, *ep0;
    Py_ssize_t hashpos, ix;
    ...
    ix = mp->ma_keys->dk_lookup(mp, key, hash, &value_addr, &hashpos);
    ...
    Py_INCREF(value);
    MAINTAIN_TRACKING(mp, key, value);
    ...
    /* Insert a new value */
    if (ix == DKIX_EMPTY) {
        /* Insert into new slot. */
        /* Resize the dict when the dk_entries array is full */
        if (mp->ma_keys->dk_usable <= 0) {
            /* Need to resize. */
            if (insertion_resize(mp) < 0) {
                Py_DECREF(value);
                return -1;
            }
            find_empty_slot(mp, key, hash, &value_addr, &hashpos);
        }
        ep0 = DK_ENTRIES(mp->ma_keys);
        ep = &ep0[mp->ma_keys->dk_nentries];    // new items always go at the end
        dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
        Py_INCREF(key);
        ep->me_key = key;
        ep->me_hash = hash;
        if (mp->ma_values) {
            assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL);
            mp->ma_values[mp->ma_keys->dk_nentries] = value;
        }
        else {
            ep->me_value = value;
        }
        mp->ma_used++;
        mp->ma_version_tag = DICT_NEXT_VERSION();
        mp->ma_keys->dk_usable--;
        mp->ma_keys->dk_nentries++;
        assert(mp->ma_keys->dk_usable >= 0);
        assert(_PyDict_CheckConsistency(mp));
        return 0;
    }

    assert(value_addr != NULL);
    /* Replace the old value */
    old_value = *value_addr;
    if (old_value != NULL) {
        *value_addr = value;
        mp->ma_version_tag = DICT_NEXT_VERSION();
        assert(_PyDict_CheckConsistency(mp));

        Py_DECREF(old_value); /* which **CAN** re-enter (see issue #22653) */
        return 0;
    }

    /* pending state */
    assert(_PyDict_HasSplitTable(mp));
    assert(ix == mp->ma_used);
    *value_addr = value;
    mp->ma_used++;
    mp->ma_version_tag = DICT_NEXT_VERSION();
    assert(_PyDict_CheckConsistency(mp));
    return 0;
}
```

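In the insertion function, the first line to focus on is ix = mp->ma_keys->dk_lookup(mp, key, hash, &value_addr, &hashpos). dk_lookup is a function pointer to one of the four lookup functions, so its parameters and return value deserve an explanation: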

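Parameters
1. PyDictObject *mp (input)
  
  The dict object to search in.
2. PyObject *key (input)
  
  The key of the entry, a reference to the key object, used for the first check: if the stored key is the very same object (same reference), the item is found; otherwise the hash is checked.
3. Py_hash_t hash (input)
  
  The hash of the entry, used for the second check: only if the stored hash equals this hash are the two keys compared for equality; if the hashes differ, that slot cannot match and probing moves on.
4. PyObject ***value_addr (output, returned through a pointer)
  
  If the item is found, value_addr returns a pointer to the corresponding me_value; if not, *value_addr is NULL.
5. Py_ssize_t *hashpos (output, returned through a pointer)
  
  hashpos returns the item's position in the hash table (dk_indices).
Return value
- Py_ssize_t ix
  
  The item's index in the dk_entries array. If there is no valid item, ix can be one of DKIX_EMPTY, DKIX_DUMMY or DKIX_ERROR, meaning respectively an empty slot in the dk_indices hash table, a slot left behind by a deleted item, and an error.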

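With the parameters understood, we can happily keep reading. Next comes ep = &ep0[mp->ma_keys->dk_nentries]. The code after it shows that ep is where the new item goes, a pointer to a PyDictKeyEntry, and the position mp->ma_keys->dk_nentries is the end of the used part of dk_entries. In other words, every new item is appended to dk_entries in turn, which preserves insertion order. Then where does the slot computed by the hash function go? The answer is in dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries).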

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
/* write to indices. */
static inline void
dk_set_index(PyDictKeysObject *keys, Py_ssize_t i, Py_ssize_t ix)
{
    Py_ssize_t s = DK_SIZE(keys);

    assert(ix >= DKIX_DUMMY);

    if (s <= 0xff) {
        int8_t *indices = keys->dk_indices.as_1;
        assert(ix <= 0x7f);
        indices[i] = (char)ix;    // fill the dk_indices array
    }
    else if (s <= 0xffff) {
        ...
    }
}
```

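So the slot computed by the hash function is stored in the dk_indices array, and what that slot holds is the item's index in dk_entries.
If that is still unclear, look again at the description in [Python-Dev] More compact dictionaries with faster iteration.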

For example, the dictionary:

```python
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
```

...

Instead, the data should be organized as follows:

```python
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
```
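
dk_set_index() is shown above with only its 1-byte branch. The sketch below illustrates all four widths with hypothetical set_index()/get_index() helpers that copy a signed integer of the chosen width into a byte buffer with memcpy. CPython itself indexes the as_1/as_2/as_4/as_8 union members instead, so treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes per slot for a table of `size` slots (like DK_IXSIZE). */
static size_t index_width(uint64_t size)
{
    return size <= 0xff ? 1 : size <= 0xffff ? 2 : size <= 0xffffffffu ? 4 : 8;
}

/* Write entry position ix (or -1 for empty, -2 for dummy) into slot i. */
static void set_index(unsigned char *indices, uint64_t size, size_t i, int64_t ix)
{
    size_t w = index_width(size);
    unsigned char *slot = indices + i * w;
    if (w == 1) {
        int8_t v = (int8_t)ix;
        memcpy(slot, &v, sizeof v);
    } else if (w == 2) {
        int16_t v = (int16_t)ix;
        memcpy(slot, &v, sizeof v);
    } else if (w == 4) {
        int32_t v = (int32_t)ix;
        memcpy(slot, &v, sizeof v);
    } else {
        memcpy(slot, &ix, sizeof ix);
    }
}

/* Read slot i back, sign-extended to 64 bits. */
static int64_t get_index(const unsigned char *indices, uint64_t size, size_t i)
{
    size_t w = index_width(size);
    const unsigned char *slot = indices + i * w;
    if (w == 1) { int8_t v;  memcpy(&v, slot, sizeof v); return v; }
    if (w == 2) { int16_t v; memcpy(&v, slot, sizeof v); return v; }
    if (w == 4) { int32_t v; memcpy(&v, slot, sizeof v); return v; }
    int64_t v;
    memcpy(&v, slot, sizeof v);
    return v;
}
```

One signed byte is enough below 256 slots because sizes are powers of two: the largest such table has 128 slots and at most (128 << 1) / 3 = 85 entries, which fits under the 0x7f limit asserted in dk_set_index().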

Now it is time to bring out the Python 3.5.9 code for comparison. Comparing only the path that inserts into an Empty slot is enough.

```c
/* Python 3.5.9 */
/* ./Objects/dictobject.c */
/*
Internal routine to insert a new item into the table.
Used both by the internal resize routine and by the public insert routine.
Returns -1 if an error occurred, or 0 on success.
*/
static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
    PyObject *old_value;
    PyObject **value_addr;
    PyDictKeyEntry *ep;
    assert(key != dummy);

    Py_INCREF(key);
    Py_INCREF(value);
    ...
    ep = mp->ma_keys->dk_lookup(mp, key, hash, &value_addr);
    ...
    old_value = *value_addr;
    /* Active state */
    if (old_value != NULL) {
        ...
    }
    else {
        /* Empty state */
        if (ep->me_key == NULL) {
            if (mp->ma_keys->dk_usable <= 0) {
                /* Need to resize. */
                ...
            }
            mp->ma_used++;
            *value_addr = value;    // write straight into the dk_entries array
            mp->ma_keys->dk_usable--;
            assert(mp->ma_keys->dk_usable >= 0);
            ep->me_key = key;
            ep->me_hash = hash;
            assert(ep->me_key != NULL && ep->me_key != dummy);
        }
        /* Dummy state */
        else {
            ...
        }
    }
    return 0;
    ...
}
```

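Here *value_addr = value fills dk_entries, but that alone does not tell the whole story: value_addr comes from the lookup function. So I looked at the general lookup function lookdict for the key code that obtains the insertion position.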

```c
/* Python 3.5.9 */
/* ./Objects/dictobject.c */
static PyDictKeyEntry *
lookdict(PyDictObject *mp, PyObject *key,
         Py_hash_t hash, PyObject ***value_addr)
{
    ...
    mask = DK_MASK(mp->ma_keys);
    ep0 = &mp->ma_keys->dk_entries[0];
    i = (size_t)hash & mask;    // the insertion position comes straight from the hash
    ep = &ep0[i];    // and is used directly as the position in dk_entries
    if (ep->me_key == NULL || ep->me_key == key) {
        *value_addr = &ep->me_value;    // return &me_value as the insertion address
        return ep;
    }
    ...
}
```

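Clearly, the position computed by the hash function maps directly into dk_entries and the item is stored right there; there is no dk_indices array. Because hash values are not consecutive, items inserted one after another do not end up next to each other in dk_entries, and their order there follows the hashes rather than the insertion order.
If that is still unclear, look again at the description in [Python-Dev] More compact dictionaries with faster iteration.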

For example, the dictionary:

```python
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
```

is currently stored as:

```python
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
```
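
For contrast, a toy model of the 3.5.9 layout stores each entry directly in the slot its hash selects. It is a hypothetical sketch (OldDict, old_insert_new() and old_keys() are made-up names), reuses the three hashes from the email, and borrows the probing recurrence from 3.6.0's insertdict_clean() in the appendix rather than 3.5.9's exact loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OLD_SIZE 8
#define OLD_PERTURB_SHIFT 5   /* assumed value of PERTURB_SHIFT */

typedef struct {
    int64_t hash;
    const char *key;    /* NULL marks an unused slot */
    const char *value;
} OldEntry;

typedef struct {
    OldEntry slots[OLD_SIZE];   /* sparse: every slot is a full entry */
} OldDict;

/* Put a key known to be absent straight into the slot its hash selects. */
static void old_insert_new(OldDict *d, int64_t hash,
                           const char *key, const char *value)
{
    size_t mask = OLD_SIZE - 1;
    size_t i = (size_t)hash & mask;
    size_t perturb = (size_t)hash;
    while (d->slots[i].key != NULL) {
        perturb >>= OLD_PERTURB_SHIFT;
        i = mask & ((i << 2) + i + perturb + 1);
    }
    d->slots[i].hash = hash;
    d->slots[i].key = key;
    d->slots[i].value = value;
}

/* Iterate the 3.5.9 way: scan every slot and skip the empty ones. */
static size_t old_keys(const OldDict *d, const char **out)
{
    size_t n = 0;
    for (size_t i = 0; i < OLD_SIZE; i++)
        if (d->slots[i].key != NULL)
            out[n++] = d->slots[i].key;
    return n;
}
```

Scanning all 8 slots returns barry, timmy, guido (slots 1, 5 and 7), even though timmy was inserted first: the order follows the hashes, not the insertions.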

Why it is faster

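Faster iteration comes from making the dk_entries array dense: iteration visits fewer slots. The code in Python 3.5.9 and 3.6.0 is written in much the same way, so here I only excerpt the data-copying loop of dictresize() for comparison. A detailed analysis of dictresize() is in the appendix.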

```c
/* ./Objects/dictobject.c */
/* Python 3.5.9 */
static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
    ...
    /* Main loop */
    for (i = 0; i < oldsize; i++) {
        PyDictKeyEntry *ep = &oldkeys->dk_entries[i];
        if (ep->me_value != NULL) {
            assert(ep->me_key != dummy);
            insertdict_clean(mp, ep->me_key, ep->me_hash, ep->me_value);
        }
    }
    mp->ma_keys->dk_usable -= mp->ma_used;
    ...
}
/* Python 3.6.0 */
static int
dictresize(PyDictObject *mp, Py_ssize_t minsize)
{
    ...
    /* Main loop */
    for (i = 0; i < oldkeys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &ep0[i];
        if (ep->me_value != NULL) {
            insertdict_clean(mp, ep->me_key, ep->me_hash, ep->me_value);
        }
    }
    mp->ma_keys->dk_usable -= mp->ma_used;
    ...
}
```

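So which code saves the time? All the for (i = 0; i < oldkeys->dk_nentries; i++){...} loops. In Python 3.5.9 they correspond to for (i = 0; i < oldsize; i++){...}, where oldsize equals oldkeys->dk_size. Written out the loops look the same, but because of USABLE_FRACTION, dk_nentries is at most 2/3 of dk_size, so the new dict needs at least 1/3 fewer iterations. Iteration in dict_items() is faster for the same reason.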
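
The same dense loop keeps resizing cheap. The sketch below models the main loop of dictresize() on a toy compact table: it visits only the first nentries entries, skips entries whose value is NULL, and re-inserts the rest into a fresh index table in order. Keys, Ent and the helper functions are hypothetical; the index and entry arrays are separate allocations here purely for brevity, a deletion is simulated by clearing the value only, and PERTURB_SHIFT is assumed to be 5.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PERTURB_SHIFT 5   /* assumed value of PERTURB_SHIFT */

typedef struct {
    int64_t hash;
    const char *key;
    const char *value;   /* NULL means the entry was deleted */
} Ent;

typedef struct {
    size_t size;         /* index slots, a power of two */
    size_t nentries;     /* entries appended so far, deleted ones included */
    int64_t *indices;    /* -1 = empty, otherwise a position in entries */
    Ent *entries;        /* dense array with room for (size << 1) / 3 entries */
} Keys;

static void keys_free(Keys *k)
{
    free(k->indices);
    free(k->entries);
    free(k);
}

static Keys *keys_new(size_t size)
{
    Keys *k = malloc(sizeof *k);
    if (k == NULL)
        return NULL;
    k->size = size;
    k->nentries = 0;
    k->indices = malloc(size * sizeof *k->indices);
    k->entries = calloc((size << 1) / 3, sizeof *k->entries);
    if (k->indices == NULL || k->entries == NULL) {
        keys_free(k);
        return NULL;
    }
    for (size_t i = 0; i < size; i++)
        k->indices[i] = -1;
    return k;
}

/* Append an entry known to be absent (like insertdict_clean()). */
static void insert_clean(Keys *k, int64_t hash, const char *key, const char *value)
{
    size_t mask = k->size - 1;
    size_t i = (size_t)hash & mask;
    size_t perturb = (size_t)hash;
    assert(k->nentries < (k->size << 1) / 3);
    while (k->indices[i] != -1) {
        perturb >>= TOY_PERTURB_SHIFT;
        i = mask & ((i << 2) + i + perturb + 1);
    }
    k->indices[i] = (int64_t)k->nentries;
    k->entries[k->nentries++] = (Ent){hash, key, value};
}

/* Like the dictresize() main loop: visit entries[0..nentries), skip deleted
 * ones. Returns the new object, or NULL with the old one left untouched. */
static Keys *resize(Keys *old, size_t newsize)
{
    Keys *k = keys_new(newsize);
    if (k == NULL)
        return NULL;
    for (size_t i = 0; i < old->nentries; i++) {
        const Ent *e = &old->entries[i];
        if (e->value != NULL)
            insert_clean(k, e->hash, e->key, e->value);
    }
    keys_free(old);
    return k;
}
```

Three entries are appended and one is deleted, so the 16-slot table ends up with two entries, still in insertion order. As in the 3.6.0 dictresize() above, the loop bound is nentries, not the old table size.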

Now read these paragraphs from [Python-Dev] More compact dictionaries with faster iteration again:

In addition to space savings, the new memory layout makes iteration faster. Currently, keys(), values, and items() loop over the sparse table, skipping-over free slots in the hash table. Now, keys/values/items can loop directly over the dense table, using fewer memory accesses.

Another benefit is that resizing is faster and touches fewer pieces of memory. Currently, every hash/key/value entry is moved or copied during a resize. In the new layout, only the indices are updated. For the most part, the hash/key/value entries never move (except for an occasional swap to fill a hole left by a deletion).

With the reduced memory footprint, we can also expect better cache utilization.

Thoughts after reading the source

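Python's dict implementation is an art of tradeoffs, with plenty to think about:

- how much of the allocated space may actually be used
- which hash function and probing scheme to use
- the minimum space to allocate at initialization
- how much to grow when resizing
- how the layout of items affects the CPU cache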

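The parameters in Python were chosen through extensive testing and cover a broad range of scenarios. But the art of tradeoffs also includes optimizing for a specific use case: tuning Python's parameters for your actual workload can still improve performance.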

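A few more performance points:

- The free list only caches small objects; creating or growing a large dict allocates fresh memory every time. How small counts as small?

  ```c
  #define PyDict_MINSIZE 8
  ```

  8 allows dicts with no more than 5 active entries.
- Given the existence of lookdict_unicode(), use strings as keys whenever you can.

See ./Objects/dictnotes.txt and some of the comments in ./Objects/dictobject.c.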

Appendix

Analysis of the dict resize source

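Resizing happens on insertion: when mp->ma_keys->dk_usable <= 0, the dict is resized, and the new capacity is computed with the GROWTH_RATE macro. dictresize() handles both the "combined" and the "split" case, which need to be looked at separately.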

```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
/* GROWTH_RATE. Growth rate upon hitting maximum load.
 * Currently set to used*2 + capacity/2.
 * This means that dicts double in size when growing without deletions,
 * but have more head room when the number of deletions is on a par with the
 * number of insertions.
 * Raising this to used*4 doubles memory consumption depending on the size of
 * the dictionary, but results in half the number of resizes, less effort to
 * resize.
 * GROWTH_RATE was set to used*4 up to version 3.2.
 * GROWTH_RATE was set to used*2 in version 3.3.0
 */
#define GROWTH_RATE(d) (((d)->ma_used*2)+((d)->ma_keys->dk_size>>1))

/*
Restructure the table by allocating a new table and reinserting all
items again.  When entries have been deleted, the new table may
actually be smaller than the old one.
If a table is split (its keys and hashes are shared, its values are not),
then the values are temporarily copied into the table, it is resized as
a combined table, then the me_value slots in the old table are NULLed out.
After resizing a table is always combined,
but can be resplit by make_keys_shared().
*/
static int
dictresize(PyDictObject *mp, Py_ssize_t minsize)
{
    Py_ssize_t i, newsize;
    PyDictKeysObject *oldkeys;
    PyObject **oldvalues;
    PyDictKeyEntry *ep0;

    /* Find the smallest table size > minused. */
    /* 1. Compute the new size */
    for (newsize = PyDict_MINSIZE;
         newsize < minsize && newsize > 0;
         newsize <<= 1)
        ;
    if (newsize <= 0) {
        PyErr_NoMemory();
        return -1;
    }
    /* 2. Allocate a new PyDictKeysObject */
    oldkeys = mp->ma_keys;
    oldvalues = mp->ma_values;
    /* Allocate a new table. */
    mp->ma_keys = new_keys_object(newsize);
    if (mp->ma_keys == NULL) {
        mp->ma_keys = oldkeys;
        return -1;
    }
    // New table must be large enough.
    assert(mp->ma_keys->dk_usable >= mp->ma_used);
    if (oldkeys->dk_lookup == lookdict)
        mp->ma_keys->dk_lookup = lookdict;
    /* 3. Move the entries */
    mp->ma_values = NULL;
    ep0 = DK_ENTRIES(oldkeys);
    /* Main loop below assumes we can transfer refcount to new keys
     * and that value is stored in me_value.
     * Increment ref-counts and copy values here to compensate
     * This (resizing a split table) should be relatively rare */
    if (oldvalues != NULL) {
        /* 3.1 Convert the split table to a combined table */
        for (i = 0; i < oldkeys->dk_nentries; i++) {
            if (oldvalues[i] != NULL) {
                Py_INCREF(ep0[i].me_key);    // the key is copied while the old one stays in use, so add a reference
                ep0[i].me_value = oldvalues[i];
            }
        }
    }
    /* Main loop */
    for (i = 0; i < oldkeys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &ep0[i];
        if (ep->me_value != NULL) {
            insertdict_clean(mp, ep->me_key, ep->me_hash, ep->me_value);
        }
    }
    mp->ma_keys->dk_usable -= mp->ma_used;
    /* 4. Clean up the old values */
    if (oldvalues != NULL) {
        /* NULL out me_value slot in oldkeys, in case it was shared */
        for (i = 0; i < oldkeys->dk_nentries; i++)
            ep0[i].me_value = NULL;
        DK_DECREF(oldkeys);
        if (oldvalues != empty_values) {
            free_values(oldvalues);
        }
    }
    else {
        assert(oldkeys->dk_lookup != lookdict_split);
        assert(oldkeys->dk_refcnt == 1);
        DK_DEBUG_DECREF PyObject_FREE(oldkeys);
    }
    return 0;
}
```

Before analyzing the body, read the comment in front of the function:

Restructure the table by allocating a new table and reinserting all items again. When entries have been deleted, the new table may actually be smaller than the old one.
If a table is split (its keys and hashes are shared, its values are not), then the values are temporarily copied into the table, it is resized as a combined table, then the me_value slots in the old table are NULLed out. After resizing a table is always combined, but can be resplit by make_keys_shared().

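This comment tells us 2 important things:

- The new dict may be smaller than the old one, because the old dict may contain deleted entries. (After an item is deleted, its slot is marked dummy so that probe sequences are not broken; building the new table drops these dummy items.)

  Even so, ~~out of laziness~~ I still call this operation "growing".
- A "split" dict always becomes "combined" after resizing; make_keys_shared() can turn it back into a "split" one. Resizing copies the separate values into the entries.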

I split the function into 4 steps:

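Step 1. Compute the new size

The function recomputes the table size. But isn't the parameter Py_ssize_t minsize the table size already? Why compute it again?

As the name suggests, minsize is the minimum size the caller asks for, but the table size must be a power of 2, so it is recomputed: the loop picks the smallest power of 2 that is at least minsize, starting from PyDict_MINSIZE. new_keys_object() likewise begins with assert(IS_POWER_OF_2(size)). This is a hard requirement, for the reason explained in the data-structure section; see the notes on dk_size.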
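
Step 1 and the GROWTH_RATE macro can be restated in a few lines. growth_rate() and new_table_size() are hypothetical names; the sketch assumes PyDict_MINSIZE is 8, as quoted earlier, and returns 0 on overflow where dictresize() raises MemoryError instead.

```c
#include <assert.h>
#include <stddef.h>

#define MINSIZE 8   /* PyDict_MINSIZE, as quoted earlier */

/* GROWTH_RATE(d): used*2 + capacity/2. */
static size_t growth_rate(size_t used, size_t dk_size)
{
    return used * 2 + (dk_size >> 1);
}

/* Smallest power of two >= minsize, starting from MINSIZE
 * (the loop at the top of dictresize()); 0 signals overflow. */
static size_t new_table_size(size_t minsize)
{
    size_t newsize = MINSIZE;
    while (newsize < minsize && newsize != 0)
        newsize <<= 1;
    return newsize;
}
```

A full 8-slot table (5 items, no deletions) asks for 5 * 2 + 8 / 2 = 14 slots and therefore gets 16, which matches the comment's remark that dicts double in size when growing without deletions.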
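
Step 2. Allocate a new PyDictKeysObject

In "combined" mode, the parts that need to grow are dk_indices and dk_entries inside the PyDictKeysObject. The code does not grow them in place: they are not pointers but occupy the contiguous memory after the PyDictKeysObject header, so a whole new PyDictKeysObject is allocated instead.

In "split" mode ma_values would normally have to grow as well, but since resizing turns the dict into "combined" mode, it does not need to: a new PyDictKeysObject is allocated, ma_values is moved into dk_entries, and ma_values is set to NULL.

Curiosity strikes again: why not make dk_indices and dk_entries pointers to separate arrays? Then the arrays could be grown with realloc and friends, with no need to allocate a new PyDictKeysObject or to copy array elements by hand.

I found no answer to this online or in the source, so here is my own guess. If I switched to pointers and allocated the two arrays separately outside the struct, I would face 2 problems:
1. Scattered code. Instead of a single PyDictKeysObject there would be 2 extra external arrays; besides the additional memory-management code, every function would have to check whether the dk_indices and dk_entries pointers are NULL;
2. The cost of frequent allocation and freeing. The current free list caches a PyDictKeysObject when it is released instead of destroying it right away, and because dk_indices and dk_entries live inside the struct they are cached along with it. With pointers that no longer works. Fixing it would require separate free lists for the external arrays, which scatters the code again.
Neither approach is always better or always worse. It is a tradeoff between time, space and maintainability, and you cannot have them all.

Fish and bear's paw, you cannot have both. -- "Fish Is What I Desire" (Mencius)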

Newton's third law. You got to leave something behind. --《Interstellar》
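
Step 3. Move the entries
1. Convert the split table to a combined table
  
  This step moves ma_values into the me_value fields of dk_entries, so that the split table can then be handled like a "combined" one; afterwards the me_value fields in dk_entries are reset. Note the Py_INCREF(ep0[i].me_key) call, which increments every key's reference count. Why is that needed? The reason is in insertdict_clean().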
```c
/* Python 3.6.0 */
/* ./Objects/dictobject.c */
/*
Internal routine used by dictresize() to insert an item which is
known to be absent from the dict.  This routine also assumes that
the dict contains no deleted entries.  Besides the performance benefit,
using insertdict() in dictresize() is dangerous (SF bug #1456209).
Note that no refcounts are changed by this routine; if needed, the caller
is responsible for incref'ing `key` and `value`.
Neither mp->ma_used nor k->dk_usable are modified by this routine; the caller
must set them correctly
*/
static void
insertdict_clean(PyDictObject *mp, PyObject *key, Py_hash_t hash,
                 PyObject *value)
{
    size_t i;
    PyDictKeysObject *k = mp->ma_keys;
    size_t mask = (size_t)DK_SIZE(k)-1;
    PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
    PyDictKeyEntry *ep;
    ...
    i = hash & mask;
    /* probe to resolve hash collisions */
    for (size_t perturb = hash; dk_get_index(k, i) != DKIX_EMPTY;) {
        perturb >>= PERTURB_SHIFT;
        i = mask & ((i << 2) + i + perturb + 1);
    }
    /* update the PyDictKeysObject fields */
    ep = &ep0[k->dk_nentries];    // locate the next free slot in dk_entries
    assert(ep->me_value == NULL);
    dk_set_index(k, i, k->dk_nentries);    // fill the dk_indices array
    k->dk_nentries++;
    /* fill the dk_entries array */
    ep->me_key = key;
    ep->me_hash = hash;
    ep->me_value = value;
}
```
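  This function is faster than insertdict(); for its history see issue1456209. Note this sentence in the comment in front of it:
  
  Note that no refcounts are changed by this routine; if needed, the caller is responsible for incref'ing key and value.
  
  The function only copies values and changes no reference counts. Now it makes sense: when a key from the old PyDictKeysObject is copied into the newly allocated PyDictKeysObject, its reference count must go up by one.
  
  Then why isn't the value's reference count incremented? Remember that we are handling a split table: the old PyDictKeysObject is shared by many PyDictObjects, so its keys are shared as well. To avoid affecting the other PyDictObjects, the keys have to be copied into the new PyDictKeysObject. oldvalues = mp->ma_values, on the other hand, belongs to this PyDictObject alone, so the values can simply be moved into the new PyDictKeysObject without keeping the originals, and no reference count changes.
  
  An analogy: I copy Li Hua's homework. Afterwards the homework goes back to Li Hua, who has to hand it in too, so the homework's reference count goes up by my copy: that is copying a key. Then it occurs to me that a copy this faithful will get caught by the teacher, so I rewrite part of it and throw the earlier copy away. I wrote the homework twice, but only 1 copy is handed in, so the reference count does not change: that is moving a value.
  
  Even simpler: keys are copied, refcount +1; values are moved, refcount +0.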
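
The copy-the-key, move-the-value rule can be modelled with a toy reference count. This is a hypothetical sketch, not CPython's API: Obj, incref(), Slot and split_to_combined() only mimic the bookkeeping of step 3.1, every value slot is assumed to be filled, and free() stands in for free_values().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long refcnt; } Obj;   /* toy object: only a refcount */

static void incref(Obj *o) { o->refcnt++; }

typedef struct { Obj *key; Obj *value; } Slot;

/* Turn a split table (shared keys + private values) into a fresh combined
 * table. The shared table keeps referencing the keys, so the new table
 * takes its own reference; the private values array is released without
 * touching the values, so each value reference is moved, not copied.
 * On allocation failure nothing is changed and NULL is returned. */
static Slot *split_to_combined(Obj **shared_keys, Obj **values, size_t n)
{
    Slot *combined = malloc(n * sizeof *combined);
    if (combined == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        incref(shared_keys[i]);            /* copy: +1 */
        combined[i].key = shared_keys[i];
        combined[i].value = values[i];     /* move: +0 */
    }
    free(values);                          /* like free_values(): no decref */
    return combined;
}
```

Each key ends up with 2 references (the shared table's and the new combined table's), while each value still has exactly 1, now owned by the combined table.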
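
Step 4. Clean up the old values

In "combined" mode the old PyDictKeysObject is simply freed.

In "split" mode the me_value fields in the old PyDictKeysObject's dk_entries must first be reset to NULL, because that keys object may still be shared (see the first sentence of 3.1 and the comment in the code). Then the reference to the old keys object is dropped with DK_DECREF() and the ma_values array is freed.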
