Python Memory Management

Date: 2019-11-10

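Python's memory management rests on three mechanisms: reference counting, garbage collection, and a memory pool.

1. Reference counting

(1) Variables and objects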

In sum, variables are created when assigned, can reference any type of object, and must
be assigned before they are referenced. This means that you never need to declare names
used by your script, but you must initialize names before you can update them; counters,
for example, must be initialized to zero before you can add to them.

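Variables are created when they are assigned, and they can point to (reference) an object of any type.
- Everything in Python is an object, and at the core of each one is a C struct: PyObject.
A variable must be assigned before it is referenced.
- For example, a counter must be initialized to 0 before it can be incremented.
Every object carries two header fields: a type designator and a reference counter.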
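
The rule that a name must be assigned before it is used can be checked directly. The following is a minimal sketch; the name `hits` is hypothetical and is deliberately left unassigned at first.

```python
# `hits` is a hypothetical counter name used only for this illustration.
try:
    hits += 1            # fails: the name has never been assigned
except NameError as exc:
    error_name = type(exc).__name__

hits = 0                 # initialize first ...
hits += 1                # ... then updating works
print(error_name, hits)
```

Catching `NameError` also covers `UnboundLocalError`, its subclass, which is what Python raises if the same lines run inside a function.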

The relationship is illustrated below:

Figure: Names and objects after running the assignment a = 3. Variable a becomes a reference to
the object 3. Internally, the variable is really a pointer to the object’s memory space created by running
the literal expression 3.

These links from variables to objects are called references in Python—that is, a reference
is a kind of association, implemented as a pointer in memory. Whenever the variables
are later used (i.e., referenced), Python automatically follows the variable-to-object
links. This is all simpler than the terminology may imply. In concrete terms:

Variables are entries in a system table, with spaces for links to objects.
Objects are pieces of allocated memory, with enough space to represent the values for which they stand.

References are automatically followed pointers from variables to objects.
Objects have two header fields, a type designator and a reference counter.

In Python, things work more simply.
Names have no types; as stated earlier, types live with objects, not names. In the preceding
listing, we’ve simply changed a to reference different objects. Because variables
have no type, we haven’t actually changed the type of the variable a; we’ve simply made
the variable reference a different type of object. In fact, again, all we can ever say about
a variable in Python is that it references a particular object at a particular point in time.

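Names have no type; types belong to objects (a variable only references an object, so the type travels with the object). In Python, a variable is simply a reference to a particular object at a particular point in time.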
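
As a small illustration of types living with objects rather than names, the sketch below rebinds one name to objects of three different types. The values are arbitrary examples.

```python
a = 3
first = type(a).__name__     # the type belongs to the object 3

a = "spam"                   # the name a now references a str object
second = type(a).__name__

a = 1.23                     # ... and now a float object
third = type(a).__name__

print(first, second, third)  # int str float
```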

(2) Shared references

>>> a = 3
>>> b = a
>>>
>>> id(a)
1747479616
>>> id(b)
1747479616
>>>
>>> hex(id(a))
'0x68286c40'
>>> hex(id(b))
'0x68286c40'
>>>

This scenario in Python—with multiple names referencing the same object—is usually
called a shared reference (and sometimes just a shared object). Note that the names a
and b are not linked to each other directly when this happens; in fact, there is no way
to ever link a variable to another variable in Python. 
Rather, both variables point to the same object via their references.
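
Shared references are easiest to observe with a mutable object. The following sketch uses a list (an arbitrary choice for illustration): an in-place change made through one name is visible through the other, while rebinding one name leaves the other untouched.

```python
a = [1, 2, 3]
b = a                 # b references the same list object as a
b.append(4)           # an in-place change through one name ...
print(a)              # ... is visible through the other: [1, 2, 3, 4]
print(a is b)         # True

b = [9]               # rebinding b only changes what b references
print(a)              # [1, 2, 3, 4]
```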

(1) id() is a Python built-in function that returns an object's identity; in CPython this is the object's memory address.

>>> help(id)
Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.
    
    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)

(2) Checking what a reference points to

The is operator tests whether two references point to the same object.

Integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> c = 257
>>> d = 257
>>> c is d
False
>>>

Short strings

>>> e = "Explicit"
>>> f = "Explicit"
>>> e is f
True
>>>

Long strings

>>> g = "Beautiful is better"
>>> h = "Beautiful is better"
>>> g is h
False
>>>

Lists

>>> lst1 = [1, 2, 3]
>>> lst2 = [1, 2, 3]
>>> lst1 is lst2
False
>>>

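The results show that:

(1) CPython caches small integers and short, identifier-like strings, so each such value exists only once in memory and every reference to it points to the same object; an assignment statement merely creates a new reference, not a new object.

(2) CPython does not cache other strings (such as the longer string with spaces above), lists, or other objects; several equal objects can coexist, and each assignment statement above creates a new object.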
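
The difference between equality (==) and identity (is) can be sketched as follows. The strings are built at run time so that the compiler cannot merge them as constants; `sys.intern` is the standard-library call that returns one canonical object for equal strings. The identity results for strings are CPython behavior, not a language guarantee.

```python
import sys

# Build the strings at run time so the compiler cannot merge them as constants.
g = " ".join(["Beautiful", "is", "better"])
h = " ".join(["Beautiful", "is", "better"])

print(g == h)    # True: same value
print(g is h)    # False on CPython: two separate objects

# sys.intern returns one canonical object for equal strings.
gi = sys.intern(g)
hi = sys.intern(h)
print(gi is hi)  # True

lst1 = [1, 2, 3]
lst2 = [1, 2, 3]
print(lst1 == lst2, lst1 is lst2)  # True False
```

For value comparisons, == is the right operator; is should be reserved for identity checks such as `x is None`.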

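How it works:

# Two optimization mechanisms: caching within a code block, and the small-object pool.

# Code blocks
All code is executed in units of code blocks (much as a principal issues an order to a whole class at once). A file is one code block,
and different files are different code blocks. In the interactive interpreter each command entered is a code block of its own, which is why the 257 example above prints False.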

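# Caching within a code block
When Python processes the object-creating statements of a single code block, it checks whether an equal value already exists and, if so, reuses it.
In other words, the values created in a code block are recorded in a lookup table;
each new value is looked up in that table first,
and if an equal entry exists, the earlier object is reused instead of a new one being created.
So when a file runs (as one code block), two variables assigned the same value point to the same object:
where the cache applies, only one copy exists in memory, so their ids are equal.

Note:
# The mechanism applies only within the same code block.
# Data types covered: int, str, bool.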


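# Small-object pool (also known as interning, string interning, or string caching)
An optimization that works across different code blocks.
# Data types covered: str, bool, int
int: -5 to 256
str: strings that satisfy certain conditions.
bool: both values.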


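# Summary:
Within a single code block, the code-block caching mechanism applies.
Across different code blocks, the small-object pool applies.

# Advantages:
1. Saves memory.
2. Improves performance.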
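
The two mechanisms can be told apart with a rough experiment. The sketch below compiles source strings with the built-in `compile` and runs them with `exec`, so that each string is one code block; the file names such as `<one-block>` are arbitrary labels. The results are CPython implementation details and may differ on other interpreters or builds, so the cross-block result is printed without a stated expectation.

```python
# One code block: the compiler stores the constant 257 once, so both names share it.
one_block = {}
exec(compile("c = 257\nd = 257\nsame = c is d\n", "<one-block>", "exec"), one_block)
print(one_block["same"])          # True on CPython

# Two separately compiled code blocks: 257 lies outside the small-integer pool,
# so each block typically gets its own object.
ns1, ns2 = {}, {}
exec(compile("c = 257\n", "<block-1>", "exec"), ns1)
exec(compile("d = 257\n", "<block-2>", "exec"), ns2)
print(ns1["c"] is ns2["d"])

# Small-integer pool: values from -5 to 256 are shared everywhere,
# even when they are created at run time.
print(int("256") is int("256"))   # True on CPython
```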

Detailed examples can be found in the wtfpython project on GitHub.

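(3) Inspecting an object's reference count

In Python, every object records the total number of references that point to it: its reference count.

To inspect an object's reference count, use sys.getrefcount().

When a variable is reassigned, what happens to the value it used to reference? In the example below, s is reassigned to the string apple; where does 6 go?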

>>> s = 6
>>> s = 'apple'

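The answer: when a variable is reassigned, the space of the object it previously pointed to may be reclaimed (garbage collected), provided no other variable or object still references it.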

The answer is that in Python, whenever a name is assigned to a new object, the space
held by the prior object is reclaimed if it is not referenced by any other name or object.
This automatic reclamation of objects’ space is known as garbage collection, and makes
life much simpler for programmers of languages like Python that support it.

Ordinary references

>>> import sys
>>> 
>>> a = "simple"
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> sys.getrefcount(b)
3
>>>

Note: passing a reference as the argument to getrefcount() itself creates a temporary reference, so the result is one higher than you might expect.
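
Because absolute counts include that temporary reference and can vary between interpreter versions, it is often clearer to look at differences. The following sketch measures how the count of a fresh object changes as references are added and removed; the names `alias` and `holder` are illustrative.

```python
import sys

obj = object()
base = sys.getrefcount(obj)       # includes getrefcount's own temporary reference

alias = obj                       # a second name adds one reference
holder = [obj]                    # a slot in a container adds another
after_binding = sys.getrefcount(obj) - base   # 2

del alias                         # removes one reference
holder.clear()                    # emptying the list removes the other
after_release = sys.getrefcount(obj) - base   # 0

print(after_binding, after_release)
```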

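3. Garbage collection

As a program creates more and more objects and they occupy more and more memory, Python starts garbage collection to clear out the objects that are no longer used.

(1) Principle

When an object's reference count drops to 0, no reference points to it any more, and the object becomes garbage to be reclaimed.

For example, when a newly created object is bound to a reference, its reference count becomes 1. If that reference is deleted, the count falls to 0 and the object can be garbage collected.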

Internally, Python accomplishes this feat by keeping a counter in every object that keeps
track of the number of references currently pointing to that object. As soon as (and
exactly when) this counter drops to zero, the object’s memory space is automatically
reclaimed. In the preceding listing, we’re assuming that each time x is assigned to a new
object, the prior object’s reference counter drops to zero, causing it to be reclaimed.

The most immediately tangible benefit of garbage collection is that it means you can
use objects liberally without ever needing to allocate or free up space in your script.
Python will clean up unused space for you as your program runs. In practice, this
eliminates a substantial amount of bookkeeping code required in lower-level languages
such as C and C++.
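
The "as soon as the counter drops to zero" behavior can be observed with a weak reference, which watches an object without raising its reference count. This is a minimal sketch for CPython; the class `Resource` is hypothetical, and the cyclic collector is switched off for the duration to show that reference counting alone does the work.

```python
import gc
import weakref


class Resource:
    """Hypothetical class; a weak reference lets us observe its lifetime."""


gc.disable()                    # rule out the cyclic collector
try:
    x = Resource()
    probe = weakref.ref(x)      # does not raise the reference count
    alive_before = probe() is not None
    x = None                    # rebinding drops the only strong reference
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)  # True False on CPython
```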

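(2) How del works

del removes a name and so decrements the reference count of the object it referred to by 1. Once the count reaches 0, the program can no longer reach or use the object by any means, and CPython frees the memory it occupies immediately. The periodic collection described below is needed only for objects that are kept alive by reference cycles.

Notes

(1) While a collection runs, Python cannot do any other work; frequent collections noticeably reduce Python's efficiency.

(2) Python starts a collection automatically only under specific conditions (when there is little garbage, collecting is unnecessary).

(3) At run time, Python records the number of object allocations and object deallocations.

A collection starts only when the difference between the two exceeds a threshold.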

>>> import gc
>>> 
>>> gc.get_threshold()  # returns the garbage collection thresholds
(700, 10, 10)
>>>

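Reading the thresholds:

700 is the threshold at which a (generation-0) collection starts;

every 10 generation-0 collections are accompanied by 1 generation-1 collection, and only every 10 generation-1 collections is there 1 generation-2 collection.

A collection can of course also be started manually: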

>>> gc.collect()       # start a garbage collection manually
52
>>> gc.set_threshold(666, 8, 9)  # sets the garbage collection thresholds
>>>
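
The same gc calls can be combined in a script. The sketch below reads the thresholds and the current counters, runs a manual collection, and changes the thresholds temporarily before restoring them. The values 1000, 15, 15 are arbitrary examples, and the printed numbers depend on the interpreter version and program state.

```python
import gc

original = gc.get_threshold()
print("thresholds:", original)
print("counts:", gc.get_count())   # current per-generation counters

unreachable = gc.collect()         # full collection; returns the number of unreachable objects found
print("unreachable objects found:", unreachable)

gc.set_threshold(1000, 15, 15)     # arbitrary example values
changed = gc.get_threshold()
gc.set_threshold(*original)        # restore the previous settings
restored = gc.get_threshold()
print(changed, restored)
```

Saving and restoring the original thresholds keeps the experiment from affecting the rest of the program.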

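What is generational collection

Python divides all tracked objects into three generations: 0, 1, and 2.
Every newly created object starts in generation 0.
An object that survives a collection of its generation is promoted to the next generation.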

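Generational collection is a classic space-for-time trade-off, and it is also a key technique in Java's garbage collectors. Put simply, the idea is that the longer an object has existed, the less likely it is to be garbage, and the less often it should be examined.
This reduces the extra work caused by mark-and-sweep. Generations partition the tracked objects into groups, each kept as a linked list (a set), and the interval between mark-and-sweep passes over a generation grows with how long the objects in it have survived.
As the code above shows, Python has three generations, each with its own threshold. By default, a collection is triggered when generation 0's counter (allocations minus deallocations) exceeds 700, or when the counter of generation 1 or 2 (the number of collections of the next-younger generation) exceeds 10.
A generation-0 collection examines only generation 0, a generation-1 collection examines generations 0 and 1, and a generation-2 collection is a full collection of all three generations.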
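
Promotion of survivors can be observed with `gc.get_objects(generation=...)`, which assumes Python 3.8 or later. This is a rough sketch: the class `Tracked` and the helper `in_generation` are hypothetical names, and the exact generation layout is a CPython implementation detail that differs between versions.

```python
import gc


class Tracked:
    """Hypothetical class; its instances are tracked by the collector."""


def in_generation(obj, generation):
    """Return True if obj is currently listed in the given generation."""
    return any(o is obj for o in gc.get_objects(generation=generation))


gc.disable()                 # keep automatic collections from interfering
try:
    gc.collect()             # start from a clean state
    obj = Tracked()
    young_before = in_generation(obj, 0)   # a new object starts in generation 0
    gc.collect(0)            # collect only the youngest generation
    young_after = in_generation(obj, 0)    # the survivor has been moved out
finally:
    gc.enable()

print(young_before, young_after)
```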

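Mark and sweep

As the name suggests, mark-and-sweep first marks objects (garbage detection) and then sweeps away the garbage (garbage reclamation).
Initially every object is marked white. The root objects (which are never deleted) are identified and marked black (meaning the object is live).
Objects referenced by live objects are marked grey (meaning the object is reachable, but the objects it references have not been checked yet); once the objects referenced by a grey object have been checked, the grey object is marked black.
This repeats until no grey nodes remain. The nodes that are still white are the objects to be swept.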
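
The tri-color procedure described above can be sketched on a toy object graph. This is a textbook illustration under simplifying assumptions, not CPython's actual cycle detector: the "heap" is a plain dict mapping each node name to the names it references, roots start out grey rather than black, and the function name `mark_and_sweep` is hypothetical.

```python
from collections import deque


def mark_and_sweep(graph, roots):
    """Return the set of nodes that are unreachable from the roots."""
    WHITE, GREY, BLACK = "white", "grey", "black"
    color = {node: WHITE for node in graph}    # everything starts white

    worklist = deque()
    for root in roots:                         # roots are known to be live
        color[root] = GREY
        worklist.append(root)

    while worklist:                            # repeat until no grey node is left
        node = worklist.popleft()
        for child in graph[node]:
            if color[child] == WHITE:          # reachable, not yet scanned
                color[child] = GREY
                worklist.append(child)
        color[node] = BLACK                    # all of its references are checked

    return {node for node, c in color.items() if c == WHITE}


heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "x": ["y"],   # x and y reference each other, but nothing live references them
    "y": ["x"],
}
garbage = mark_and_sweep(heap, roots=["root"])
print(sorted(garbage))   # ['x', 'y']
```

The cycle between x and y is found because marking starts from the roots instead of relying on per-object counts.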

How can the memory leaks that reference cycles may cause be avoided?

More on Python Garbage Collection

Technically speaking, Python’s garbage collection is based mainly upon reference counters,
as described here; however, it also has a component that detects and reclaims
objects with cyclic references in time. This component can be disabled if you’re sure
that your code doesn’t create cycles, but it is enabled by default.

Circular references are a classic issue in reference count garbage collectors. Because
references are implemented as pointers, it’s possible for an object to reference itself, or
reference another object that does. For example, exercise 3 at the end of Part I and its
solution in Appendix D show how to create a cycle easily by embedding a reference to
a list within itself (e.g., L.append(L)). The same phenomenon can occur for assignments
to attributes of objects created from user-defined classes. Though relatively rare, because
the reference counts for such objects never drop to zero, they must be treated
specially.

For more details on Python’s cycle detector, see the documentation for the gc module
in Python’s library manual. The best news here is that garbage-collection-based memory
management is implemented for you in Python, by people highly skilled at the task.
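
The L.append(L) cycle mentioned above can be reproduced in a few lines. In this sketch, `TrackedList` is a hypothetical list subclass, used only because plain lists cannot be weakly referenced; the automatic collector is disabled so that the effect of the explicit `gc.collect()` call is visible.

```python
import gc
import weakref


class TrackedList(list):
    """Hypothetical list subclass; plain lists do not support weak references."""


gc.disable()                      # only the explicit collect() below may run
try:
    L = TrackedList()
    L.append(L)                   # the list now references itself
    probe = weakref.ref(L)
    del L                         # the self-reference keeps the count above zero
    survived_del = probe() is not None
    unreachable = gc.collect()    # the cycle detector finds and frees the list
    survived_collect = probe() is not None
finally:
    gc.enable()

print(survived_del, survived_collect, unreachable >= 1)  # True False True
```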

The answer:

Use a weak reference, created with ref from the weakref module
Explicitly set one of the references to None

import gc
import objgraph  # third-party package (pip install objgraph)
import sys
import weakref


def quote_demo():
    class Person:
        pass

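    p = Person()  # one reference: the name p
    print(sys.getrefcount(p))  # 2: p plus getrefcount's temporary argument

    def log(obj):
        # 4 when the original was run: the call itself holds extra references,
        # which are released when the function returns
        print(sys.getrefcount(obj))

    log(p)

    p2 = p  # a second name for the same object
    print(sys.getrefcount(p))  # 3
    del p2
    print(sys.getrefcount(p))  # 3 - 1 = 2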


def circle_quote():
    # reference cycle
    class Dog:
        pass

    class Person:
        pass

    p = Person()
    d = Dog()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))

    p.pet = d
    d.master = p

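    # After deleting p and d, are the objects they referred to released?
    # They are not: the cycle keeps both reference counts above zero.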
    del p
    del d

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def solve_cirecle_quote():
    # 1. Define two classes
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = p

    p.pet = None  # break the cycle by explicitly setting one link to None
    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def sovle_circle_quote_with_weak_ref():
    # 1. Define two classes
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = weakref.ref(p)

    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


if __name__ == "__main__":
    quote_demo()
    circle_quote()
    solve_cirecle_quote()
    sovle_circle_quote_with_weak_ref()
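
The weak-reference approach can also be checked with the standard library alone, without objgraph. In the sketch below, the helper `build_pair` is a hypothetical name; it returns weak references that serve as probes, and the cyclic collector is disabled to show that reference counting alone frees both objects once the back-link is weak.

```python
import gc
import weakref


class Person:
    pass


class Dog:
    pass


def build_pair():
    """Create a Person/Dog pair whose back-link is weak; return probes for both."""
    p = Person()
    d = Dog()
    p.pet = d                    # strong reference: Person -> Dog
    d.master = weakref.ref(p)    # weak reference: Dog -> Person
    assert d.master() is p       # call the weak reference to get the object
    return weakref.ref(p), weakref.ref(d)


gc.disable()                     # show that the cycle detector is not needed
try:
    person_probe, dog_probe = build_pair()
    person_gone = person_probe() is None
    dog_gone = dog_probe() is None
finally:
    gc.enable()

print(person_gone, dog_gone)     # True True on CPython
```

Code that holds a weak reference must call it and handle a None result, since the target may already have been freed.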

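4. Memory pool mechanism

Python distinguishes between large and small memory requests. In CPython, requests of up to 512 bytes (256 bytes in older versions) are served by the small-object allocator, pymalloc, which carves blocks out of larger preallocated arenas (256 KiB each in the CPython versions current when this was written); larger requests are passed to the system allocator.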