Memory Management and Circular References in Python

-- [since Python 3.4, circular references are handled much better](http://engineering.hearsaysocial.com/2013/06/16/circular-references-in-python/#comment-2882030670)

Nice post. Note that starting with Python 3.4, circular references are handled much better (the docs imply it should be
rare for them not to be collected, but don't give specifics about when that happens). For instance, the
example you give is no longer a problem in Python 3.5 (probably not in 3.4 either, but I can't test it right now).

Introduction

One of the more convenient aspects of writing code in interpreted languages such as Python or Ruby is that you can normally avoid dealing with memory management. However, one known case where Python (2.x, and 3.x before 3.4) will definitely leak memory is when you declare circular references in your object declarations and implement a custom __del__ destructor method in one of these classes. For instance, consider the following example:

class A(object):
    def __init__(self, b_instance):
        self.b = b_instance          # A -> B

class B(object):
    def __init__(self):
        self.a = A(self)             # B -> A, closing the cycle
    def __del__(self):
        print("die")

def test():
    b = B()

test()

When the function test() is invoked, it creates an instance of B, which passes itself to A, which then stores a reference to B, resulting in a circular reference. Normally, Python's garbage collector, which exists to detect this kind of cyclic reference, would remove it. However, because of the custom destructor (the __del__ method), it marks this item as "uncollectable". By design, the collector doesn't know in which order to destroy the objects, so it leaves them alone (see Python's garbage collection documentation for more background). You can verify this by forcing the garbage collector to run and inspecting the contents of the gc.garbage list:

import gc
gc.collect()
print(gc.garbage)
# Output on Python 2:
# [<__main__.B object at 0x7f59f57c98d0>]
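To check the note at the top of this post on a current interpreter, the sketch below repeats the same A/B experiment on Python 3. Since Python 3.4, PEP 442 ("Safe object finalization") lets the collector run __del__ on objects that are part of a cycle and then free them, so the expected result differs from the Python 2 output above. The helper make_cycle() and the weak reference used to observe B's lifetime are additions for this demonstration only.

```python
import gc
import weakref


class A(object):
    def __init__(self, b_instance):
        self.b = b_instance          # A -> B


class B(object):
    def __init__(self):
        self.a = A(self)             # B -> A, closing the cycle

    def __del__(self):
        print("die")


def make_cycle():
    """Create the B <-> A cycle and hand back only a weak reference to B."""
    b = B()
    return weakref.ref(b)


b_ref = make_cycle()
# Reference counting alone cannot free the cycle, so B is normally still
# alive here (unless an automatic collection happened to run in between).
gc.collect()
# On Python 3.4+ (PEP 442) the collector finalizes the cycle: "die" is
# printed, B is freed, and it does not end up in gc.garbage.
print(b_ref() is None)                              # True on Python 3.4+
print([o for o in gc.garbage if isinstance(o, B)])  # [] on Python 3.4+
```

On Python 2 the same script leaves B alive and lists it in gc.garbage, which is the leak described above.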

You can also see these circular references visually by using the objgraph library, which relies on Python's gc module to inspect the references to your Python objects. Note that objgraph also deliberately plots objects with a custom __del__ method in a red circle, to spotlight a possible issue.

To avoid circular references, you usually need to use weak references, which tell the interpreter that an object's memory can be reclaimed if the only references left to it are weak ones, or use context managers and the with statement (for an example of this latter approach, see how it was solved for the happybase library).

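The script below is a small Tornado application for hunting cycles: it disables automatic garbage collection, sets gc.DEBUG_LEAK (which includes gc.DEBUG_SAVEALL) so that the unreachable objects found by a collection are kept in gc.garbage, and exposes a /collect/ endpoint that runs gc.collect() and prints every cycle that find_circular_references() finds among those objects.

find_circular_references.py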

# -*- encoding: utf-8 -*-
from __future__ import print_function

import gc
import traceback
import types
from tornado import web, ioloop, gen


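def find_circular_references(garbage=None):
    """
    Find reference cycles among the objects in `garbage` (defaults to
    gc.garbage). Returns a list of cycles; each cycle is a list of objects
    whose last element repeats the first.
    """
    def inner(level):
        """
        Depth-first walk over the objects in `level`, following only
        references that stay inside `garbage`.
        """
        for item in level:
            item_id = id(item)
            if item_id not in garbage_ids:
                continue
            if item_id in visited_ids:
                continue
            if item_id in stack_ids:
                # `item` is already on the current path: we found a cycle.
                # Locate it by identity, since list.index() compares with ==.
                start = next(i for i, obj in enumerate(stack) if obj is item)
                candidate = stack[start:]
                candidate.append(item)
                found.append(candidate)
                continue

            stack.append(item)
            stack_ids.add(item_id)
            inner(gc.get_referents(item))
            stack.pop()
            stack_ids.remove(item_id)
            visited_ids.add(item_id)

    ######### initialization ########

    # Use the garbage passed in, or fall back to the gc module's garbage list
    if garbage is None:
        garbage = gc.garbage

    # Cycles found so far; type: list of list
    found = []

    # Objects on the current DFS path
    stack = []

    # ids of the objects on the current DFS path
    stack_ids = set()

    # ids of every object in garbage
    garbage_ids = set(map(id, garbage))

    # ids of objects whose references have been fully explored
    visited_ids = set()

    ######## end of initialization ########

    # Start the recursive walk
    inner(garbage)
    # `inner` refers to itself through its closure cell; clearing the name
    # breaks that cycle so the helper itself does not become garbage.
    inner = None
    return found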


class CollectHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        # Run a full collection. With gc.DEBUG_LEAK set (see __main__),
        # the unreachable objects are kept in gc.garbage instead of freed.
        collect_result = gc.collect()
        self.write("Collected: {}\n".format(collect_result))
        self.write("Garbage: {}\n".format(len(gc.garbage)))
        # Report every cycle on the server's stdout
        for circular in find_circular_references():
            print('\n==========\n Circular \n==========')
            for item in circular:
                print('    ', repr(item))
            for item in circular:
                # For frame objects, also show their locals and the stack
                # that leads to them
                if isinstance(item, types.FrameType):
                    print('\nLocals:', item.f_locals)
                    print('\nTraceback:', repr(item))
                    traceback.print_stack(item)

class DummyHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        self.write('ok\n')
        self.finish()

application = web.Application([
    (r'/dummy/', DummyHandler),
    (r'/collect/', CollectHandler),
], debug=True)

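if __name__ == "__main__":
    # Disable automatic collection so cycles are only collected via /collect/
    gc.disable()
    # Start from a clean state before turning on the debug flags
    gc.collect()
    # DEBUG_LEAK includes DEBUG_SAVEALL: unreachable objects found by a
    # collection are appended to gc.garbage instead of being freed
    gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)
    print('GC disabled')

    print("Start on 8888")
    application.listen(8888)
    ioloop.IOLoop.current().start()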
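The script depends on gc.DEBUG_SAVEALL (part of gc.DEBUG_LEAK): with that flag set, every unreachable object a collection finds is appended to gc.garbage instead of being freed. That is what gives find_circular_references() something to inspect even on Python 3.4+, where cycles with __del__ methods would otherwise simply be collected. A minimal standalone sketch of that behavior, using a hypothetical Node class and only the gc module, could look like this:

```python
import gc


class Node(object):
    """A hypothetical node type used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    first, second = Node("first"), Node("second")
    first.peer, second.peer = second, first  # first <-> second
    # Both names go out of scope here; only the cycle keeps the nodes alive.


gc.set_debug(gc.DEBUG_SAVEALL)
try:
    make_cycle()
    gc.collect()
    # With DEBUG_SAVEALL, unreachable objects land in gc.garbage instead of
    # being freed; filter out anything that is not one of our nodes.
    saved = [obj for obj in gc.garbage if isinstance(obj, Node)]
    print(sorted(node.name for node in saved))  # ['first', 'second']
finally:
    gc.set_debug(0)      # restore normal collector behavior
    del gc.garbage[:]    # empty the saved objects (works on Python 2 and 3)
```

Without the set_debug() call, the same gc.collect() would free both nodes and gc.garbage would stay empty.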