Advanced Data Structures in Python

Date: 2019-12-05

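Data Structures

A data structure is, simply put, a structure for organizing data together; in other words, it is something used to store a collection of related data. Python has four built-in data structures: list, tuple, dictionary, and set. Most applications need nothing else, but when they do, the standard library offers a number of more advanced options, such as the collections, array, heapq, bisect, weakref, copy, and pprint modules. This article introduces how to use them and how they can help our applications.

The four built-in data structures are easy to use and well covered by reference material online, so this article does not discuss them.

1. Collections

The collections module provides useful tools beyond the built-in types, such as Counter, defaultdict, OrderedDict, deque, and namedtuple. Of these, Counter, deque, and defaultdict are the most commonly used.

1.1 Counter()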

If you want to count how many times a word occurs in a given sequence, Counter is the tool for the job. Here is how to count the occurrences of each item in a list:

    from collections import Counter

    li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
    a = Counter(li)
    print(a)  # Counter({'Dog': 3, 'Cat': 2, 42: 2, 'Mouse': 1})

To count the number of distinct items in a list, you can do this:

    from collections import Counter

    li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
    a = Counter(li)
    print(a)             # Counter({'Dog': 3, 'Cat': 2, 42: 2, 'Mouse': 1})
    print(len(set(li)))  # 4

If you need to group the results, you can do this:

    from collections import Counter

    li = ["Dog", "Cat", "Mouse", "Dog", "Cat", "Dog"]
    a = Counter(li)
    print(a)  # Counter({'Dog': 3, 'Cat': 2, 'Mouse': 1})

    # values() and keys() return views in Python 3, so wrap them in list()
    print("{0} : {1}".format(list(a.values()), list(a.keys())))
    # [3, 2, 1] : ['Dog', 'Cat', 'Mouse']

    print(a.most_common(3))  # [('Dog', 3), ('Cat', 2), ('Mouse', 1)]

The following snippet finds the most frequent words in a string and prints how often each one occurs.

    import re
    from collections import Counter

    string = """Lorem ipsum dolor sit amet, consectetur
    adipiscing elit. Nunc ut elit id mi ultricies
    adipiscing. Nulla facilisi. Praesent pulvinar,
    sapien vel feugiat vestibulum, nulla dui pretium orci,
    non ultricies elit lacus quis ante. Lorem ipsum dolor
    sit amet, consectetur adipiscing elit. Aliquam
    pretium ullamcorper urna quis iaculis. Etiam ac massa
    sed turpis tempor luctus. Curabitur sed nibh eu elit
    mollis congue. Praesent ipsum diam, consectetur vitae
    ornare a, aliquam a nunc. In id magna pellentesque
    tellus posuere adipiscing. Sed non mi metus, at lacinia
    augue. Sed magna nisi, ornare in mollis in, mollis
    sed nunc. Etiam at justo in leo congue mollis.
    Nullam in neque eget metus hendrerit scelerisque
    eu non enim. Ut malesuada lacus eu nulla bibendum
    id euismod urna sodales."""

    words = re.findall(r'\w+', string)             # find all words in the text
    lower_words = [word.lower() for word in words] # normalise to lower case
    word_counts = Counter(lower_words)             # count occurrences of each word
    print(word_counts)

    # Output (entries with equal counts may appear in a different order):
    # Counter({'elit': 5, 'sed': 5, 'in': 5, 'adipiscing': 4, 'mollis': 4, 'eu': 3,
    # 'id': 3, 'nunc': 3, 'consectetur': 3, 'non': 3, 'ipsum': 3, 'nulla': 3, 'pretium':
    # 2, 'lacus': 2, 'ornare': 2, 'at': 2, 'praesent': 2, 'quis': 2, 'sit': 2, 'congue': 2, 'amet': 2,
    # 'etiam': 2, 'urna': 2, 'a': 2, 'magna': 2, 'lorem': 2, 'aliquam': 2, 'ut': 2, 'ultricies': 2, 'mi': 2,
    # 'dolor': 2, 'metus': 2, 'ac': 1, 'bibendum': 1, 'posuere': 1, 'enim': 1, 'ante': 1, 'sodales': 1, 'tellus': 1,
    # 'vitae': 1, 'dui': 1, 'diam': 1, 'pellentesque': 1, 'massa': 1, 'vel': 1, 'nullam': 1, 'feugiat': 1, 'luctus': 1,
    # 'pulvinar': 1, 'iaculis': 1, 'hendrerit': 1, 'orci': 1, 'turpis': 1, 'nibh': 1, 'scelerisque': 1, 'ullamcorper': 1,
    # 'eget': 1, 'neque': 1, 'euismod': 1, 'curabitur': 1, 'leo': 1, 'sapien': 1, 'facilisi': 1, 'vestibulum': 1, 'nisi': 1,
    # 'justo': 1, 'augue': 1, 'tempor': 1, 'lacinia': 1, 'malesuada': 1})
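
When only the single most frequent word is wanted rather than the whole tally, most_common(1) is one way to get it. The following is a minimal sketch; the sample sentence is our own and was chosen only to keep the result easy to check by hand.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
text = "the cat and the dog and the bird"

word_counts = Counter(re.findall(r"\w+", text.lower()))

# most_common(1) returns a one-element list of (word, count) pairs.
top_word, top_count = word_counts.most_common(1)[0]
print(top_word, top_count)  # the 3
```

Passing a larger number, or no argument at all, returns more of the ranking in descending order of count.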

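1.2 Deque

A deque is a double-ended queue, an extension of the ordinary queue in which elements can be added or removed at either end. For that reason it is sometimes called a head-tail linked list, although that name is also used for another, more specific data structure implementation.

Deque supports thread-safe, optimized append and pop operations, and operations at either end run in approximately O(1) time. A list supports similar operations, but it performs well only for operations that leave the positions of existing elements alone. Operations such as pop(0) and insert(0, v), which change both the length of the list and the positions of its elements, cost O(n).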

Here is a comparison of the two:

    import time
    from collections import deque

    num = 100000

    def append(c):
        for i in range(num):
            c.append(i)

    def appendleft(c):
        if isinstance(c, deque):
            for i in range(num):
                c.appendleft(i)
        else:
            for i in range(num):
                c.insert(0, i)

    def pop(c):
        for i in range(num):
            c.pop()

    def popleft(c):
        if isinstance(c, deque):
            for i in range(num):
                c.popleft()
        else:
            for i in range(num):
                c.pop(0)

    for container in [deque, list]:
        for operation in [append, appendleft, pop, popleft]:
            c = container(range(num))
            start = time.time()
            operation(c)
            elapsed = time.time() - start
            print("Completed {0}/{1} in {2} seconds: {3} ops/sec".format(
                container.__name__, operation.__name__, elapsed, num / elapsed))

    # Sample output from the original run (timings vary by machine):
    # Completed deque/append in 0.0250000953674 seconds: 3999984.74127 ops/sec
    # Completed deque/appendleft in 0.0199999809265 seconds: 5000004.76838 ops/sec
    # Completed deque/pop in 0.0209999084473 seconds: 4761925.52225 ops/sec
    # Completed deque/popleft in 0.0199999809265 seconds: 5000004.76838 ops/sec
    # Completed list/append in 0.0220000743866 seconds: 4545439.17637 ops/sec
    # Completed list/appendleft in 21.3209998608 seconds: 4690.21155917 ops/sec
    # Completed list/pop in 0.0240001678467 seconds: 4166637.52682 ops/sec
    # Completed list/popleft in 4.01799988747 seconds: 24888.0046791 ops/sec

Another example performs the basic queue operations:

    from collections import deque

    q = deque(range(5))
    q.append(5)
    q.appendleft(6)
    print(q)             # deque([6, 0, 1, 2, 3, 4, 5])
    print(q.pop())       # 5
    print(q.popleft())   # 6
    print(q.rotate(3))   # None (rotate works in place)
    print(q)             # deque([2, 3, 4, 0, 1])
    print(q.rotate(-1))  # None
    print(q)             # deque([3, 4, 0, 1, 2])

Translator's note: rotate rotates the queue. A right rotation (positive argument) moves elements from the right end to the left end, and a left rotation (negative argument) does the opposite.
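
The effect of the sign of the argument is perhaps easiest to see on a small deque. The sketch below uses made-up values and also illustrates that a right rotation by one step behaves like appendleft(pop()).

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)     # right rotation: the two rightmost items move to the left end
print(d)        # deque([4, 5, 1, 2, 3])

d.rotate(-2)    # left rotation: undoes the step above
print(d)        # deque([1, 2, 3, 4, 5])

# A right rotation by one step is equivalent to appendleft(pop()).
d.appendleft(d.pop())
print(d)        # deque([5, 1, 2, 3, 4])
```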

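1.3 Defaultdict

This type behaves exactly like an ordinary dictionary except in how it handles missing keys. When a lookup is made for a key that does not exist, its default_factory is called to provide a default value, and the resulting key-value pair is stored. The remaining arguments are the same as for the ordinary dict() constructor, and a defaultdict instance supports the same operations as a built-in dict.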
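
The missing-key behaviour can be seen in a few lines. This is a small sketch using int as the default_factory and keys of our own choosing; it also shows that get() does not trigger the factory.

```python
from collections import defaultdict

counts = defaultdict(int)   # int() returns 0, so missing keys start at 0

counts["apple"] += 1        # key is missing: default_factory supplies 0 first
print(counts["apple"])      # 1

# Merely reading a missing key with [] also stores it ...
print(counts["pear"])       # 0
print("pear" in counts)     # True

# ... whereas get() neither calls default_factory nor adds the key.
print(counts.get("plum"))   # None
print("plum" in counts)     # False
```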

A defaultdict is handy when you want to use it to keep track of data. For example, suppose you want to record the positions at which each word occurs in a string. You can do it like this:

    from collections import defaultdict

    s = "the quick brown fox jumps over the lazy dog"
    words = s.split()
    location = defaultdict(list)
    for m, n in enumerate(words):
        location[n].append(m)

    print(location)
    # defaultdict(<class 'list'>, {'the': [0, 6], 'quick': [1], 'brown': [2], 'fox': [3],
    # 'jumps': [4], 'over': [5], 'lazy': [7], 'dog': [8]})

Whether to pair defaultdict with lists or with sets depends on your goal. A list preserves the order in which you insert elements, whereas a set does not care about insertion order but eliminates duplicates for you.

    from collections import defaultdict

    s = "the quick brown fox jumps over the lazy dog"
    words = s.split()
    location = defaultdict(set)
    for m, n in enumerate(words):
        location[n].add(m)

    print(location)
    # defaultdict(<class 'set'>, {'the': {0, 6}, 'quick': {1}, 'brown': {2},
    # 'fox': {3}, 'jumps': {4}, 'over': {5}, 'lazy': {7}, 'dog': {8}})

Another way to create a multidict:

    s = "the quick brown fox jumps over the lazy dog"
    d = {}
    words = s.split()
    for key, value in enumerate(words):
        d.setdefault(key, []).append(value)

    print(d)
    # {0: ['the'], 1: ['quick'], 2: ['brown'], 3: ['fox'], 4: ['jumps'], 5: ['over'], 6: ['the'], 7: ['lazy'], 8: ['dog']}
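
Note that the setdefault example keys the dictionary by position, so every list holds exactly one word. To reproduce the word-to-positions mapping built earlier with defaultdict(list), the roles of key and value can be swapped. A small sketch under that assumption:

```python
s = "the quick brown fox jumps over the lazy dog"

location = {}
for index, word in enumerate(s.split()):
    # setdefault returns the existing list, or inserts and returns a new one.
    location.setdefault(word, []).append(index)

print(location["the"])   # [0, 6]
print(location["fox"])   # [3]
```

The result is a plain dict, so later lookups of unknown words raise KeyError instead of silently creating empty entries.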

A more complex example:

    class Example(dict):
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                # Missing key: create a nested Example and store it
                value = self[item] = type(self)()
                return value

    a = Example()
    a[1][2][3] = 4
    a[1][3][3] = 5
    a[1][2]['test'] = 6

    print(a)  # {1: {2: {3: 4, 'test': 6}, 3: {3: 5}}}

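2. Array

The array module defines a new object type that looks a lot like a list, except that it is constrained to hold elements of a single type. The element type is fixed when the array is created.

If your program needs to reduce memory usage and you know that all the data you would store in a list is of the same type, the array module is a good fit. For example, storing ten million integers takes at least 160MB in a list but only 40MB in an array. Although an array saves space, hardly any basic operation is faster on an array than on a list.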
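
The size difference can be estimated on your own machine. The sketch below is a rough measurement rather than an exact accounting: it assumes the "i" typecode, uses sys.getsizeof to approximate the cost of the list and its integer objects, and the resulting figures depend on the platform and Python version.

```python
import array
import sys

n = 100_000
as_list = list(range(n))
as_array = array.array("i", range(n))

# itemsize is the number of bytes used per element for this typecode.
print(as_array.itemsize)

# Bytes in the array's element buffer.
array_bytes = as_array.itemsize * len(as_array)

# Rough list cost: the list's pointer table plus one int object per element.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

print(array_bytes, list_bytes)
```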

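When computing with arrays, pay special attention to operations that create lists. A list comprehension, for example, converts the whole array into a list and inflates the memory footprint. A workable alternative is to build the new array from a generator expression:

    import array

    a = array.array("i", [1, 2, 3, 4, 5])
    b = array.array(a.typecode, (2 * x for x in a))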

Because the point of using an array is to save space, in-place operations are preferable. A more efficient approach is to use enumerate:

    import array

    a = array.array("i", [1, 2, 3, 4, 5])
    for i, x in enumerate(a):
        a[i] = 2 * x

For larger arrays, this in-place modification is at least 15% faster than creating a new array from a generator.

So when should you use an array? When, apart from computational considerations, you need an array of uniformly typed elements like the ones in C.

    import array
    from timeit import Timer

    def arraytest():
        a = array.array("i", [1, 2, 3, 4, 5])
        b = array.array(a.typecode, (2 * x for x in a))

    def enumeratetest():
        a = array.array("i", [1, 2, 3, 4, 5])
        for i, x in enumerate(a):
            a[i] = 2 * x

    if __name__ == '__main__':
        m = Timer("arraytest()", "from __main__ import arraytest")
        n = Timer("enumeratetest()", "from __main__ import enumeratetest")

        # Timings from the original run; they vary by machine
        print(m.timeit())  # 5.22479210582
        print(n.timeit())  # 4.34367196717
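3. Heapq

The heapq module implements a priority queue on top of a heap. A heap is a simple list kept in partial order according to the heap rules.

A heap is a tree-shaped data structure in which child nodes and parent nodes stand in an ordering relationship. A binary heap can be represented by a list or array organized so that the children of element N sit at positions 2*N+1 and 2*N+2 (indices start at 0). Put simply, all functions in this module assume the sequence already satisfies this heap ordering, so the first element (seq[0]) is the smallest and the rest of the sequence forms a binary tree in which the children of seq[i] are seq[2*i+1] and seq[2*i+2]. When the sequence is modified, the functions always ensure that each child is greater than or equal to its parent.

    import heapq

    heap = []
    for value in [20, 10, 30, 50, 40]:
        heapq.heappush(heap, value)

    # Items come out in ascending order: 10, 20, 30, 40, 50
    while heap:
        print(heapq.heappop(heap))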
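
The parent/child index relation described above can be checked directly. The helper is_min_heap below is a hypothetical function written for this illustration and is not part of the heapq module; the sample data is also our own.

```python
import heapq

data = [20, 10, 30, 50, 40, 5, 25]
heapq.heapify(data)   # rearranges the list in place into heap order

def is_min_heap(seq):
    """Return True if every parent is <= both of its children."""
    n = len(seq)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and seq[i] > seq[child]:
                return False
    return True

print(is_min_heap(data))       # True
print(data[0] == min(data))    # True: the smallest item sits at index 0
print(data == sorted(data))    # a heap need not be fully sorted
```

Only the relation between each parent and its children is guaranteed, which is why the last comparison may well be false even though the heap is valid.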

The heapq module has two functions, nlargest() and nsmallest(), which do what their names suggest. Here is how they are used:

    import heapq

    nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
    print(heapq.nlargest(3, nums))   # Prints [42, 37, 23]
    print(heapq.nsmallest(3, nums))  # Prints [-4, 1, 2]

Both functions also accept a key argument, which lets them work with more complex data structures. For example:

    import heapq

    portfolio = [
        {'name': 'IBM', 'shares': 100, 'price': 91.1},
        {'name': 'AAPL', 'shares': 50, 'price': 543.22},
        {'name': 'FB', 'shares': 200, 'price': 21.09},
        {'name': 'HPQ', 'shares': 35, 'price': 31.75},
        {'name': 'YHOO', 'shares': 45, 'price': 16.35},
        {'name': 'ACME', 'shares': 75, 'price': 115.65}
    ]
    cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
    expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

    print(cheap)
    # [{'name': 'YHOO', 'shares': 45, 'price': 16.35},
    # {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]

    print(expensive)
    # [{'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'ACME',
    # 'shares': 75, 'price': 115.65}, {'name': 'IBM', 'shares': 100, 'price': 91.1}]

Next, an example of a queue that orders its items by a given priority and whose pop operation always returns the item with the highest priority.

    import heapq

    class Item:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return 'Item({!r})'.format(self.name)

    class PriorityQueue:
        def __init__(self):
            self._queue = []
            self._index = 0

        def push(self, item, priority):
            # Negate the priority so the highest priority is popped first;
            # the index keeps items of equal priority in insertion order
            heapq.heappush(self._queue, (-priority, self._index, item))
            self._index += 1

        def pop(self):
            return heapq.heappop(self._queue)[-1]

    q = PriorityQueue()
    q.push(Item('foo'), 1)
    q.push(Item('bar'), 5)
    q.push(Item('spam'), 4)
    q.push(Item('grok'), 1)

    print(q.pop())  # Item('bar')
    print(q.pop())  # Item('spam')
    print(q.pop())  # Item('foo')
    print(q.pop())  # Item('grok')
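
The running index in the tuple does more than preserve insertion order: without it, two entries with the same priority would make Python compare the payload objects themselves. The sketch below illustrates this with a hypothetical Task class that defines no ordering methods.

```python
import heapq

class Task:
    """Hypothetical payload class that defines no ordering methods."""
    def __init__(self, name):
        self.name = name

# Without an index, equal priorities force a comparison of the payloads.
heap = []
heapq.heappush(heap, (1, Task("a")))
try:
    heapq.heappush(heap, (1, Task("b")))
except TypeError as exc:
    print("comparison failed:", exc)

# With a running index, ties are settled before the payload is compared.
heap = []
for index, name in enumerate(["a", "b", "c"]):
    heapq.heappush(heap, (1, index, Task(name)))

order = [heapq.heappop(heap)[-1].name for _ in range(3)]
print(order)  # ['a', 'b', 'c']
```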

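4. Bisect

The bisect module provides support for keeping a list in sorted order, doing most of its work with a bisection algorithm. It inserts an element into a list while keeping the list sorted. In some cases this is more efficient than sorting the list repeatedly, and for a large list, maintaining the order at every step is also more efficient than re-sorting it.

Suppose you have a sorted list of ranges, [(0, 100), (150, 220), (500, 1000)], and want to add the range (250, 400). You might do it like this:

    import bisect

    a = [(0, 100), (150, 220), (500, 1000)]
    bisect.insort_right(a, (250, 400))
    print(a)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]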

We can use the bisect() function to find the insertion point:

    import bisect

    a = [(0, 100), (150, 220), (500, 1000)]
    bisect.insort_right(a, (250, 400))
    bisect.insort_right(a, (399, 450))
    print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]

    print(bisect.bisect(a, (550, 1200)))  # 5

bisect(sequence, item) => index returns the point at which the item should be inserted, but the sequence itself is not modified.

    import bisect

    a = [(0, 100), (150, 220), (500, 1000)]
    bisect.insort_right(a, (250, 400))
    bisect.insort_right(a, (399, 450))
    print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]

    print(bisect.bisect(a, (550, 1200)))  # 5

    bisect.insort_right(a, (550, 1200))
    print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000), (550, 1200)]

The new element is inserted at index 5.
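
When the list already contains items equal to the one being looked up, the left and right variants differ in where they place the insertion point. A short sketch with made-up scores:

```python
import bisect

scores = [10, 20, 20, 20, 30]

# bisect_left returns the position before any existing equal items,
# bisect_right (also available as bisect.bisect) the position after them.
left = bisect.bisect_left(scores, 20)
right = bisect.bisect_right(scores, 20)
print(left, right)  # 1 4

# Neither call modifies the list; insort_* performs the actual insertion.
bisect.insort_left(scores, 25)
print(scores)  # [10, 20, 20, 20, 25, 30]
```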

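5. Weakref

The weakref module lets us create references to Python objects that do not prevent those objects from being destroyed. This section covers the basic use of weak references and introduces a proxy class.

Before we start, we need to understand what a strong reference is. A strong reference is a pointer that affects an object's reference count, its lifetime, and the moment it is destroyed. It is what you get whenever you assign an object to a variable. For example, if a list is bound to the name a and then b = a is executed, the list has two strong references, a and b, and it will not be destroyed until both of them have been released. The same behaviour can be observed with a class that reports its own creation and destruction:

    class Foo(object):
        def __init__(self):
            self.obj = None
            print('created')

        def __del__(self):
            print('destroyed')

        def show(self):
            print(self.obj)

        def store(self, obj):
            self.obj = obj

    a = Foo()  # created
    b = a
    del a
    del b      # destroyed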

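A weak reference, by contrast, has no effect on an object's reference count. The existence of a weak reference does not stop the object from being reclaimed. In other words, if only weak references to an object remain, the object is destroyed.

You can create a weak reference to an object with the weakref.ref function. It takes a strong reference as its first argument and returns a weak reference.

    >>> import weakref
    >>> a = Foo()
    created
    >>> b = weakref.ref(a)
    >>> b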

A temporary strong reference can be obtained from the weak reference by calling it, as b() does in the following example:

    >>> a == b()
    True
    >>> b().show()
    None

Note that when we delete the strong reference, the object is destroyed immediately.

    >>> del a
    destroyed

If you try to use the object through the weak reference after it has been destroyed, the call returns None:

    >>> b() is None
    True

weakref.proxy offers a more transparent alternative to weakref.ref. It likewise takes a strong reference as its first argument and returns a weak reference, but the proxy behaves more like a strong reference and raises an exception when the object no longer exists.

    >>> a = Foo()
    created
    >>> b = weakref.proxy(a)
    >>> b.store('fish')
    >>> b.show()
    fish
    >>> del a
    destroyed
    >>> b.show()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ReferenceError: weakly-referenced object no longer exists

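Reference counts are what Python's garbage collector works with: when an object's reference count drops to 0, the object is reclaimed.

Weak references are best used for objects that are expensive to keep alive, or to avoid circular references (although the garbage collector usually takes care of those).

A complete example:

    import weakref
    import gc

    class MyObject(object):
        def my_method(self):
            print('my_method was called!')

    obj = MyObject()
    r = weakref.ref(obj)

    gc.collect()
    assert r() is obj   # r() gives access to the referenced object: it is still there.

    obj = 1             # Rebind obj so nothing refers to the MyObject instance any more.
    gc.collect()
    assert r() is None  # There is no object left: it was garbage-collected.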

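Tip: only some objects support weak references. These include class instances, functions written in Python, methods, sets, frozensets, file objects, generators, type objects, and certain object types defined in library modules (such as sockets, arrays, and regular expression patterns). Built-in functions and most built-in types, such as lists, dictionaries, strings, and numbers, do not.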
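
The restriction is easy to observe. In the sketch below, RefList is a hypothetical subclass introduced only for illustration; subclassing is the usual way to make a list-like object weakly referenceable.

```python
import weakref

# A plain list cannot be weakly referenced.
try:
    weakref.ref([1, 2, 3])
except TypeError as exc:
    print("list:", exc)

# A trivial subclass (hypothetical name) can be.
class RefList(list):
    pass

items = RefList([1, 2, 3])
r = weakref.ref(items)
print(r() is items)  # True

# Sets are among the built-in types that are supported directly.
s = {1, 2, 3}
rs = weakref.ref(s)
print(rs() is s)  # True
```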

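6. Copy()

The copy module provides functions for duplicating objects with either shallow or deep copy semantics.

Shallow and deep copying differ only for compound objects (objects that contain other objects, such as lists or class instances).

For a shallow copy, a new compound object is created and references to the objects found in the original are inserted into it.
For a deep copy, a new compound object is created and copies of the objects found in the original are inserted into it, recursively.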

A plain assignment simply points the new variable at the source object.

    import copy

    a = [1, 2, 3]
    b = [4, 5]
    c = [a, b]

    # Normal Assignment
    d = c
    print(id(c) == id(d))        # True - d is the same object as c
    print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

    # Shallow Copy
    d = copy.copy(c)
    print(id(c) == id(d))        # False - d is now a new object
    print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

    # Deep Copy
    d = copy.deepcopy(c)
    print(id(c) == id(d))        # False - d is now a new object
    print(id(c[0]) == id(d[0]))  # False - d[0] is now a new object

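A shallow copy (copy()) creates a new container whose references point to the objects held by the original.

A deep copy (deepcopy()) creates an object whose references point to newly copied objects.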
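
The practical consequence shows up as soon as a nested object is mutated. A minimal sketch with a list of lists of our own:

```python
import copy

original = [[1, 2, 3], [4, 5]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Mutating a nested list is visible through the shallow copy,
# because both outer lists hold references to the same inner list.
original[0].append(99)
print(shallow[0])  # [1, 2, 3, 99]
print(deep[0])     # [1, 2, 3]

# Changing the outer list itself affects neither copy.
original.append([6])
print(len(original), len(shallow), len(deep))  # 3 2 2
```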

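A more involved example:

Suppose I have two classes, Manager and Graph. Each Graph holds a reference to its manager, and each Manager holds a collection of the Graphs it manages. We now have two tasks to accomplish:

1) Copy a graph instance with deepcopy, but keep its manager pointing at the original graph's manager.

2) Copy a manager, creating an entirely new manager but copying all of the original graphs.

    import copy
    import weakref

    class Graph(object):
        def __init__(self, manager=None):
            self.manager = None if manager is None else weakref.ref(manager)

        def __deepcopy__(self, memodict):
            # Resolve the weak reference; a graph may have no manager at all
            manager = self.manager() if self.manager is not None else None
            # If the manager is being deep-copied too, point at its copy;
            # otherwise keep pointing at the original manager
            return Graph(memodict.get(id(manager), manager))

    class Manager(object):
        def __init__(self, graphs=None):
            # Avoid a mutable default argument shared between instances
            self.graphs = [] if graphs is None else graphs
            for g in self.graphs:
                g.manager = weakref.ref(self)

    a = Manager([Graph(), Graph()])
    b = copy.deepcopy(a)

    # Task 2: every copied graph points at the new manager
    print(all(g.manager() is b for g in b.graphs))    # True

    # Task 1: a graph copied on its own keeps the original manager
    print(copy.deepcopy(a.graphs[0]).manager() is a)  # True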

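7. Pprint()

The pprint module prints data structures in a more readable form. When you need to print a complex, deeply nested dictionary or JSON object, pprint gives a better-looking result.

Suppose you need to print a matrix. With the ordinary print you only get the list on a single line, whereas pprint can produce a neatly aligned matrix layout:

    import pprint

    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    a = pprint.PrettyPrinter(width=20)
    a.pprint(matrix)

    # [[1, 2, 3],
    #  [4, 5, 6],
    #  [7, 8, 9]]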
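
The same module also handles nested dictionaries. The sketch below uses a made-up configuration dictionary and two further functions from the module: pformat, which returns the formatted text instead of printing it, and the depth argument, which limits how many nesting levels are expanded.

```python
import pprint

# Hypothetical nested configuration, used only to illustrate the layout.
config = {
    "name": "demo",
    "servers": [
        {"host": "alpha", "ports": [80, 443]},
        {"host": "beta", "ports": [8080]},
    ],
}

# pformat returns the formatted text instead of printing it.
text = pprint.pformat(config, width=40)
print(text)

# depth limits how many nesting levels are expanded; deeper ones show as ...
pprint.pprint(config, depth=1)
```

Having the text as a string is convenient when the formatted structure should go to a log file rather than to standard output.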

Additional Material

Some basic data structures

1. Singly linked list

    class Node:
        def __init__(self):
            self.data = None
            self.nextNode = None

        def set_and_return_Next(self):
            self.nextNode = Node()
            return self.nextNode

        def getNext(self):
            return self.nextNode

        def getData(self):
            return self.data

        def setData(self, d):
            self.data = d

    class LinkedList:
        def buildList(self, array):
            self.head = Node()
            self.head.setData(array[0])
            self.temp = self.head
            for i in array[1:]:
                self.temp = self.temp.set_and_return_Next()
                self.temp.setData(i)
            # Set the tail after the loop so a one-element list also gets one
            self.tail = self.temp
            return self.head

        def printList(self):
            tempNode = self.head
            while tempNode != self.tail:
                print(tempNode.getData())
                tempNode = tempNode.getNext()
            print(self.tail.getData())

    myArray = [3, 5, 4, 6, 2, 6, 7, 8, 9, 10, 21]
    myList = LinkedList()
    myList.buildList(myArray)
    myList.printList()

2. Prim's algorithm implemented in Python

Translator's note: Prim's algorithm is an algorithm from graph theory that finds a minimum spanning tree in a weighted connected graph.

    from collections import defaultdict
    from heapq import heapify, heappop, heappush

    def prim(nodes, edges):
        # Adjacency list: node -> list of (cost, from, to)
        conn = defaultdict(list)
        for n1, n2, c in edges:
            conn[n1].append((c, n1, n2))
            conn[n2].append((c, n2, n1))

        mst = []
        used = {nodes[0]}  # a set containing the start node itself
        usable_edges = conn[nodes[0]][:]
        heapify(usable_edges)

        while usable_edges:
            cost, n1, n2 = heappop(usable_edges)
            if n2 not in used:
                used.add(n2)
                mst.append((n1, n2, cost))

                for e in conn[n2]:
                    if e[2] not in used:
                        heappush(usable_edges, e)
        return mst

    # test
    nodes = list("ABCDEFG")
    edges = [("A", "B", 7), ("A", "D", 5),
             ("B", "C", 8), ("B", "D", 9), ("B", "E", 7),
             ("C", "E", 5),
             ("D", "E", 15), ("D", "F", 6),
             ("E", "F", 8), ("E", "G", 9),
             ("F", "G", 11)]

    print("prim:", prim(nodes, edges))
    # prim: [('A', 'D', 5), ('D', 'F', 6), ('A', 'B', 7), ('B', 'E', 7), ('E', 'C', 5), ('E', 'G', 9)]