Tutorial: Using the Counter Container Class from Python's collections Module

Date: 2019-12-09

Tags: python, collections, module, counter, container, tutorial. Category: Python


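1. The collections module

The collections module was introduced in Python 2.4. It provides several specialized container types beyond dict, set, list and tuple:

OrderedDict class: a dictionary that remembers insertion order; a subclass of dict. Added in 2.7.
namedtuple() function: a factory function that creates tuple subclasses with named fields. Added in 2.6.
Counter class: counts hashable objects; a subclass of dict. Added in 2.7.
deque: a double-ended queue. Added in 2.4.
defaultdict: a dictionary that uses a factory function to supply values for missing keys, so missing keys need no special handling. Added in 2.5.
Documentation: http://docs.python.org/2/library/collections.html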

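2. The Counter class

The Counter class is designed to track how many times each value occurs. It is an unordered container (since Python 3.7 it preserves insertion order, as dict does) that stores its data as dictionary key-value pairs: the elements are the keys and their counts are the values. A count may be any integer, including zero and negative numbers. Counter is similar to the bags or multisets found in other languages.

2.1 Creation

The following code shows four ways to create a Counter: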

    >>> from collections import Counter
    >>> c = Counter()                    # create an empty Counter
    >>> c = Counter('gallahad')          # create from an iterable (list, tuple, dict, string, etc.)
    >>> c = Counter({'a': 4, 'b': 2})    # create from a dictionary
    >>> c = Counter(a=4, b=2)            # create from keyword arguments
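To make the four constructors concrete, here is a small sketch that inspects what each one produces. The variable names are chosen only for this illustration.

```python
from collections import Counter

empty = Counter()
from_iterable = Counter("gallahad")         # counts each character
from_mapping = Counter({"a": 4, "b": 2})    # counts taken from the dict values
from_kwargs = Counter(a=4, b=2)             # counts taken from keyword arguments

print(len(empty))                               # 0
print(from_iterable["a"], from_iterable["l"])   # 3 2
print(from_mapping == from_kwargs)              # True
```

The mapping and keyword forms set the counts directly, while the iterable form tallies occurrences.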
        

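2.2 Accessing counts and missing keys

Looking up a key that does not exist returns 0 instead of raising KeyError; otherwise the lookup returns the key's count.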

    >>> c = Counter("abcdefgab")
    >>> c["a"]
    2
    >>> c["c"]
    1
    >>> c["h"]
    0
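One detail worth checking is that a lookup of a missing key does not add that key to the Counter. The sketch below also contrasts this with a plain dict; the variable `plain` is introduced only for the comparison.

```python
from collections import Counter

c = Counter("abcdefgab")
print(c["h"])       # 0, no KeyError
print("h" in c)     # False: the lookup did not create an entry
print(len(c))       # 7 distinct characters

plain = dict(c)     # an ordinary dict with the same contents
try:
    plain["h"]
    raised = False
except KeyError:
    raised = True
print(raised)       # True: a plain dict raises KeyError
```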

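2.3 Updating counters (update and subtract)

A Counter can be updated from an iterable or from another Counter.

Updates come in two kinds: increasing counts and decreasing counts. To increase counts, use the update() method: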

    >>> c = Counter('which')
    >>> c.update('witch')    # update from another iterable
    >>> c['h']
    3
    >>> d = Counter('watch')
    >>> c.update(d)          # update from another Counter
    >>> c['h']
    4

To decrease counts, use the subtract() method:

    >>> c = Counter('which')
    >>> c.subtract('witch')  # subtract the counts of another iterable
    >>> c['h']
    1
    >>> d = Counter('watch')
    >>> c.subtract(d)        # subtract the counts of another Counter
    >>> c['a']
    -1
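Two points are easy to miss here: unlike dict.update(), Counter.update() adds to existing counts instead of replacing them, and subtract() keeps zero and negative results. A short sketch with made-up counts:

```python
from collections import Counter

c = Counter(a=3, b=1)

c.update({"a": 2, "c": 5})    # counts are added, not replaced
print(c["a"], c["c"])         # 5 5

c.subtract(a=5, b=4)          # keyword arguments are accepted as well
print(c["a"], c["b"])         # 0 -3
print(sorted(c))              # ['a', 'b', 'c']: 'a' stays, with a count of 0
```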

2.4 Deleting keys

A count of zero does not mean the element has been removed. To remove an element, use del.
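The difference between a zero count and a deleted key can be seen in a minimal sketch like the following (the sample string is arbitrary):

```python
from collections import Counter

c = Counter("abcdcba")

c["b"] = 0
print("b" in c)    # True: a zero count still occupies a key

del c["b"]
print("b" in c)    # False: the key is gone
print(c["b"])      # 0, the default for a missing key
```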


2.5 elements()

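Returns an iterator in which each element is repeated as many times as its count. Elements whose count is less than 1 are omitted. The elements are not sorted alphabetically: Python 3.7 and later yield them in the order they were first encountered, and older versions yield them in arbitrary order.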

 
    >>> c = Counter(a=4, b=2, c=0, d=-2)
    >>> list(c.elements())
    ['a', 'a', 'a', 'a', 'b', 'b']
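If a particular order is needed, it is safer to sort the result explicitly than to rely on the order in which elements() yields items. A small sketch:

```python
from collections import Counter

c = Counter(b=2, a=4, c=0, d=-2)

# Sort explicitly instead of relying on the iteration order of elements().
expanded = sorted(c.elements())
print(expanded)    # ['a', 'a', 'a', 'a', 'b', 'b']
```

The entries with counts of 0 and -2 do not appear at all.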
       

2.6 most_common([n])

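Returns a list of the n most common elements and their counts, from most common to least. If n is not specified, all elements are returned. Elements with equal counts are not sorted alphabetically: Python 3.7 and later list them in the order they were first encountered, and older versions order them arbitrarily.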

    >>> c = Counter('abracadabra')
    >>> c.most_common()
    [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
    >>> c.most_common(3)
    [('a', 5), ('b', 2), ('r', 2)]
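When a reproducible order for ties matters, one option is to sort the items yourself. This sketch sorts by descending count and then alphabetically; the name `ranked` is used only for illustration.

```python
from collections import Counter

c = Counter("abracadabra")

# Sort by descending count, then alphabetically by element.
ranked = sorted(c.items(), key=lambda pair: (-pair[1], pair[0]))
print(ranked)        # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
print(ranked[:3])    # [('a', 5), ('b', 2), ('r', 2)]
```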
       

2.7 fromkeys

This class method is not implemented for Counter.
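In practice, calling it raises NotImplementedError. The sketch below shows that, along with one possible workaround (building a plain dict first); the workaround is a suggestion for illustration, not something prescribed by the original text.

```python
from collections import Counter

try:
    Counter.fromkeys("abc", 1)
    raised = False
except NotImplementedError:
    raised = True
print(raised)    # True

# One possible workaround: build the mapping first, then wrap it in a Counter.
c = Counter(dict.fromkeys("abc", 1))
print(sorted(c.items()))    # [('a', 1), ('b', 1), ('c', 1)]
```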

2.8 Shallow copy: copy()

The copy() method returns a shallow copy of the Counter.
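A minimal sketch of copy(), showing that changing counts in the copy leaves the original untouched:

```python
from collections import Counter

c = Counter("abracadabra")
d = c.copy()

d["a"] += 10
print(c["a"], d["a"])      # 5 15
print(type(d).__name__)    # Counter
```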

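2.9 Arithmetic and set operations

The +, -, & and | operators can also be used with Counter objects. The & and | operators return, for each element, the minimum and the maximum of the two Counters' counts, respectively. Note that the resulting Counter drops any element whose count is less than 1.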

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d                       # add the counts: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
    >>> c - d                       # subtract (only elements with positive counts are kept)
    Counter({'a': 2})
    >>> c & d                       # intersection: min(c[x], d[x])
    Counter({'a': 1, 'b': 1})
    >>> c | d                       # union: max(c[x], d[x])
    Counter({'a': 3, 'b': 2})
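The - operator and the subtract() method behave differently, which the following sketch tries to make visible: the operator returns a new Counter without non-positive counts, while subtract() works in place and keeps them. The name `e` is just a working copy for this example.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

diff = c - d             # new Counter; non-positive counts are dropped
print(diff)              # Counter({'a': 2})

e = Counter(c)           # work on a copy so that c stays unchanged
e.subtract(d)            # in place; negative counts are kept
print(e["a"], e["b"])    # 2 -1
```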
       

3. Common operations

Below are some common operations on Counter objects, taken from the official Python documentation.

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts (empties the Counter; the object itself is not deleted)
    list(c)                         # convert the keys of c to a list
    set(c)                          # convert the keys of c to a set
    dict(c)                         # convert the key-value pairs of c to a regular dict
    c.items()                       # the (elem, cnt) pairs (a list in Python 2, a view in Python 3)
    Counter(dict(list_of_pairs))    # convert a list of (elem, cnt) pairs to a Counter
    c.most_common()[:-n-1:-1]       # the n least common elements
    c += Counter()                  # remove zero and negative counts
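A few of these idioms can be tried on a small Counter. The counts and the value of n below are made up for the example.

```python
from collections import Counter

c = Counter(a=5, b=2, c=1, d=0, e=-3)
n = 2

total = sum(c.values())
print(total)                 # 5

least = c.most_common()[:-n - 1:-1]
print(least)                 # [('e', -3), ('d', 0)]

c += Counter()               # drop zero and negative counts
print(sorted(c.items()))     # [('a', 5), ('b', 2), ('c', 1)]
```

Note that the slice for the n least common elements ends at -n - 1; a slice ending at -n would return only n - 1 elements.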
       

4. Examples
4.1 Checking whether two strings are anagrams (the same letters in a different order)

 
    from collections import Counter

    def is_anagram(word1, word2):
        """Checks whether the words are anagrams.

        word1: string
        word2: string

        returns: boolean
        """
        return Counter(word1) == Counter(word2)

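When a string is passed to Counter, it counts how many times each character occurs in the string. If two strings consist of the same letters in a different order, their Counters are equal.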
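A quick usage sketch follows. The helper `is_anagram_phrase`, which ignores case and spaces, is a hypothetical extension added here for illustration and is not part of the original example.

```python
from collections import Counter

def is_anagram(word1, word2):
    """Return True if both words contain exactly the same letters."""
    return Counter(word1) == Counter(word2)

def is_anagram_phrase(text1, text2):
    """Hypothetical variant: ignore case and spaces before comparing."""
    def normalize(text):
        return text.replace(" ", "").lower()
    return is_anagram(normalize(text1), normalize(text2))

print(is_anagram("listen", "silent"))                 # True
print(is_anagram("apple", "pale"))                    # False
print(is_anagram_phrase("Dormitory", "dirty room"))   # True
```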

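4.2 Multisets
A multiset is a set in which the same element can appear more than once, and Counter is a very natural way to represent one. Counter can also be extended with set-like operations such as is_subset.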

 
    class Multiset(Counter):
        """A multiset is a set where elements can appear more than once."""

        def is_subset(self, other):
            """Checks whether self is a subset of other.

            other: Multiset

            returns: boolean
            """
            for char, count in self.items():
                if other[char] < count:
                    return False
            return True

        # map the <= operator to is_subset
        __le__ = is_subset
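To see the <= mapping in action, here is a compact, self-contained restatement of the class with a few checks. The sample strings are arbitrary.

```python
from collections import Counter

class Multiset(Counter):
    """Compact restatement of the Multiset class, for a standalone demo."""

    def is_subset(self, other):
        # Every element must occur in other at least as often as in self.
        return all(other[elem] >= count for elem, count in self.items())

    __le__ = is_subset

small = Multiset("aab")
large = Multiset("aabbc")

print(small <= large)                   # True
print(large <= small)                   # False
print(Multiset("aaa").is_subset(small)) # False: 'a' occurs only twice in small
```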

4.3 Probability mass functions
A probability mass function (PMF) gives the probability that a discrete random variable takes each particular value. A Counter can be used to represent one.

 
    class Pmf(Counter):
        """A Counter with probabilities."""

        def normalize(self):
            """Normalizes the PMF so the probabilities add to 1."""
            total = float(sum(self.values()))
            for key in self:
                self[key] /= total

        def __add__(self, other):
            """Adds two distributions.

            The result is the distribution of sums of values from the
            two distributions.

            other: Pmf

            returns: new Pmf
            """
            pmf = Pmf()
            for key1, prob1 in self.items():
                for key2, prob2 in other.items():
                    pmf[key1 + key2] += prob1 * prob2
            return pmf

        # Hash and compare by identity so that Pmf objects can be used as dictionary keys.
        def __hash__(self):
            """Returns an integer hash value."""
            return id(self)

        def __eq__(self, other):
            return self is other

        def render(self):
            """Returns values and their probabilities, suitable for plotting."""
            return zip(*sorted(self.items()))

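normalize: normalizes the probabilities of the random variable's values so that they sum to 1
add: returns a new probability mass function for the sums of all pairwise combinations of values from the two distributions
render: returns the values and their probabilities, sorted by value, which is convenient for plotting
Below we use dice as an example. (P.S.: the Chinese word for dice, 骰子, is surprisingly pronounced "tóuzi".)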

 
    d6 = Pmf([1, 2, 3, 4, 5, 6])
    d6.normalize()
    d6.name = 'one die'
    print(d6)

    Pmf({1: 0.16666666666666666, 2: 0.16666666666666666, 3: 0.16666666666666666, 4: 0.16666666666666666, 5: 0.16666666666666666, 6: 0.16666666666666666})
       

Using add (the + operator), we can compute the distribution of the sum of two dice:

 
    d6_twice = d6 + d6
    d6_twice.name = 'two dice'

    for key, prob in d6_twice.items():
        print(key, prob)
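As a cross-check on the pairwise-product logic in __add__, the same computation can be done with exact fractions and a plain Counter. The names `die` and `two_dice` are illustrative and are not part of the Pmf class above.

```python
from collections import Counter
from fractions import Fraction

# A fair six-sided die with exact probabilities.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sum over all pairs of faces, as Pmf.__add__ does with floats.
two_dice = Counter()
for face1, prob1 in die.items():
    for face2, prob2 in die.items():
        two_dice[face1 + face2] += prob1 * prob2

print(two_dice[2])               # 1/36
print(two_dice[7])               # 1/6
print(sum(two_dice.values()))    # 1
```

A total of 7 is the most likely outcome because six of the 36 equally likely pairs produce it.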

With numpy.sum, we can directly compute the distribution of the sum of three dice:

 
    import numpy as np

    d6_thrice = np.sum([d6] * 3)
    d6_thrice.name = 'three dice'
       

Finally, we can use render to get the results and plot them with matplotlib:

 
    from matplotlib import pyplot

    for die in [d6, d6_twice, d6_thrice]:
        xs, ys = die.render()
        pyplot.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)

    pyplot.xlabel('Total')
    pyplot.ylabel('Probability')
    pyplot.legend()
    pyplot.show()

The result is as follows:

[Figure: probability of each total for one, two and three dice]

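4.4 Bayesian statistics
We continue with the dice example to show how Counter can be used for Bayesian statistics. Suppose a box contains five different dice: 4-sided, 6-sided, 8-sided, 12-sided and 20-sided. We draw one die from the box at random, roll it, and get a 6. What is the probability that we drew each of the five dice?
(1) First, we generate a probability mass function for each die: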

 
    def make_die(num_sides):
        die = Pmf(range(1, num_sides + 1))
        die.name = 'd%d' % num_sides
        die.normalize()
        return die

    dice = [make_die(x) for x in [4, 6, 8, 12, 20]]
    print(dice)
       

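(2) Next, we define an abstract class, Suite. A Suite is a probability mass function that represents a set of hypotheses and their probability distribution. The Suite class has a bayesian_update method that updates the probabilities of the hypotheses based on new data.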

 
    class Suite(Pmf):
        """Map from hypothesis to probability."""

        def bayesian_update(self, data):
            """Performs a Bayesian update.

            Note: called bayesian_update to avoid overriding dict.update

            data: result of a die roll
            """
            for hypo in self:
                like = self.likelihood(data, hypo)
                self[hypo] *= like

            self.normalize()

The likelihood function is left to each subclass, which implements its own way of computing it.

(3) Define the DiceSuite class, which inherits from Suite.

 
    class DiceSuite(Suite):

        def likelihood(self, data, hypo):
            """Computes the likelihood of the data under the hypothesis.

            data: result of a die roll
            hypo: Die object
            """
            return hypo[data]

It implements the likelihood function, which takes two arguments:
data: the observed die roll, such as the 6 in this example
hypo: the die that may have been rolled

(4) Pass the dice created in step 1 to DiceSuite. Then, given an observed value, we can compute the corresponding result.

 
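    dice_suite = DiceSuite(dice)
    dice_suite.bayesian_update(6)

    # Sort by number of sides; Pmf objects have no meaningful ordering of their own.
    for die, prob in sorted(dice_suite.items(), key=lambda item: len(item[0])):
        print(die.name, prob)

Output (probabilities shown to 12 significant digits):

    d4 0.0
    d6 0.392156862745
    d8 0.294117647059
    d12 0.196078431373
    d20 0.117647058824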

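As expected, the probability of the 4-sided die is 0 (a 4-sided die can only show 1 to 4), while the 6-sided and 8-sided dice are the most probable. Now suppose we roll the die again and this time get an 8. Recomputing the probabilities: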

 
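    dice_suite.bayesian_update(8)

    # Sort by number of sides, as before.
    for die, prob in sorted(dice_suite.items(), key=lambda item: len(item[0])):
        print(die.name, prob)

Output (probabilities shown to 12 significant digits):

    d4 0.0
    d6 0.0
    d8 0.623268698061
    d12 0.277008310249
    d20 0.0997229916898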
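The posterior values above can be cross-checked with exact fractions. The sketch below is independent of the Pmf and Suite classes; the function `update` and the dictionary layout are assumptions made for this illustration. It assumes a uniform prior over the five dice and fair dice, so the likelihood of a roll on an n-sided die is 1/n when the roll is possible and 0 otherwise.

```python
from fractions import Fraction

sides = [4, 6, 8, 12, 20]

# Uniform prior: each die is equally likely to be drawn.
prior = {n: Fraction(1, len(sides)) for n in sides}

def update(dist, roll):
    """Return the posterior over dice after observing one roll (hypothetical helper)."""
    unnormalized = {
        n: p * (Fraction(1, n) if roll <= n else 0)
        for n, p in dist.items()
    }
    total = sum(unnormalized.values())
    return {n: p / total for n, p in unnormalized.items()}

after_six = update(prior, 6)
after_eight = update(after_six, 8)

print(after_six[6])       # 20/51
print(after_eight[8])     # 225/361
print(after_eight[20])    # 36/361
```

As decimals, 20/51 is about 0.3922 and 225/361 is about 0.6233, which agrees with the floating-point results printed above.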