Python Data Structures: collections

Date: 2019-11-21
Tags: python, data structures, collections. Category: Python
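Python includes several standard programming data structures, such as list, tuple, dict, and set, as built-in types.
The collections module contains implementations of several additional data structures that extend the built-in types.
deque is a double-ended queue that allows elements to be added or removed from either end.
defaultdict is a dictionary that responds with a default value if a key is missing.
OrderedDict remembers the sequence in which items are added.
namedtuple extends the normal tuple by giving each member an attribute name in addition to a numeric index.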

2.1 collections - Container Data Types
The collections module includes container data types beyond the built-in types list, dict, and tuple.

2.1.1 Counter
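A Counter is a container that keeps track of how many times equivalent values are added. It can be used to implement the same algorithms for which other languages commonly use bag or multiset data structures.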

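Initialization
Counter supports three forms of initialization. Its constructor can be called with a sequence of items, a dictionary containing keys and counts, or keyword arguments mapping string names to counts.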
import collections
print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)
All three forms of initialization produce the same result.
>>> ================================ RESTART ================================
>>> 
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
If no arguments are provided, an empty Counter is constructed, which can then be populated with the update() method.
import collections
c = collections.Counter()
print 'Initial  :', c
c.update('abcdcaa')
print 'Sequence :', c
c.update({'a':1, 'd':6})
print 'Dict     :', c
The count values are increased by the new data rather than replaced.
>>> ================================ RESTART ================================
>>> 
Initial  : Counter()
Sequence : Counter({'a': 3, 'c': 2, 'b': 1, 'd': 1})
Dict     : Counter({'d': 7, 'a': 4, 'c': 2, 'b': 1})
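The difference between adding and replacing is easiest to see next to a plain dict. The following is a minimal sketch written for Python 3 (an assumption; the listings in this article use Python 2 print statements), with the variable names `counts` and `plain` chosen only for illustration.

```python
from collections import Counter

# Counter.update() adds the incoming counts to the existing ones.
counts = Counter('abcdcaa')          # a: 3, c: 2, b: 1, d: 1
counts.update({'a': 1, 'd': 6})      # a becomes 3 + 1, d becomes 1 + 6

# dict.update() on the same starting data overwrites the values instead.
plain = dict(Counter('abcdcaa'))
plain.update({'a': 1, 'd': 6})

print(counts['a'], counts['d'])      # 4 7
print(plain['a'], plain['d'])        # 1 6
```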

Accessing Counts
Once a Counter is populated, its values can be retrieved using the dictionary API.
import collections
c = collections.Counter('abcdccca')
for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])
Counter does not raise KeyError for unknown items. If a value has not been seen in the input, its count is 0.
>>> ================================ RESTART ================================
>>> 
a : 2
b : 1
c : 4
d : 1
e : 0
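A short sketch of the missing-key behaviour, written for Python 3 (an assumption, since the article's own listings are Python 2). It also shows that looking up an unknown item does not store it in the Counter, while the same lookup on a plain dict raises KeyError. The names `missing`, `stored`, and `raised` are illustrative only.

```python
from collections import Counter

c = Counter('abcdccca')

missing = c['e']        # no KeyError: an unseen item has a count of 0
stored = 'e' in c       # the lookup itself does not insert the key

# The same lookup on a regular dict raises KeyError.
plain = dict(c)
try:
    plain['e']
    raised = False
except KeyError:
    raised = True

print(missing, stored, raised)   # 0 False True
```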
The elements() method returns an iterator that produces all of the items known to the Counter.
import collections
c = collections.Counter('abcdccca')
c['e'] = 0
print c
print list(c.elements())
The order of the elements is not guaranteed, and items with counts less than or equal to zero are not included.
>>> ================================ RESTART ================================
>>> 
Counter({'c': 4, 'a': 2, 'b': 1, 'd': 1, 'e': 0})
['a', 'a', 'c', 'c', 'c', 'c', 'b', 'd']
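To make the filtering explicit, here is a small Python 3 sketch (an assumption; the listings in this article target Python 2) that adds both a zero and a negative count. The result is sorted only so that it does not depend on iteration order.

```python
from collections import Counter

c = Counter('abcdccca')
c['e'] = 0      # zero count: skipped by elements()
c['f'] = -2     # negative count: also skipped

# Each remaining item is repeated as many times as its count.
expanded = sorted(c.elements())
print(expanded)   # ['a', 'a', 'b', 'c', 'c', 'c', 'c', 'd']
```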
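Use most_common() to produce a sequence of the n most frequently encountered input values and their respective counts.
import collections
c = collections.Counter()
with open(r'd:\check_traffic.sh', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())
print 'Most common:'
for letter, count in c.most_common(5):
    print '%s: %6d' % (letter, count)
This example counts the characters in every line of the file to produce a frequency distribution, then prints the five most common ones. Whitespace inside a line is counted as well, which is why whitespace characters appear in the result.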
>>> ================================ RESTART ================================
>>> 
Most common:
 :   6535
e:   3435
    :   3202
t:   3141
i:   3100
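The same idea can be tried without a file on disk. The sketch below is written for Python 3 (an assumption; the article's listing is Python 2) and uses a hypothetical in-memory sample string instead of the script file. Unlike the listing above, it skips whitespace so that only visible characters are ranked.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
text = "the quick brown fox jumps over the lazy dog"

c = Counter()
for line in text.splitlines():
    # Count every non-whitespace character, case-insensitively.
    c.update(ch for ch in line.lower() if not ch.isspace())

top = c.most_common(2)
print(top)   # [('o', 4), ('e', 3)]
```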

Arithmetic
Counter instances support arithmetic and set operations for aggregating results.
import collections
c1 = collections.Counter(['a', 'a', 'c', 'b' ,'a'])
c2 = collections.Counter('alphabet')
print 'c1:', c1
print 'c2:', c2
print '\nCombined counts:'
print c1 + c2
print '\nSubtraction:'
print c1 - c2
print '\nIntersection:'
print c1 & c2
print '\nUnion:'
print c1 | c2
Each time a new Counter is produced through an operation, any items with zero or negative counts are discarded.

>>> ================================ RESTART ================================
>>> 
c1: Counter({'a': 3, 'c': 1, 'b': 1})
c2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Combined counts:
Counter({'a': 5, 'b': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Subtraction:
Counter({'a': 1, 'c': 1})

Intersection:
Counter({'a': 2, 'b': 1})

Union:
Counter({'a': 3, 'c': 1, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
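The discarding of non-positive counts applies to the operators. For contrast, the Python 3 sketch below (an assumption; the listing above is Python 2) uses the in-place `subtract()` method, which is not covered in this article and keeps zero and negative counts. The names `diff` and `in_place` are illustrative.

```python
from collections import Counter

c1 = Counter(['a', 'a', 'c', 'b', 'a'])
c2 = Counter('alphabet')

# The - operator returns a new Counter and drops counts <= 0.
diff = c1 - c2
print(dict(diff))                     # {'a': 1, 'c': 1}

# subtract() modifies a Counter in place and keeps zero and negative counts.
in_place = Counter(c1)
in_place.subtract(c2)
print(in_place['b'], in_place['l'])   # 0 -1
```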

2.1.2 defaultdict
The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. defaultdict, by contrast, lets the caller specify the default up front when the container is initialized.
import collections
def default_factory():
    return 'default value'
d = collections.defaultdict(default_factory, foo='bar')
print 'd:', d
print 'foo =>', d['foo']
print 'bar =>', d['bar']

This approach works well as long as it is appropriate for all keys to share the same default.
>>> ================================ RESTART ================================
>>> 
d: defaultdict(<function default_factory at 0x0201FAB0>, {'foo': 'bar'})
foo => bar
bar => default value
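A common use of a shared default is grouping values into lists. The following Python 3 sketch (an assumption; the article's listings are Python 2) builds the same grouping once with setdefault() and once with defaultdict(list). The `pairs` data is hypothetical.

```python
from collections import defaultdict

# Hypothetical (category, name) pairs to group.
pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# Regular dict: the default has to be supplied on every access.
with_setdefault = {}
for kind, name in pairs:
    with_setdefault.setdefault(kind, []).append(name)

# defaultdict: the factory is given once, when the container is created.
with_default = defaultdict(list)
for kind, name in pairs:
    with_default[kind].append(name)

print(dict(with_default))   # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```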

2.1.3 deque
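A deque (double-ended queue) supports adding and removing elements from either end. The two more commonly used structures, stacks and queues, are degenerate forms of a deque in which input and output are restricted to a single end.
import collections
d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end', d[0]
print 'Right end', d[-1]
d.remove('c')
print 'remove(c):', d
A deque is a sequence container, so it supports list-like operations such as indexing, taking its length, and removing an element from the middle of the sequence by matching its value.
>>> ================================ RESTART ================================
>>> 
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end a
Right end g
remove(c): deque(['a', 'b', 'd', 'e', 'f', 'g'])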
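The remark about stacks and queues can be illustrated with a short Python 3 sketch (an assumption; the listings here are Python 2). Both structures use only a subset of the deque methods; `stack_order` and `queue_order` are illustrative names.

```python
from collections import deque

# Stack: push and pop at the same (right) end, last in, first out.
stack = deque()
for item in 'abc':
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]

# Queue: add on the right, remove from the left, first in, first out.
queue = deque()
for item in 'abc':
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]

print(stack_order)   # ['c', 'b', 'a']
print(queue_order)   # ['a', 'b', 'c']
```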

Populating
A deque can be populated from either end, termed "left" and "right" in the Python implementation.
import collections
d1 = collections.deque()
d1.extend('abcdefg')
print 'extend:', d1
d1.append('h')
print 'append:', d1
d2 = collections.deque()
d2.extendleft(xrange(6))
print 'extendleft', d2
d2.appendleft(6)
print 'appendleft', d2
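extendleft() iterates over its input and performs the equivalent of an appendleft() for each item, so the deque ends up holding the input in reverse order.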
>>> ================================ RESTART ================================
>>> 
extend: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft deque([5, 4, 3, 2, 1, 0])
appendleft deque([6, 5, 4, 3, 2, 1, 0])
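The equivalence between extendleft() and repeated appendleft() calls can be checked directly. This is a minimal Python 3 sketch (an assumption; the listing above uses Python 2's xrange), with `d1` and `d2` as illustrative names.

```python
from collections import deque

# One call to extendleft() ...
d1 = deque()
d1.extendleft(range(6))

# ... behaves like one appendleft() per input item.
d2 = deque()
for i in range(6):
    d2.appendleft(i)

print(d1)         # deque([5, 4, 3, 2, 1, 0])
print(d1 == d2)   # True
```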

Consuming
The elements of a deque can be consumed from either end, depending on the algorithm being applied.
import collections
print "From the right:"
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break
print
print "\nFrom the left:"
d = collections.deque(xrange(6))
while True:
    try:
        print d.popleft(),
    except IndexError:
        break
print
Use pop() to remove an item from the right end of the deque and popleft() to remove one from the left end.
>>> ================================ RESTART ================================
>>> 
From the right:
g f e d c b a

From the left:
0 1 2 3 4 5
Because deques are thread-safe, their contents can even be consumed from both ends at the same time by separate threads.
import collections
import threading
import time
candle = collections.deque(xrange(5))
def burn(direction, nextSource):
    while True:
        try:
            next = nextSource()
        except IndexError:
            break
        else:
            print '%8s: %s' % (direction, next)
            time.sleep(0.1)
    print '%8s done' % direction
    return
left = threading.Thread(target=burn, args=('Left', candle.popleft))
right = threading.Thread(target=burn, args=('Right', candle.pop))
left.start()
right.start()
left.join()
right.join()
The threads alternate between the two ends, removing items until the deque is empty.
>>> ================================ RESTART ================================
>>> 
    Left: 0   Right: 4

   Right: 3    Left: 1

   Right: 2    Left done

   Right done
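Because the two threads print concurrently, their output lines can interleave, as seen above. The Python 3 sketch below (an assumption; the article's listing is Python 2) is a variation that records what each thread removed instead of printing it, so the result can be inspected afterwards. The `taken` dictionary is a hypothetical helper, and the sketch relies on the documented thread safety of deque's pop operations in CPython.

```python
import threading
from collections import deque

candle = deque(range(5))
taken = {'Left': [], 'Right': []}   # each list is written by one thread only

def burn(direction, next_source):
    # Keep removing items from one end until the deque is empty.
    while True:
        try:
            item = next_source()
        except IndexError:
            break
        taken[direction].append(item)

left = threading.Thread(target=burn, args=('Left', candle.popleft))
right = threading.Thread(target=burn, args=('Right', candle.pop))
left.start()
right.start()
left.join()
right.join()

# Every item was removed exactly once, whichever thread got it.
consumed = sorted(taken['Left'] + taken['Right'])
print(consumed)   # [0, 1, 2, 3, 4]
```

How the items are split between the two threads varies from run to run; only the combined result is deterministic.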

Rotating
Another useful capability of the deque is rotating it in either direction, which skips over some items.
import collections
d = collections.deque(xrange(10))
print 'Normal:', d
d = collections.deque(xrange(10))
d.rotate(2)
print 'Right rotation:', d
d = collections.deque(xrange(10))
d.rotate(-2)
print 'Left rotation:', d
Rotating to the right (a positive value) takes items from the right end and moves them to the left end. Rotating to the left (a negative value) does the reverse.
>>> ================================ RESTART ================================
>>> 
Normal: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right rotation: deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
Left rotation: deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
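One way to picture rotate() is as a series of pops from one end followed by appends to the other. The Python 3 sketch below (an assumption; the listing above is Python 2) compares rotate() with that manual loop; `manual_right` and `manual_left` are illustrative names.

```python
from collections import deque

# Right rotation by 2: move two items from the right end to the left end.
right = deque(range(10))
right.rotate(2)
manual_right = deque(range(10))
for _ in range(2):
    manual_right.appendleft(manual_right.pop())

# Left rotation by 2: move two items from the left end to the right end.
left = deque(range(10))
left.rotate(-2)
manual_left = deque(range(10))
for _ in range(2):
    manual_left.append(manual_left.popleft())

print(right == manual_right, left == manual_left)   # True True
```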

2.1.4 namedtuple
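A standard tuple uses numerical indexes to access its members.
namedtuple instances are just as memory efficient as regular tuples because they do not have per-instance dictionaries. Each kind of namedtuple is represented by its own class, created with the namedtuple() factory function. The arguments are the name of the new class and a string containing the names of the elements.
import collections
Person = collections.namedtuple('Person', 'name age gender')
print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob
jane = Person(name='Jane', age=28, gender='female')
print '\nField by name:', jane.name
print '\nField by index:'
for p in [bob, jane]:
    print '%s is a %d year old %s' % p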
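The following Python 3 sketch (an assumption; the listing above is Python 2) shows the two access styles side by side and confirms that a namedtuple still behaves like a tuple: it unpacks, works with string formatting, and cannot be modified in place. The variable names are illustrative.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

by_name = bob.name          # attribute access
by_index = bob[0]           # numeric index, as with a regular tuple
name, age, gender = bob     # unpacking also works

# Like a regular tuple, a namedtuple is immutable.
try:
    bob.age = 31
    mutable = True
except AttributeError:
    mutable = False

sentence = '%s is a %d year old %s' % bob
print(by_name, by_index, mutable)   # Bob Bob False
print(sentence)                     # Bob is a 30 year old male
```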

2.1.5 OrderedDict
OrderedDict is a dictionary subclass that remembers the order in which its contents are added.
import collections
print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
for k, v in d.items():
    print k, v
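print '\nOrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
for k, v in d.items():
    print k, v
A regular dict, in the Python 2 used here, does not track insertion order, and iterating over it produces the values in the order the keys are stored in the hash table. An OrderedDict, by contrast, remembers the order in which the items were inserted and uses that order when creating an iterator. (Since Python 3.7, regular dicts also preserve insertion order.)
>>> ================================ RESTART ================================
>>> 
Regular dictionary:
a A
c C
b B

OrderedDict:
a A
b B
c C
A regular dict looks at its contents when testing for equality. An OrderedDict also considers the order in which the items were added.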
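The equality rule can be sketched as follows, in Python 3 (an assumption; the article's listings are Python 2). The two containers hold the same items, inserted in opposite order; the names `d1` and `d2` are illustrative.

```python
from collections import OrderedDict

d1 = OrderedDict([('a', 'A'), ('b', 'B'), ('c', 'C')])
d2 = OrderedDict([('c', 'C'), ('b', 'B'), ('a', 'A')])

ordered_equal = d1 == d2              # order matters between OrderedDicts
plain_equal = dict(d1) == dict(d2)    # regular dicts compare contents only
mixed_equal = d1 == dict(d2)          # OrderedDict vs. dict ignores order

print(ordered_equal, plain_equal, mixed_equal)   # False True True
```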