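Python's built-in data structures include lists (list), sets (set), and dictionaries (dict). Most of the time these can be used directly, but we often also have to deal with common problems such as searching, sorting, ordering, and filtering.
This article looks at how to use these data structures, and the algorithms that operate on them, effectively. The collections module provides ready-made solutions for many data-structure problems.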
In [5]: a = (4,5,6)
In [6]: x,y,z = a
In [7]: x
Out[7]: 4
In [8]: z
Out[8]: 6
In [9]: y
Out[9]: 5
In [10]: b = ['python',222,(2018,9,30)]
# Nested unpacking
In [11]: p,n,(year,mon,day) = b
In [12]: p
Out[12]: 'python'
In [13]: n
Out[13]: 222
In [14]: year
Out[14]: 2018
In [15]: day
Out[15]: 30
# Any iterable can be unpacked, e.g. strings, files, iterators, and generators
In [16]: s = 'py'
In [17]: x,y = s
In [18]: x
Out[18]: 'p'
# Use an underscore as a throwaway name for values you want to ignore
In [19]: data = 'python'
In [20]: x,_,_,y,_,_ = data
In [21]: x
Out[21]: 'p'
In [22]: y
Out[22]: 'h'
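To unpack elements from an iterable of arbitrary length, use Python's star expression (*name), which collects any number of items into a list.
Example 1: average the homework grades after dropping the highest and the lowest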
In [47]: grades = (68,98,85,78,84,79,88)
In [48]: def drop_first_last(grades):
    ...:     first,*middle,last = grades
    ...:     return sum(middle) / len(middle)
    ...:
In [49]: drop_first_last(sorted(list(grades),reverse=True))
Out[49]: 82.8
Example 2: star unpacking applied to a sequence of tagged tuples
records = [
    ('foo',1,2,3),
    ('bar',11,22,33),
    ('foo',4,5,6),
    ('bar',44,55,66),
]

def do_foo(x,y,z):
    print('foo',x,y,z)

def do_bar(a,b,c):
    print('bar',a,b,c)

for tag,*args in records:   # unpack each tuple and dispatch on its tag
    if tag == 'foo':
        do_foo(*args)
    elif tag == 'bar':
        do_bar(*args)

# Output
foo 1 2 3
bar 11 22 33
foo 4 5 6
bar 44 55 66
Example 3: unpacking the fields produced by split()
In [52]: passwd = 'root:x:0:0:root:/root:/bin/bash'
In [53]: username,*_,homedir,sh = passwd.split(":")
In [54]: username
Out[54]: 'root'
In [55]: homedir
Out[55]: '/root'
In [56]: sh
Out[56]: '/bin/bash'
Example 1: use collections.deque to keep a limited history. Passing maxlen creates a fixed-length queue.
In [61]: from collections import deque
# Create a queue with a maximum length of 3
In [62]: q = deque(maxlen=3)
# Append items to the queue
In [63]: q.append(1)
In [64]: q.append(2)
In [65]: q.append(3)
In [66]: q
Out[66]: deque([1, 2, 3], maxlen=3)
In [67]: q.append(4)
In [68]: q
Out[68]: deque([2, 3, 4], maxlen=3)
# Append on the left; when the queue is full, the rightmost item is discarded
In [69]: q.appendleft(5)
In [70]: q
Out[70]: deque([5, 2, 3], maxlen=3)
# Pop an item from the right end
In [71]: q.pop()
Out[71]: 3
In [72]: q
Out[72]: deque([5, 2], maxlen=3)
# Pop an item from the left end
In [73]: q.popleft()
Out[73]: 5
In [74]: q
Out[74]: deque([2], maxlen=3)
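The "limited history" idea can be sketched as a small search helper that remembers the last few lines seen before each match. The function name search_with_history, its parameters, and the sample lines are hypothetical and chosen only for illustration.

```python
from collections import deque

def search_with_history(lines, pattern, history=2):
    """Yield (matching_line, previous_lines) for each line containing pattern."""
    previous = deque(maxlen=history)  # old entries fall off automatically
    for line in lines:
        if pattern in line:
            yield line, list(previous)
        previous.append(line)

# Made-up input data for the demonstration
sample = ["start", "load config", "python ready", "idle", "python done"]
results = list(search_with_history(sample, "python"))
for line, context in results:
    print(line, context)
```

Because the deque has a maxlen, no manual trimming is needed: appending to a full queue silently drops the oldest entry.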
The heapq module has two functions for this: nlargest() returns the N largest items and nsmallest() returns the N smallest.
In [75]: import heapq
In [76]: numbers = [1,3,4,9,11,34,55,232,445,9812,321,45,67,434,555]
# The three largest values
In [77]: heapq.nlargest(3,numbers)
Out[77]: [9812, 555, 445]
# The three smallest values
In [78]: heapq.nsmallest(3,numbers)
Out[78]: [1, 3, 4]
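The heapq module implements the heap queue (priority queue) algorithm in Python and provides a number of functions that make heap-based sorting and selection quick and simple.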
In [1]: import heapq
In [2]: date = [19,1,9,3,11,21]
In [3]: heap = []
# heappush() inserts an item while keeping the heap invariant: heap[0] is always the smallest item, but the list as a whole is not fully sorted
In [4]: for i in date:
   ...:     heapq.heappush(heap,i)
   ...:
In [5]: heap
Out[5]: [1, 3, 9, 19, 11, 21]
In [6]: date
Out[6]: [19, 1, 9, 3, 11, 21]
# heapify() rearranges an existing list into heap order in place
In [7]: heapq.heapify(date)
In [8]: date
Out[8]: [1, 3, 9, 19, 11, 21]
# heappop() removes and returns heap[0], then restores the heap invariant for the remaining items.
# heap[0] is the smallest item only if the list is already a heap; here date holds the original un-heapified list again, so 19 is returned
In [10]: date
Out[10]: [19, 1, 9, 3, 11, 21]
In [11]: heapq.heappop(date)
Out[11]: 19
In [12]: date
Out[12]: [1, 3, 9, 21, 11]
# heapreplace() pops heap[0] and pushes a new item in a single step (this list is not a heap either, so 11 is returned rather than the minimum)
In [27]: date = [11,8,3,78,35]
In [28]: heapq.heapreplace(date,1)
Out[28]: 11
In [29]: date
Out[29]: [1, 8, 3, 78, 35]
# To find the N largest or smallest items in a collection, use nlargest() and nsmallest()
In [30]: date = [3,88,32,97,56]
In [31]: heapq.nlargest(2,date)
Out[31]: [97, 88]
In [33]: heapq.nsmallest(2,date)
Out[33]: [3, 32]
# Both functions also accept a key argument for more complex data, e.g. selecting dicts by one of their values
In [34]: port = [
    ...:     {'name':'dhcp','port':67},
    ...:     {'name':'mysql','port':3306},
    ...:     {'name':'memcached','port':11211},
    ...:     {'name':'nginx','port':80},
    ...:     {'name':'ssh','port':22},]
In [35]: heapq.nlargest(3,port,key=lambda x:x['port'])
Out[35]:
[{'name': 'memcached', 'port': 11211},
 {'name': 'mysql', 'port': 3306},
 {'name': 'nginx', 'port': 80}]
In [36]: heapq.nsmallest(3,port,key=lambda x:x['port'])
Out[36]:
[{'name': 'ssh', 'port': 22},
 {'name': 'dhcp', 'port': 67},
 {'name': 'nginx', 'port': 80}]
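Because a heap only guarantees that the smallest item sits at index 0, a fully sorted sequence comes from popping repeatedly, not from reading the list in place. A minimal sketch of that idea follows; the helper name heap_sorted is hypothetical.

```python
import heapq

def heap_sorted(iterable):
    """Return a new ascending list by heapifying a copy and popping until empty."""
    heap = list(iterable)   # copy so the caller's data is left untouched
    heapq.heapify(heap)     # rearrange into heap order in place
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [19, 1, 9, 3, 11, 21]
ordered = heap_sorted(data)
print(ordered)
```

For everyday sorting the built-in sorted() is the simpler choice; the heap is more useful when items arrive over time or only the few smallest ones are needed.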
Example: implementing a priority queue
import heapq

class priorityqueue(object):
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self,item,priority):
        # Negate the priority so the highest priority pops first;
        # _index keeps items with equal priority in insertion order
        heapq.heappush(self._queue,(-priority,self._index,item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

    def listt(self):
        return self._queue

q = priorityqueue()
q.push('python',44)
q.push('java',2)
q.push('c++',4)
q.push('c#',8)
q.push('goo',88)
q.push('perl',1)
date1 = q.listt()
print(date1)
print(q.pop())
print(q.listt())

# Output
[(-88, 4, 'goo'), (-44, 0, 'python'), (-4, 2, 'c++'), (-2, 1, 'java'), (-8, 3, 'c#'), (-1, 5, 'perl')]
goo
[(-44, 0, 'python'), (-8, 3, 'c#'), (-4, 2, 'c++'), (-2, 1, 'java'), (-1, 5, 'perl')]
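The _index counter in the class above keeps items with equal priority in insertion order, and it also means Python never has to compare the items themselves. The sketch below shows this on bare tuples; the Job class is a hypothetical stand-in for any object that does not define ordering.

```python
import heapq

class Job:
    """Hypothetical payload with no comparison methods defined."""
    def __init__(self, name):
        self.name = name

heap = []
# Entries are (negated priority, insertion index, item). Ties on priority
# fall through to the index, so two Job objects are never compared.
for index, (name, priority) in enumerate([("a", 5), ("b", 5), ("c", 9)]):
    heapq.heappush(heap, (-priority, index, Job(name)))

order = [heapq.heappop(heap)[-1].name for _ in range(3)]
print(order)
```

Without the index, pushing two entries with the same priority would make the tuple comparison reach the Job objects and fail with a TypeError.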
A dictionary key can be mapped to multiple values by storing them in a container such as a list, tuple, or set.
dictlist = {
    'a':[1,2],
    'b':[3,4],
    'c':[5,6],
}
dictset = {
    'as':{7,8},
    'bs':{9,0},
}
The defaultdict class in the collections module automatically initializes the value for a missing key, so you only need to add elements.
In [1]: from collections import defaultdict
In [2]: d = defaultdict(list)
In [3]: d
Out[3]: defaultdict(list, {})
In [4]: d['a'].append(1)
In [5]: d['a'].append(2)
In [6]: d
Out[6]: defaultdict(list, {'a': [1, 2]})
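For comparison, the same one-key-to-many-values mapping can be built with a plain dict and setdefault(). The sample pairs below are made up for illustration.

```python
from collections import defaultdict

pairs = [("a", 1), ("a", 2), ("b", 3)]

# Plain dict: setdefault() creates the empty list on first use of a key.
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

# defaultdict: the list factory is called automatically for missing keys.
auto = defaultdict(list)
for key, value in pairs:
    auto[key].append(value)

print(plain)
print(dict(auto))
```

One difference worth keeping in mind: merely reading a missing key from a defaultdict creates an entry for it, while a plain dict raises KeyError.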
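To control the order of items in a dictionary, use the OrderedDict class from the collections module. When the dictionary is iterated over, it strictly follows the order in which the items were first added.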
from collections import OrderedDict

d = OrderedDict()
d['one'] = 1
d['two'] = 2
d['three'] = 3
d['four'] = 4
for key,value in d.items():
    print(key,value)

# Output
one 1
two 2
three 3
four 4
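This is useful when the order of fields must be controlled precisely before serialization: build the data with OrderedDict first, then serialize it. Internally, OrderedDict maintains a doubly linked list that orders the keys by insertion. A newly inserted key is placed at the end of the list, and assigning a new value to an existing key does not change its position. Because of this extra linked list, an OrderedDict can take roughly twice the memory of a regular dict. Note that since Python 3.7, regular dicts also preserve insertion order.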
from collections import OrderedDict
import json

d = OrderedDict()
d['one'] = 1
d['two'] = 2
d['three'] = 3
d['four'] = 4
jsd = json.dumps(d)
d1 = json.loads(jsd)
print(jsd)
print(d1)

# Output
{"one": 1, "two": 2, "three": 3, "four": 4}
{'one': 1, 'two': 2, 'three': 3, 'four': 4}
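In the round trip above, json.loads() hands back a plain dict. If an OrderedDict is wanted on the way back in, the standard object_pairs_hook parameter of json.loads() can be used, as in this small sketch (the JSON text is made up):

```python
import json
from collections import OrderedDict

text = '{"one": 1, "two": 2, "three": 3}'
# object_pairs_hook receives the (key, value) pairs in document order.
restored = json.loads(text, object_pairs_hook=OrderedDict)
print(type(restored).__name__, list(restored))
```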
prices = {
    'ACME':45.23,
    'AAPL':612.78,
    'IBM':205.55,
    'HPQ':10.75,
    'FB':10.75
}
# zip() pairs each value with its key and returns an iterator of (value, key) tuples; the iterator can only be consumed once
# When two values are equal, the keys are compared to decide the order
print(min(zip(prices.values(),prices.keys())))
print(max(zip(prices.values(),prices.keys())))
print(sorted(zip(prices.values(),prices.keys())))

# Output
(10.75, 'FB')
(612.78, 'AAPL')
[(10.75, 'FB'), (10.75, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]
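The remark that zip() can only be used once refers to the fact that zip() returns an iterator, which is exhausted after a single pass. A short sketch of the pitfall, using a shortened copy of the prices data:

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'FB': 10.75}

pairs = zip(prices.values(), prices.keys())
lowest = min(pairs)            # this consumes the iterator
try:
    max(pairs)                 # nothing is left to iterate over
    second_pass_ok = True
except ValueError:
    second_pass_ok = False

print(lowest, second_pass_ok)
```

This is why the example above calls zip() afresh for each of min(), max(), and sorted().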
a = {'x':1,'y':2,'z':3}
b = {'w':10,'x':11,'y':2}
print(a.keys() & b.keys())      # keys present in both a and b
print(a.keys() - b.keys())      # keys in a that are not in b
print(a.items() & b.items())    # (key, value) pairs that are identical in a and b

# Output
{'x', 'y'}
{'z'}
{('y', 2)}
In [1]: a = {'a':11,'b':22,'c':44,'d':99,'f':101}
# Build a new dict with a comprehension, excluding certain keys
In [2]: c = {key:a[key] for key in a.keys() - {'b','d'}}
In [3]: c
Out[3]: {'f': 101, 'a': 11, 'c': 44}
dic = [{'x':1,'y':3},{'x':3,'y':8},{'x':1,'y':11},{'x':1,'y':3}]

def dedupe(items,key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

# The key function converts each element into a hashable value, which is then used to detect duplicates
date = list(dedupe(dic,key=lambda d:(d['x'],d['y'])))
print(date)
date1 = list(dedupe(dic,key=lambda x:(x['x'])))
print(date1)

# Output
[{'x': 1, 'y': 3}, {'x': 3, 'y': 8}, {'x': 1, 'y': 11}]
[{'x': 1, 'y': 3}, {'x': 3, 'y': 8}]
Use the built-in slice() function to create slice objects
In [4]: li = [1,2,3,4,5,6,7,8,9,0]
In [5]: cost = li[slice(2,8)]
In [6]: cost
Out[6]: [3, 4, 5, 6, 7, 8]
In [8]: li[slice(2,9,2)]
Out[8]: [3, 5, 7, 9]
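Slice objects are most useful when they are given a name, and their indices() method can clip them to a particular sequence length. A sketch with a made-up fixed-width record; the field positions and the names NAME and PRICE are hypothetical.

```python
record = "0123456789 ABC 42.50"
NAME = slice(11, 14)      # hypothetical field positions
PRICE = slice(15, 20)

name = record[NAME]
price = float(record[PRICE])

s = slice(2, 50, 2)
# indices(length) returns (start, stop, step) clipped to the given length.
clipped = s.indices(10)
print(name, price, clipped)
```

Naming the slices keeps hard-coded index pairs out of the code that uses them.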
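The Counter class in the collections module counts how many times each element occurs and maps each element to its count, like a dictionary. Its most_common() method returns the elements ordered from most to least frequent; an optional argument limits how many elements are returned.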
In [15]: date = [1,23,4,3,2,5,23,123,553,23,1,3,4,5,2,3,423,12,3,4,23,412,43]
In [16]: from collections import Counter
In [17]: Counter(date)
Out[17]: Counter({1: 2, 23: 4, 4: 3, 3: 4, 2: 2, 5: 2, 123: 1, 553: 1, 423: 1, 12: 1, 412: 1, 43: 1})
In [18]: Counter(date).most_common()
Out[18]: [(23, 4), (3, 4), (4, 3), (1, 2), (2, 2), (5, 2), (123, 1), (553, 1), (423, 1), (12, 1), (412, 1), (43, 1)]
In [19]: Counter(date).most_common(2)
Out[19]: [(23, 4), (3, 4)]
In [20]: Counter(date).most_common(4)
Out[20]: [(23, 4), (3, 4), (4, 3), (1, 2)]
In [27]: date1 = ['a','b','c','a','a','b']
In [28]: from collections import Counter
# Create a Counter object, a dict-like mapping of elements to counts
In [29]: counts = Counter(date1)
In [30]: counts['a']
Out[30]: 3
# Create a second sequence
In [31]: date2 = ['b','b','a']
# Counts before the update
In [32]: counts.most_common()
Out[32]: [('a', 3), ('b', 2), ('c', 1)]
# Use update() to add more counts to the existing Counter in place
In [33]: counts.update(date2)
# Check the result
In [34]: counts.most_common()
Out[34]: [('a', 4), ('b', 4), ('c', 1)]
# Create a second Counter object
In [35]: counts1 = Counter(date2)
# Counter objects support addition and subtraction
In [36]: counts - counts1
Out[36]: Counter({'a': 3, 'b': 2, 'c': 1})
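Addition works the same way as the subtraction shown above. Both operators return a new Counter and leave their operands unchanged, unlike update(), which modifies the counter in place. A minimal sketch with made-up data:

```python
from collections import Counter

first = Counter(['a', 'b', 'c', 'a'])
second = Counter(['b', 'b', 'a'])

combined = first + second      # new Counter; the inputs are not modified
difference = first - second    # counts of zero or below are dropped
print(combined, difference)
```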
The itemgetter() function in the operator module makes sorting nested data structures simple and fast.
from operator import itemgetter

date1 = [
    {'fname':'Brian','lname':'Jones','uid':1003},
    {'fname':'David','lname':'Beazley','uid':1002},
    {'fname':'John','lname':'Cleese','uid':1001},
    {'fname':'Big','lname':'Jones','uid':1004},
]
print(sorted(date1,key=itemgetter('uid')))
print(sorted(date1,key=itemgetter('uid'),reverse=True))   # reverse order
print(sorted(date1,key=itemgetter('uid','fname')))        # sort by several shared keys
print(sorted(date1,key=lambda x:x['uid']))                # a lambda also works, but is usually a little slower than itemgetter()
print(min(date1,key=itemgetter('uid')))                   # itemgetter() can also be used to pick the minimum or maximum
print(max(date1,key=itemgetter('uid')))

# Output
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
Sorting objects that do not natively support comparison
from operator import attrgetter

class user(object):
    def __init__(self,user_id):
        self.user_id = user_id

    def __repr__(self):
        return 'user({})'.format(self.user_id)

users = [user(11),user(22),user(3)]
print(users)
print(sorted(users,key=lambda x:x.user_id))
print(sorted(users,key=attrgetter('user_id')))   # use attrgetter() to sort instances by an attribute value
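The groupby() function in the itertools module scans a sequence for consecutive items that share the same value (or the same value returned by the function given as key) and groups them together. groupby() creates an iterator that, on each iteration, returns a value together with a sub-iterator; the sub-iterator produces every item in that group.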
from operator import itemgetter
from itertools import groupby

rows = [
    {'a':'python','date':'07/01/2012'},
    {'a':'java','date':'08/11/2015'},
    {'a':'c++','date':'09/12/2018'},
    {'a':'perl','date':'17/06/2017'},
]
rows.sort(key=itemgetter('date'))
print(rows)
for date,items in groupby(rows,key=itemgetter('date')):
    print(date)
    for i in items:
        print(' ',i)

# Output
[{'a': 'python', 'date': '07/01/2012'}, {'a': 'java', 'date': '08/11/2015'}, {'a': 'c++', 'date': '09/12/2018'}, {'a': 'perl', 'date': '17/06/2017'}]
07/01/2012
  {'a': 'python', 'date': '07/01/2012'}
08/11/2015
  {'a': 'java', 'date': '08/11/2015'}
09/12/2018
  {'a': 'c++', 'date': '09/12/2018'}
17/06/2017
  {'a': 'perl', 'date': '17/06/2017'}
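groupby() only groups consecutive items, which is why the rows are sorted before grouping. The sketch below uses made-up rows with a repeated date to show what happens without the sort:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'a': 'python', 'date': '07/01/2012'},
    {'a': 'java', 'date': '08/11/2015'},
    {'a': 'perl', 'date': '07/01/2012'},
]

# Unsorted: the two 07/01/2012 rows are not adjacent, so they form two groups.
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter('date'))]

# Sorted: equal dates become adjacent and collapse into a single group.
rows.sort(key=itemgetter('date'))
sorted_groups = {key: [r['a'] for r in items]
                 for key, items in groupby(rows, key=itemgetter('date'))}
print(unsorted_keys)
print(sorted_groups)
```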
from collections import defaultdict

# Group the data by building a dict that maps each key to multiple values
rows_date = defaultdict(list)
for row in rows:
    rows_date[row['date']].append(row)
print(rows_date)

# Output
defaultdict(<class 'list'>, {'07/01/2012': [{'a': 'python', 'date': '07/01/2012'}], '08/11/2015': [{'a': 'java', 'date': '08/11/2015'}], '09/12/2018': [{'a': 'c++', 'date': '09/12/2018'}], '17/06/2017': [{'a': 'perl', 'date': '17/06/2017'}]})
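# Use a comprehension (here a generator expression passed to list()) to keep only the values that meet a condition
In [37]: mylist = [1,2,-5,10,-8,3,-1]
In [38]: list(i for i in mylist if i > 0)
Out[38]: [1, 2, 10, 3]
In [39]: list(i for i in mylist if i < 0)
Out[39]: [-5, -8, -1]
# If the input is very large, create a generator and iterate over the filtered results lazily
In [43]: pos = (i for i in mylist if i > 0)
In [46]: for i in pos:
    ...:     print(i)
    ...:
1
2
10
3
What if the filtering rule cannot be written as a simple expression, for example when the data mixes strings and numbers and only the integer-like values should be kept?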
In [47]: values = [1,'3','-4','-',88,'N/A','python','5']
In [48]: def is_int(val):
    ...:     try:
    ...:         x = int(val)
    ...:         return True
    ...:     except ValueError:
    ...:         return False
    ...:
# For irregular values, put the exception handling in a function and pass it to filter()
In [49]: list(filter(is_int,values))
Out[49]: [1, '3', '-4', 88, '5']
# Replace the values that do not meet the condition with a new value
In [50]: mylist = [1,4,-5,10,-7,2,3,-1]
In [51]: list(i if i > 0 else 0 for i in mylist)
Out[51]: [1, 4, 0, 10, 0, 2, 3, 0]
In [52]: list(i if i < 0 else 0 for i in mylist)
Out[52]: [0, 0, -5, 0, -7, 0, 0, -1]
# itertools.compress() filters one sequence using a parallel sequence of Boolean selectors
In [53]: addresses = ['one','two','three','four','five']
In [54]: from itertools import compress
In [55]: counts = [1,3,5,6,3]
In [56]: more1 = [i > 3 for i in counts]
In [57]: more1
Out[57]: [False, False, True, True, False]
In [58]: list(compress(addresses,more1))
Out[58]: ['three', 'four']
prices = {
    'ACME':45.23,
    'AAPL':612.78,
    'IBM':205.55,
    'HPQ':37.20,
    'FB':10.75,
}
# Dict comprehension: keep the entries whose value is greater than 30
P1 = {key:value for key,value in prices.items() if value > 30}
print(P1)
# Dict comprehension: keep the entries whose key is in tech
tech = {'ACME','IBM','HPQ','FB'}
P2 = {key:value for key,value in prices.items() if key in tech}
print(P2)

# Output
{'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.2}
{'ACME': 45.23, 'IBM': 205.55, 'HPQ': 37.2, 'FB': 10.75}
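# A dict can also be built by passing (key, value) tuples to dict(); the dict comprehension form above is clearer and typically runs faster, though
P3 = dict((key,value) for key,value in prices.items() if value > 100)
print(P3)

# Output
{'AAPL': 612.78, 'IBM': 205.55}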
# There is often more than one way to do it, but this version is typically noticeably slower
p4 = {key:prices[key] for key in prices.keys() & tech}
print(p4)

# Output
{'ACME': 45.23, 'IBM': 205.55, 'FB': 10.75, 'HPQ': 37.2}
Use the collections.namedtuple() function to define named tuples
In [1]: from collections import namedtuple
In [2]: subject = namedtuple('subject',['one','two','three'])
In [3]: sub = subject(1,2,3)
In [4]: sub
Out[4]: subject(one=1, two=2, three=3)
In [7]: sub.index(2)
Out[7]: 1
In [9]: sub.one
Out[9]: 1
In [10]: sub.two
Out[10]: 2
In [11]: sub.three
Out[11]: 3
In [12]: len(sub)
Out[12]: 3
In [13]: a,b,c=sub
In [14]: a
Out[14]: 1
In [15]: c
Out[15]: 3
# Named tuples are immutable; to change a value, use _replace(), which returns a new tuple with that field replaced
In [17]: sub._replace(one=88)
Out[17]: subject(one=88, two=2, three=3)
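The point that _replace() builds a new tuple and leaves the original alone can be checked directly. A small sketch, reusing the field names from the session above (the capitalized type name Subject is just a naming choice for this example):

```python
from collections import namedtuple

Subject = namedtuple('Subject', ['one', 'two', 'three'])
sub = Subject(1, 2, 3)

try:
    sub.one = 88                 # fields of a named tuple are read-only
    assigned = True
except AttributeError:
    assigned = False

updated = sub._replace(one=88)   # returns a new tuple; sub is unchanged
print(assigned, sub, updated, updated._asdict())
```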
# Use a generator expression to transform and reduce data in one step
In [19]: numbers = [1,2,3,4,5]
In [20]: s = sum(x * x for x in numbers)
In [21]: s
Out[21]: 55
In [26]: portfolio = [{'name':'GOOG','shares':50},{'name':'YHOO','shares':75},{'name':'AOL','shares':20},{'name':'SCOX','shares':65}]
# Sum the shares across all holdings
In [27]: sumnmber = sum(i['shares'] for i in portfolio)
In [28]: sumnmber
Out[28]: 210
In [29]: minmber = min(i['shares'] for i in portfolio)
In [30]: minmber
Out[30]: 20
In [31]: maxmber = max(i['shares'] for i in portfolio)
In [32]: maxmber
Out[32]: 75
# The key argument can be used instead; it returns the whole record rather than just the value
In [33]: min(portfolio,key=lambda s:s['shares'])
Out[33]: {'name': 'AOL', 'shares': 20}
In [34]: a = {'x':11,'z':33}
In [35]: b = {'y':22,'z':44}
# Use the ChainMap class from collections to look up keys across several dicts as if they were merged
In [36]: from collections import ChainMap
In [37]: c = ChainMap(a,b)
In [38]: c
Out[38]: ChainMap({'x': 11, 'z': 33}, {'y': 22, 'z': 44})
In [39]: c['z'] = 33
In [40]: c
Out[40]: ChainMap({'x': 11, 'z': 33}, {'y': 22, 'z': 44})
In [41]: c['z'] = 55
In [42]: c
Out[42]: ChainMap({'x': 11, 'z': 55}, {'y': 22, 'z': 44})
In [43]: values = ChainMap()
In [44]: values['x'] = 100
In [45]: values = values.new_child()
In [46]: values['x'] = 200
In [47]: values
Out[47]: ChainMap({'x': 200}, {'x': 100})
In [48]: values = values.new_child()
In [50]: values['x'] = 50
In [51]: values
Out[51]: ChainMap({'x': 50}, {'x': 200}, {'x': 100})
In [52]: values['x']
Out[52]: 50
# Alternatively, merge the dicts with update(); this builds a completely separate new dict
In [58]: a = {'x':1,'z':3}
In [59]: b = {'y':2,'z':4}
In [60]: merged = dict(b)
In [61]: merged.update(a)
In [62]: merged
Out[62]: {'y': 2, 'z': 3, 'x': 1}
# A ChainMap refers to the original dicts, so changes made to them show up through the ChainMap
In [63]: a = {'x':1,'z':3}
In [64]: b = {'y':2,'z':4}
In [65]: merged = ChainMap(a,b)
In [66]: merged
Out[66]: ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
In [67]: merged['x']
Out[67]: 1
In [68]: a['x'] = 100
In [69]: merged['x']
Out[69]: 100
In [70]: merged
Out[70]: ChainMap({'x': 100, 'z': 3}, {'y': 2, 'z': 4})
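One detail the sessions above hint at is that writes and deletions through a ChainMap always go to the first mapping, while lookups search every mapping in order. A minimal sketch with the same two small dicts (the variable name chain is arbitrary):

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
chain = ChainMap(a, b)

chain['y'] = 20        # stored in a, even though 'y' was found in b before
del chain['z']         # removes 'z' from a only; b's 'z' becomes visible
print(a, b, chain['y'], chain['z'])
```

Deleting a key that exists only in a later mapping raises KeyError, since the first mapping has nothing to delete.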