Python Cookbook: Data Structures and Algorithms

Date: 2019-11-12


1. Finding the largest or smallest N elements

import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums)) # Prints [42, 37, 23]
print(heapq.nsmallest(3, nums)) # Prints [-4, 1, 2]


# Both functions also accept a key parameter, which makes them usable with more complex data structures

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
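A quick, self-contained check of what the key-based calls above return. It repeats the portfolio data and, as an alternative to the lambda (not something the original uses), passes operator.itemgetter('price') as the key; both forms should select the same records.

```python
import heapq
from operator import itemgetter

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65},
]

# itemgetter('price') does the same job as lambda s: s['price'] here
cheap = heapq.nsmallest(3, portfolio, key=itemgetter('price'))
expensive = heapq.nlargest(3, portfolio, key=itemgetter('price'))

cheap_names = [s['name'] for s in cheap]          # ['YHOO', 'FB', 'HPQ']
expensive_names = [s['name'] for s in expensive]  # ['AAPL', 'ACME', 'IBM']
print(cheap_names)
print(expensive_names)
```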

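Discussion: in a heap, heap[0] is always the smallest element, and each remaining smallest element can be obtained with heapq.heappop(), which takes O(log N) time. After turning the list into a heap, finding the three smallest elements can be written as: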

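heap = list(nums)
heapq.heapify(heap)  # rearrange into heap order; heap[0] is now the smallest
heapq.heappop(heap)  # -4
heapq.heappop(heap)  # 1
heapq.heappop(heap)  # 2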
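The popping approach above can be wrapped in a small function. This is a sketch for illustration only (the name smallest_by_popping is hypothetical): it heapifies a copy in O(N) and then pops N times at O(log N) each. It is not necessarily how heapq.nsmallest works internally, but it should return the same result.

```python
import heapq

def smallest_by_popping(items, n):
    """Return the n smallest items by heapifying a copy and popping n times."""
    heap = list(items)          # copy so the caller's list is not modified
    heapq.heapify(heap)         # O(N): heap[0] is now the smallest element
    count = min(n, len(heap))   # never pop more items than exist
    return [heapq.heappop(heap) for _ in range(count)]  # each pop is O(log N)

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(smallest_by_popping(nums, 3))  # [-4, 1, 2]
print(heapq.nsmallest(3, nums))      # [-4, 1, 2]
```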

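==> When the number of items you are looking for is small relative to the size of the collection, nlargest() and nsmallest() are the right tools.

==> If you only need the single smallest or largest item (N = 1), min() and max() are faster.

==> If N is about as large as the collection itself, it is usually faster to sort first and then slice: sorted(items)[:N] or sorted(items)[-N:].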
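The three rules of thumb above can be combined into one helper. This is only a sketch: the function name pick_smallest and the cutoff (switching to sort-and-slice once N reaches half the collection) are arbitrary assumptions, not measured thresholds.

```python
import heapq

def pick_smallest(items, n):
    """Return the n smallest items, choosing a strategy by the size of n.

    The 'half the collection' cutoff is an illustrative assumption.
    """
    items = list(items)
    if n <= 0:
        return []
    if n == 1:
        return [min(items)] if items else []  # single item: min() is cheapest
    if n >= len(items) // 2:
        return sorted(items)[:n]              # n close to len: sort, then slice
    return heapq.nsmallest(n, items)          # small n: heap-based selection

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(pick_smallest(nums, 1))  # [-4]
print(pick_smallest(nums, 3))  # [-4, 1, 2]
print(pick_smallest(nums, 9))  # same as sorted(nums)[:9]
```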

2. Tuples can be compared

a = (1, 2, 'dandy')
b = (10, 4, 'sam')
c = (1, 3, 'tom')
d = (1, 2, 'dandy1')

print(a < b)  # True
print(a < c)  # True
print(a < d)  # True

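Tuples are compared lexicographically: the first elements are compared first, and only if they are equal are the second elements compared, and so on. In a < d, the first two elements are equal, so the result is decided by 'dandy' < 'dandy1'.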
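To make the element-by-element rule concrete, here is a rough pure-Python sketch of how a < b behaves for two sequences. The function name lex_less is hypothetical and the real comparison is implemented in C, but for simple inputs like these it should agree with the built-in operator: find the first position where the elements differ and compare that pair; if one sequence is a prefix of the other, the shorter one is smaller.

```python
def lex_less(a, b):
    """Approximate the built-in a < b for tuples/lists (illustration only)."""
    for x, y in zip(a, b):
        if x != y:          # first position where the elements differ
            return x < y    # that pair alone decides the result
    # every shared position is equal: the shorter sequence is smaller
    return len(a) < len(b)

a = (1, 2, 'dandy')
b = (10, 4, 'sam')
c = (1, 3, 'tom')
d = (1, 2, 'dandy1')
print(lex_less(a, b), lex_less(a, c), lex_less(a, d))  # True True True
print(lex_less((1, 2), (1, 2)))                          # False
print(lex_less((1, 2), (1, 2, 0)))                       # True
```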

What about lists? They follow the same element-by-element rule:

a = [1, 2]
b = [1, 3]
c = [2, 3]
print(a < b)  # True
print(a < c)  # True

What about tuples that contain mixed data, such as lists or custom objects?

class Foo:
    def __init__(self, a):
        self.a = a


a = (1, 2, [3, 4])
b = (1, 2, [4, 5])
c = (1, Foo(1))
print(a > b)   # False
print(a > c)

Traceback (most recent call last):
  File "/home/dandy/Documents/charm/cookbook/1算法和數據結構/13test.py", line 32, in <module>
    print(a > c)
TypeError: '>' not supported between instances of 'int' and 'Foo'

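The example above makes a bit of a leap, going straight from common built-in data structures to comparing objects. The comparison raises an error saying that '>' is not supported between int and Foo, because the Foo class does not implement any comparison operators. In a class, comparison operators are implemented by special methods: __lt__, __eq__ and __gt__ correspond to '<', '==' and '>' (the full set also includes __le__, __ne__ and __ge__). In the comparison above: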

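1. Python evaluates a > c.

2. It compares the first elements, a[0] and c[0]. They are equal (1 == 1), so it moves on to the next element.

3. It compares the second elements: a[1] > c[1], i.e. 2 > Foo(1). int does not know how to compare itself with a Foo instance (int.__gt__ returns NotImplemented), so Python tries the reflected operation Foo(1) < a[1], which needs Foo.__lt__. Foo does not implement __lt__ either, so a TypeError is raised.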
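The fallback described in step 3 can be observed directly. The sketch below uses a hypothetical Probe class (not part of the original example) whose __lt__ records each call: int.__gt__ does not know about Probe and returns NotImplemented, so Python falls back to the reflected Probe.__lt__.

```python
class Probe:
    """Wraps a number and logs every call to __lt__ (for illustration)."""

    def __init__(self, value):
        self.value = value
        self.calls = []

    def __lt__(self, other):
        self.calls.append(('__lt__', other))  # remember who asked
        return self.value < other

p = Probe(1)
print((2).__gt__(p))   # NotImplemented: int cannot compare itself with Probe
print(2 > p)           # True: Python falls back to p.__lt__(2), i.e. 1 < 2
print(p.calls)         # [('__lt__', 2)]
```

The same fallback happens inside a tuple comparison such as (1, 2) > (1, Probe(1)), once the first elements turn out to be equal.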

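Conclusion: once an object implements the comparison methods (__lt__, __eq__, __gt__ and so on), it can be compared, and this is exactly how Python's built-in objects work. Many of them are implemented in C; __lt__, __eq__ and __gt__ act as hooks exposed to developers, who can override or define these special methods in their own classes. For example, defining __lt__ on Foo is enough to make the comparison above work: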

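class Foo:
    def __init__(self, a):
        self.a = a

    def __lt__(self, other):
        # Foo(x) < other: compare the wrapped value with other
        # (unwrap other first if it is also a Foo)
        if isinstance(other, Foo):
            other = other.a
        return self.a < other


a = (1, 2, [3, 4])
b = (1, 2, [4, 5])
c = (1, Foo(1))
print(a > b)  # False: decided by [3, 4] > [4, 5]
print(a > c)  # True: 2 > Foo(1) falls back to Foo(1).__lt__(2), i.e. 1 < 2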
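Defining only __lt__ covers the case above, but Foo still cannot reliably be used with <=, >= and the rest. One common way to fill these in is functools.total_ordering: define __eq__ plus one ordering method and the decorator derives the others. The sketch below is one possible version of Foo; letting it compare against either another Foo or a plain number is an assumption about how Foo should behave.

```python
from functools import total_ordering
from numbers import Number

@total_ordering
class Foo:
    def __init__(self, a):
        self.a = a

    def _key(self, other):
        """Return the value to compare against, or NotImplemented."""
        if isinstance(other, Foo):
            return other.a
        if isinstance(other, Number):
            return other
        return NotImplemented

    def __eq__(self, other):
        value = self._key(other)
        if value is NotImplemented:
            return NotImplemented
        return self.a == value

    def __lt__(self, other):
        value = self._key(other)
        if value is NotImplemented:
            return NotImplemented
        return self.a < value

    def __repr__(self):
        return f'Foo({self.a!r})'

print((1, 2, [3, 4]) > (1, Foo(1)))       # True: 2 > Foo(1) via Foo.__lt__
print(Foo(2) >= 2, Foo(1) <= Foo(3))     # True True
print(sorted([Foo(3), Foo(1), Foo(2)]))  # [Foo(1), Foo(2), Foo(3)]
```

Note that defining __eq__ makes instances unhashable unless __hash__ is also defined, so this version of Foo cannot be used as a dict key or set member as written.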

3. Default values for dictionary keys

# pairs is a batch of new (key, value) data; each value must be appended to the list stored under its key in d
pairs = [('a', 1), ('b', 2), ('c', 3)]

d = {}

for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

This can be simplified with the dictionary's setdefault() method:

pairs = [('a', 1), ('b', 2), ('c', 3)]
d = {}

for key, value in pairs:
    d.setdefault(key, []).append(value)

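This is much more convenient, but still slightly awkward, because every call creates a new instance of the initial value (a fresh empty list), even when the key already exists. A cleaner option is defaultdict from the collections module, which fixes the type of the value when the dictionary is created: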

from collections import defaultdict

d = defaultdict(list)  # a missing key starts out as an empty list

for key, value in pairs:
    d[key].append(value)
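Putting the three versions side by side with a repeated key shows that they build the same mapping. The data below, including the repeated 'a', is made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

pairs = [('a', 1), ('b', 2), ('c', 3), ('a', 4)]  # illustrative data; 'a' repeats

def group_with_check(pairs):
    d = {}
    for key, value in pairs:
        if key not in d:
            d[key] = []
        d[key].append(value)
    return d

def group_with_setdefault(pairs):
    d = {}
    for key, value in pairs:
        d.setdefault(key, []).append(value)  # [] is built on every iteration
    return d

def group_with_defaultdict(pairs):
    d = defaultdict(list)  # list() is called only for missing keys
    for key, value in pairs:
        d[key].append(value)
    return dict(d)  # hand back a plain dict

print(group_with_check(pairs))        # {'a': [1, 4], 'b': [2], 'c': [3]}
print(group_with_setdefault(pairs))   # {'a': [1, 4], 'b': [2], 'c': [3]}
print(group_with_defaultdict(pairs))  # {'a': [1, 4], 'b': [2], 'c': [3]}
```

Converting back to a plain dict is a design choice: a defaultdict also inserts a key when you merely read a missing one (d['z'] creates an empty list), which can surprise code that receives it later.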

4. Comparing values in a dictionary

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

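To find the minimum and maximum while keeping both the key and the value, invert the pairs with zip() so that the values come first: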

min_price = min(zip(prices.values(), prices.keys()))
# min_price is (10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))
# max_price is (612.78, 'AAPL')

To sort by value:

prices_sorted = sorted(zip(prices.values(), prices.keys()))
# prices_sorted is [(10.75, 'FB'), (37.2, 'HPQ'),
#                   (45.23, 'ACME'), (205.55, 'IBM'),
#                   (612.78, 'AAPL')]
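One caveat with the zip() approach: in Python 3, zip() returns an iterator that can be consumed only once. Reusing the same zip object for a second call fails, so create a fresh zip for each computation or materialize it with list(), as in this sketch.

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75,
}

prices_and_names = zip(prices.values(), prices.keys())
print(min(prices_and_names))  # (10.75, 'FB')

try:
    max(prices_and_names)     # the iterator is already exhausted
except ValueError as exc:
    print('error:', exc)      # max() of an empty iterable raises ValueError

# Materializing the pairs once lets you reuse them
pairs = list(zip(prices.values(), prices.keys()))
print(min(pairs), max(pairs))  # (10.75, 'FB') (612.78, 'AAPL')
```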

Discussion: the more common approaches

min(prices.values()) # Returns 10.75
max(prices.values()) # Returns 612.78

min(prices, key=lambda k: prices[k]) # Returns 'FB'
max(prices, key=lambda k: prices[k]) # Returns 'AAPL'

# The approaches above do not return the full key-value pair

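min_value = prices[min(prices, key=lambda k: prices[k])]
# Getting the value needs an extra dictionary lookup after min(); still O(N) overall, but clumsier than the zip() approach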
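If you want the whole key-value pair in a single pass without inverting it, min() and max() can also be applied to prices.items() with a key function that looks at the value. This is an alternative to the zip() approach rather than something from the original text. Note that when two values tie, this version returns the first pair encountered, whereas the zip() version breaks ties by comparing the keys.

```python
from operator import itemgetter

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75,
}

cheapest = min(prices.items(), key=itemgetter(1))  # ('FB', 10.75)
priciest = max(prices.items(), key=itemgetter(1))  # ('AAPL', 612.78)
print(cheapest, priciest)

# Tie-breaking differs between the two approaches
tied = {'BBB': 1.0, 'AAA': 1.0}
print(min(tied.items(), key=itemgetter(1)))  # ('BBB', 1.0): first one seen
print(min(zip(tied.values(), tied.keys())))  # (1.0, 'AAA'): the key decides the tie
```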