Implementing a Min-Heap (TopK Largest) and a Max-Heap (BtmK Smallest) with Python's heapq



Requirement: given a sequence of length N, find the K largest elements (TopK) using a min-heap, implemented with the heapq module.

import heapq
import random


class TopkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []

    def Push(self, elem):
        # Fill the heap until it holds k elements
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # The root of the min-heap is the smallest of the current top k
            topk_small = self.data[0]
            if elem > topk_small:
                # Pop the root and push elem in a single step
                heapq.heapreplace(self.data, elem)

    def TopK(self):
        # Largest first; sorting a copy leaves the heap intact for later pushes
        return sorted(self.data, reverse=True)


if __name__ == "__main__":
    print("Hello")
    list_rand = random.sample(range(1000000), 100)
    th = TopkHeap(3)
    for i in list_rand:
        th.Push(i)
    print(th.TopK())
    # Cross-check against a full sort
    print(sorted(list_rand, reverse=True)[0:3])

The case above is easily handled with heapq.
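
For a one-off query over a sequence that is already in memory, the standard library's heapq.nlargest gives the same answer and makes a handy cross-check. A minimal sketch; the sample list is made up purely for illustration:

```python
import heapq

# Hypothetical sample data, chosen only for illustration
values = [5, 1, 9, 3, 7, 2, 8]

# heapq.nlargest(n, iterable) returns the n largest items, largest first
top3 = heapq.nlargest(3, values)
print(top3)  # [9, 8, 7]
```

The class-based version is still the better fit when elements arrive one at a time, since it never holds more than k of them.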

Now for the awkward requirement: given a sequence of length N, find the K smallest elements (BtmK), which calls for a max-heap.

heapq does not provide anything like Java's Comparator interface, nor does it accept a comparison function. The developers explain why here: http://code.activestate.com/lists/python-list/162387/

As a result, people have come up with some very clever workarounds; see: http://stackoverflow.com/questions/14189540/python-topn-max-heap-use-heapq-or-self-implement
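
One workaround of this kind (not necessarily the one from the linked thread) is to wrap each item in a small class whose "less than" is inverted. The sketch below assumes heapq orders items with the < operator, which is how CPython behaves; the class name Reversed and the sample words are hypothetical:

```python
import heapq


class Reversed:
    """Hypothetical wrapper that inverts the ordering of the wrapped value."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # Flipping "<" makes the min-heap behave as a max-heap
        # over the wrapped values
        return self.value > other.value


heap = []
for word in ["pear", "apple", "fig", "kiwi"]:
    heapq.heappush(heap, Reversed(word))

# The root is now the largest wrapped value
largest = heapq.heappop(heap).value
print(largest)  # pear
```

This also works for values that cannot be negated, such as strings, at the cost of one extra object per element.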

Here is the simplest one, summarized:

Replace push(e) with push(-e), and pop() with -pop().

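In other words, negate each value when it goes into the heap and again when it comes out; the rest of the logic is exactly the same as for TopK. Here is the code: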

import heapq


class BtmkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []

    def Push(self, elem):
        # Negate elem so the min-heap behaves as a max-heap
        elem = -elem
        # Same heap algorithm as TopkHeap from here on
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # Root holds the negation of the largest value currently kept
            topk_small = self.data[0]
            if elem > topk_small:
                heapq.heapreplace(self.data, elem)

    def BtmK(self):
        # Undo the negation and return the values smallest first
        return sorted([-x for x in self.data])

I tested it and it works without any problems. This idea is really tricky...
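
The test itself is not shown here; a check in the spirit of the TopK test might look like the sketch below. The helper bottom_k is hypothetical and self-contained, applying the same negation idea, and the fixed seed is an assumption added only to make the run repeatable:

```python
import heapq
import random


def bottom_k(seq, k):
    """Hypothetical helper: k smallest items of seq via a negated min-heap."""
    if k <= 0:
        return []
    heap = []  # stores negated values, so heap[0] is -(largest kept value)
    for elem in seq:
        neg = -elem
        if len(heap) < k:
            heapq.heappush(heap, neg)
        elif neg > heap[0]:
            # elem is smaller than the largest kept value: swap it in
            heapq.heapreplace(heap, neg)
    # Undo the negation and return smallest first
    return sorted(-x for x in heap)


random.seed(0)  # fixed seed so the run is repeatable
list_rand = random.sample(range(1000000), 100)
result = bottom_k(list_rand, 3)
print(result)
# Compare with a full sort; expected to print True
print(result == sorted(list_rand)[:3])
```

Note that negation only works for values that support unary minus, such as numbers.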