Requirement: given a sequence of length N, find the K largest elements (Top K), using a min-heap implemented with the heapq module.
import heapq
import random


class TopkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []  # min-heap holding the k largest elements seen so far

    def Push(self, elem):
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # data[0] is the smallest of the current top k
            topk_small = self.data[0]
            if elem > topk_small:
                heapq.heapreplace(self.data, elem)

    def TopK(self):
        # Largest first; sorting a copy leaves the heap intact
        return sorted(self.data, reverse=True)


if __name__ == "__main__":
    print("Hello")
    list_rand = random.sample(range(1000000), 100)
    th = TopkHeap(3)
    for i in list_rand:
        th.Push(i)
    print(th.TopK())
    print(sorted(list_rand, reverse=True)[0:3])
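heapq handles the case above with ease.
Now for a nastier requirement: given a sequence of length N, find the K smallest elements (Bottom K), which calls for a max-heap.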
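As a cross-check on the hand-written class, the standard library also provides heapq.nlargest, which returns the K largest items in one call. Below is a minimal sketch, assuming the whole sequence is available as an iterable; the helper name top_k is made up for illustration.

```python
import heapq
import random


def top_k(seq, k):
    """Return the k largest items of seq, largest first (hypothetical helper)."""
    return heapq.nlargest(k, seq)


sample = random.sample(range(1000000), 100)
# Both lines should print the same list.
print(top_k(sample, 3))
print(sorted(sample, reverse=True)[:3])
```

The hand-written heap is still useful when elements arrive one at a time and only K of them should be kept in memory.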
heapq does not provide anything like Java's Comparator interface or a comparison-function parameter; the developers explain why here: http://code.activestate.com/lists/python-list/162387/
As a result, people have come up with some very clever workarounds; see: http://stackoverflow.com/questions/14189540/python-topn-max-heap-use-heapq-or-self-implement
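One possible workaround of this kind is to wrap each element in a small class whose ordering is inverted. This is a sketch for illustration only, not taken from the linked thread; it assumes CPython's heapq, which orders items with the `<` operator, and the class name Reversed is hypothetical.

```python
import functools
import heapq


@functools.total_ordering
class Reversed:
    """Hypothetical wrapper that inverts the ordering of the wrapped value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Inverted on purpose: a "smaller" wrapper holds a larger value.
        return self.value > other.value


heap = []
for word in ["pear", "apple", "fig", "kiwi"]:
    heapq.heappush(heap, Reversed(word))

# The heap root now holds the largest value, so heapq acts as a max-heap.
largest = heapq.heappop(heap).value
print(largest)
```

Unlike negation, this also works for values that cannot be negated, such as strings, at the cost of one extra object per element.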
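Let me summarize the simplest one:
Replace push(e) with push(-e), and pop() with -pop().
In other words, store the negated value when pushing onto the heap and negate it again when taking it out, while the rest of the logic stays exactly the same as for TopK. Here is the code: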
import heapq


class BtmkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []  # min-heap of negated values, i.e. a max-heap of the originals

    def Push(self, elem):
        # Negate elem so that the min-heap behaves like a max-heap
        elem = -elem
        # Same heap algorithm as TopkHeap
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # data[0] is the negation of the largest element currently kept
            neg_largest = self.data[0]
            if elem > neg_largest:
                heapq.heapreplace(self.data, elem)

    def BtmK(self):
        # Negate back and return in ascending order
        return sorted([-x for x in self.data])
I tested it and it works without any problems. What a neat trick...
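A self-contained way to repeat that test is to compare the negation trick against heapq.nsmallest from the standard library. The sketch below assumes numeric elements (anything that supports unary minus); the function name bottom_k is hypothetical.

```python
import heapq
import random


def bottom_k(seq, k):
    """Return the k smallest numbers of seq in ascending order (hypothetical helper)."""
    if k <= 0:
        return []
    heap = []  # min-heap of negated values, so heap[0] is -(largest kept value)
    for elem in seq:
        neg = -elem
        if len(heap) < k:
            heapq.heappush(heap, neg)
        elif neg > heap[0]:
            # elem is smaller than the largest kept value, so it replaces it
            heapq.heapreplace(heap, neg)
    return sorted(-x for x in heap)


sample = random.sample(range(1000000), 100)
# Both lines should print the same list.
print(bottom_k(sample, 3))
print(heapq.nsmallest(3, sample))
```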