Weighted Random Selection in Python

Date: 2019-11-11

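A situation we run into fairly often is picking one item at random from a collection of data. Most of the time the random module is all we need. But if each item carries its own weight, meaning the items are chosen with different probabilities, we need weighted random selection.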

1. Simple linear method

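Below is a simple approach: pass in a list of weights (weights) and it returns the index of the randomly selected item. For example, passing [2, 3, 5] returns 0 with probability 0.2, 1 with probability 0.3, and 2 with probability 0.5. Each index's probability is its weight divided by the sum of all weights (here 10).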

The idea is simple: add up all the weights, draw a random number between 0 and that total, and see which interval it falls into.

import random

def weighted_choice(weights):
    totals = []
    running_total = 0

    for w in weights:
        running_total += w
        totals.append(running_total)

    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i
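To sanity-check the probabilities claimed for [2, 3, 5], a quick simulation can count how often each index comes back. This is a sketch: the helper empirical_frequencies, the seed 42 and the 100,000 draws are arbitrary illustrative choices, and the measured frequencies will only approximate 0.2, 0.3 and 0.5.

```python
import random
from collections import Counter

def weighted_choice(weights):
    # Method 1: build the running totals, then scan linearly for the
    # first total that is greater than the random point.
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)

    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i

def empirical_frequencies(weights, n=100_000, seed=42):
    # Hypothetical helper: seed the module-level RNG (the one
    # weighted_choice uses) so the run is reproducible, then count
    # how often each index is returned over n draws.
    random.seed(seed)
    counts = Counter(weighted_choice(weights) for _ in range(n))
    return {i: counts[i] / n for i in range(len(weights))}

print(empirical_frequencies([2, 3, 5]))
```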

2. Speeding up the search

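The method above is simple and already does the weighted random selection we want, but the final for loop is somewhat verbose. Python's standard-library bisect module can speed up this step.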

import random
import bisect

def weighted_choice(weights):
    totals = []
    running_total = 0

    for w in weights:
        running_total += w
        totals.append(running_total)

    rnd = random.random() * running_total
    return bisect.bisect_right(totals, rnd)

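bisect.bisect_right finds the position at which rnd would be inserted into totals (to the right of any equal entries), which is exactly the index of the first running total greater than rnd. The two versions look similar, but the second is faster because it replaces the linear scan with a binary search; building totals still takes one pass over weights, so the overall cost remains O(n). How much faster depends on the length of weights: for lists longer than 1000 elements, it is roughly 30% faster.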
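A small worked example shows why bisect_right, rather than bisect_left, matches the rnd < total test of the linear version. For weights [2, 3, 5] the running totals are [2, 5, 10], so index 0 owns [0, 2), index 1 owns [2, 5) and index 2 owns [5, 10). A point landing exactly on a boundary, such as 2.0, belongs to the next bucket. The function names linear_index and bisect_index are hypothetical, chosen only for this illustration.

```python
import bisect

def linear_index(totals, rnd):
    # Same test as method 1: first bucket whose upper bound exceeds rnd.
    for i, total in enumerate(totals):
        if rnd < total:
            return i

def bisect_index(totals, rnd):
    # Method 2: binary search for the first total strictly greater than rnd.
    return bisect.bisect_right(totals, rnd)

totals = [2, 5, 10]  # running totals of weights [2, 3, 5]

# bisect_left disagrees with the linear scan only when rnd sits exactly
# on a boundary (2.0 and 5.0 here), because it inserts to the left.
for rnd in [0.0, 1.9, 2.0, 4.99, 5.0, 9.99]:
    print(rnd, linear_index(totals, rnd), bisect_index(totals, rnd),
          bisect.bisect_left(totals, rnd))
```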

3. Dropping the temporary list

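In fact the totals list isn't necessary here. With a small change of strategy, subtracting each weight from the random number in turn, we can find the position in weights directly.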

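import random

def weighted_choice(weights):
    # Subtract each weight from the random point; the index at which it
    # first drops below zero is the selected item.
    rnd = random.random() * sum(weights)
    for i, w in enumerate(weights):
        rnd -= w
        if rnd < 0:
            return i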

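Surprisingly, this version is about twice as fast as the second one. Algorithmically the complexity is the same, O(n); we simply save the work of building the temporary list. And if the weights passed in are already sorted from largest to smallest, it is faster still: subtracting the largest weights first makes rnd shrink fastest, so the loop tends to exit after fewer iterations.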
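The remark about pre-sorted weights has a catch: once the weights are reordered, the index the loop returns refers to the sorted list, not the caller's list. A minimal sketch, assuming the same weights are reused many times so the one-off sort pays for itself, keeps each weight's original index next to it. The function make_sorted_chooser and its fallback for floating-point rounding are hypothetical additions, not part of the original article.

```python
import random

def make_sorted_chooser(weights):
    # Drop non-positive weights up front (a zero weight can never be
    # selected by the subtraction loop), keep each weight's original
    # index, and sort largest first so the loop usually exits early.
    order = sorted(
        ((i, w) for i, w in enumerate(weights) if w > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not order:
        raise ValueError("at least one weight must be positive")
    total = sum(w for _, w in order)

    def choose():
        rnd = random.random() * total
        for original_index, w in order:
            rnd -= w
            if rnd < 0:
                # Report the position in the caller's list, not the sorted one.
                return original_index
        # Floating-point rounding can leave rnd at or just above zero
        # after the last subtraction; fall back to the last candidate.
        return order[-1][0]

    return choose

choose = make_sorted_chooser([2, 3, 5])
print([choose() for _ in range(10)])
```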

4. Generating many random values

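If we draw many random results from the same weights by calling weighted_choice repeatedly, the totals list is worth keeping after all: compute it once up front, and each subsequent draw becomes much cheaper.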

import bisect
import random

class WeightedRandomGenerator(object):
    def __init__(self, weights):
        self.totals = []
        running_total = 0

        for w in weights:
            running_total += w
            self.totals.append(running_total)

    def next(self):
        rnd = random.random() * self.totals[-1]
        return bisect.bisect_right(self.totals, rnd)

    def __call__(self):
        return self.next()
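WeightedRandomGenerator is essentially method 2 with the running totals moved into __init__. A quick check of that claim: seeded identically, the two should return exactly the same sequence of indices, since each draw computes random.random() times the same total and bisects the same list. The seed 7 and the 20 draws are arbitrary.

```python
import bisect
import random

def weighted_choice(weights):
    # Method 2: rebuild the running totals on every call.
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    return bisect.bisect_right(totals, rnd)

class WeightedRandomGenerator(object):
    # Method 4: build the running totals once and reuse them.
    def __init__(self, weights):
        self.totals = []
        running_total = 0
        for w in weights:
            running_total += w
            self.totals.append(running_total)

    def next(self):
        rnd = random.random() * self.totals[-1]
        return bisect.bisect_right(self.totals, rnd)

    def __call__(self):
        return self.next()

weights = [2, 3, 5]

random.seed(7)
per_call = [weighted_choice(weights) for _ in range(20)]

random.seed(7)
gen = WeightedRandomGenerator(weights)
precomputed = [gen() for _ in range(20)]

print(per_call == precomputed)
```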

When called more than 1000 times, WeightedRandomGenerator is about 100 times faster than weighted_choice.

So when drawing many times from the same list of weights, use method 4; for fewer than 100 draws, use method 3.
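The speed-ups quoted above depend on the machine, the Python version and the length of weights, so they are worth re-measuring. A rough harness with timeit might look like the following. It assumes 1000 random integer weights and 1000 draws (both arbitrary), renames method 3 to the hypothetical weighted_choice_sub to avoid a name clash, trims the generator to its __call__ method, and times the generator's one-off construction separately from its draws. No particular numbers are implied.

```python
import bisect
import random
import timeit

def weighted_choice_sub(weights):
    # Method 3 (renamed): subtract weights until the random point
    # drops below zero.
    rnd = random.random() * sum(weights)
    for i, w in enumerate(weights):
        rnd -= w
        if rnd < 0:
            return i

class WeightedRandomGenerator(object):
    # Method 4, trimmed to __call__: totals are built once in __init__.
    def __init__(self, weights):
        self.totals = []
        running_total = 0
        for w in weights:
            running_total += w
            self.totals.append(running_total)

    def __call__(self):
        rnd = random.random() * self.totals[-1]
        return bisect.bisect_right(self.totals, rnd)

def compare(n_weights=1000, n_draws=1000, seed=0):
    # Reproducible list of positive integer weights.
    rng = random.Random(seed)
    weights = [rng.randint(1, 100) for _ in range(n_weights)]

    # n_draws calls of method 3.
    t_sub = timeit.timeit(lambda: weighted_choice_sub(weights), number=n_draws)

    # Method 4: one construction, then n_draws calls of the instance.
    t_build = timeit.timeit(lambda: WeightedRandomGenerator(weights), number=1)
    gen = WeightedRandomGenerator(weights)
    t_draws = timeit.timeit(gen, number=n_draws)

    return {"method3": t_sub, "method4_build": t_build, "method4_draws": t_draws}

print(compare())
```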

5. Using accumulate

Since Python 3.2, itertools.accumulate has been available for quickly computing the cumulative sums of weights:

>>> from itertools import accumulate
>>> data = [2, 3, 5, 10]
>>> list(accumulate(data))
[2, 5, 10, 20]
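Plugging accumulate into method 4 removes the manual running-total loop. A minimal sketch follows; the class name AccumulateWeightedRandom is hypothetical. On Python 3.6 and later, the standard library's random.choices also accepts precomputed cumulative weights through its keyword-only cum_weights argument and can draw k items (with replacement) in one call, shown here as an alternative rather than as the article's method.

```python
import bisect
import random
from itertools import accumulate

class AccumulateWeightedRandom:
    # Same idea as method 4, but the running totals come from
    # itertools.accumulate (Python 3.2+) instead of a manual loop.
    def __init__(self, weights):
        self.totals = list(accumulate(weights))

    def __call__(self):
        rnd = random.random() * self.totals[-1]
        return bisect.bisect_right(self.totals, rnd)

weights = [2, 3, 5, 10]
gen = AccumulateWeightedRandom(weights)
print(gen.totals)  # [2, 5, 10, 20]
print([gen() for _ in range(5)])

# Python 3.6+ alternative: random.choices with precomputed cumulative weights.
print(random.choices(range(len(weights)), cum_weights=gen.totals, k=5))
```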

If you have a better approach, feel free to discuss it in the comments.

Reference: Weighted random generation in Python

This article was published on the 致趣 tech team blog.
