Recurrence Formulas for Mean and Variance, with a Python Implementation

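When processing streaming data, you often need to keep statistics such as the mean and variance up to date in real time. With the conventional batch formulas, every newly arrived value requires another pass over all the data seen so far. For small datasets, the difference in time and space cost between recomputing by traversal and updating by recurrence is barely noticeable. In big-data or streaming scenarios, however, the gap between $O(n)$ and $O(1)$ time and space complexity per update becomes significant.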
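To make the cost of the conventional approach concrete, here is a minimal sketch of a baseline that stores every sample and recomputes both statistics from scratch on each arrival. The class name `NaiveMeanVar` and its `add` method are hypothetical names chosen for illustration; they do not come from any library.

```python
class NaiveMeanVar:
    """Baseline: keep every sample and recompute from scratch each time."""

    def __init__(self):
        self.samples = []  # grows with the stream, so memory is O(n)

    def add(self, x):
        self.samples.append(x)
        n = len(self.samples)
        mean = sum(self.samples) / n                          # one O(n) pass
        var = sum((s - mean) ** 2 for s in self.samples) / n  # a second O(n) pass
        return mean, var


naive = NaiveMeanVar()
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    mean, var = naive.add(x)
print(mean, var)  # prints: 3.0 2.0
```

Each call to `add` touches every stored sample, so processing a stream of n values costs on the order of n² operations in total.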

Mean: $A_{n} = \frac{1}{n}\sum_{i=1}^{n}X_{i}$
Mean recurrence: $A_{n} = A_{n-1} + \frac{X_{n} - A_{n-1}}{n}$
Variance: $V_{n} = \frac{1}{n}\sum_{i=1}^{n}(X_{i} - A_{n})^{2}$
Variance recurrence: $V_{n} = \frac{n-1}{n^{2}}(X_{n} - A_{n-1})^{2} + \frac{n-1}{n}V_{n-1}$
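
For readers who want to see where the two recurrences come from, the following is a short derivation sketch. The shorthand $\delta$ is introduced here for convenience and is not part of the original notation. The cross term in the expansion of the sum drops out because $\sum_{i=1}^{n-1}(X_{i} - A_{n-1}) = 0$ by the definition of $A_{n-1}$.

```latex
\begin{align*}
A_{n} &= \frac{1}{n}\Big((n-1)A_{n-1} + X_{n}\Big)
       = A_{n-1} + \frac{X_{n} - A_{n-1}}{n} \\[4pt]
\delta &:= X_{n} - A_{n-1}, \qquad A_{n} = A_{n-1} + \frac{\delta}{n} \\[4pt]
nV_{n} &= \sum_{i=1}^{n-1}\Big((X_{i} - A_{n-1}) - \frac{\delta}{n}\Big)^{2}
        + \Big(\delta - \frac{\delta}{n}\Big)^{2} \\
       &= (n-1)V_{n-1} + \frac{(n-1)\,\delta^{2}}{n^{2}}
        + \frac{(n-1)^{2}\,\delta^{2}}{n^{2}} \\
       &= (n-1)V_{n-1} + \frac{n-1}{n}\,\delta^{2} \\[4pt]
V_{n} &= \frac{n-1}{n}V_{n-1} + \frac{n-1}{n^{2}}(X_{n} - A_{n-1})^{2}
\end{align*}
```

Note that $V_{n}$ here is the population variance (dividing by $n$), not the sample variance (dividing by $n-1$).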

For the mean recurrence, see: https://blog.csdn.net/u014485485/article/details/77679669
For the variance recurrence, see: https://blog.csdn.net/wuqinlong/article/details/78432574

Python code:

import numpy as np

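class CalMeanVar():
    """Running mean and population variance, updated in O(1) per sample."""

    def __init__(self):
        self.count = 0  # number of samples seen so far (n)
        self.A = 0      # current mean A_n
        self.A_ = 0     # previous mean A_{n-1}
        self.V = 0      # current population variance V_n

    def cal(self, data):
        """Fold one new sample into the running statistics."""
        self.count += 1
        if self.count == 1:
            # First sample: the mean is the sample itself and the variance stays 0.
            self.A_ = data
            self.A = data
            return
        self.A_ = self.A
        self.A = self.A + (data - self.A) / self.count
        # The variance update uses the previous mean A_{n-1}, not the new one.
        self.V = (self.count - 1) / self.count ** 2 * (data - self.A_) ** 2 + (self.count - 1) / self.count * self.V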

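if __name__ == '__main__':
    data = np.linspace(1, 5, 5)  # five evenly spaced values from 1 to 5
    print(data)
    print(data.mean())  # batch mean
    print(data.var())   # batch population variance (NumPy's default, ddof=0)

    cmv = CalMeanVar()
    for x in data:
        cmv.cal(x)
    print(cmv.A)  # running mean, should match data.mean()
    print(cmv.V)  # running variance, should match data.var()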
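
As a further sanity check, the same recurrences can be written as a small stateless function and compared against the standard library on random data. This is a minimal sketch rather than a definitive implementation. The function name `update` and its argument order are hypothetical, and it assumes Python 3.8 or later for `statistics.fmean`. Starting from `count = 0`, `mean = 0.0`, `var = 0.0`, no special case is needed for the first sample, because both factors `(count - 1)` are zero when `count` is 1.

```python
import random
import statistics


def update(count, mean, var, x):
    """Return the new (count, mean, var) after seeing one more sample x.

    var is the population variance (divides by n), not the sample variance.
    """
    count += 1
    delta = x - mean                 # X_n - A_{n-1}
    new_mean = mean + delta / count  # mean recurrence
    new_var = (count - 1) / count ** 2 * delta ** 2 + (count - 1) / count * var
    return count, new_mean, new_var


random.seed(0)
samples = [random.uniform(-10.0, 10.0) for _ in range(1000)]

count, mean, var = 0, 0.0, 0.0
for x in samples:
    count, mean, var = update(count, mean, var, x)

# Compare the running results with the batch results from the standard library.
print(mean, statistics.fmean(samples))
print(var, statistics.pvariance(samples))
```

The two values on each printed line should agree up to floating-point rounding. Because the state is just three numbers, it can be stored or passed between processes without keeping any of the raw samples.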