Python code for the multi-armed bandit problem

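Suppose a slot machine has k = 10 arms, and the reward of each arm follows a Gaussian (normal) distribution. The mean and variance of each arm's distribution are:
#the real mean value of each action's reward
qa_star = np.array([0.2,-0.3,1.5,0.5,1.2,-1.6,-0.2,-1,1.1,-0.6])
#the vars of each action's reward
var_qa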
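To make the setup concrete, here is a minimal sketch of how this 10-arm testbed could be simulated and played with an epsilon-greedy agent. The means in `qa_star` come from the text above. The variance values are cut off in the source, so the unit variances used here are only a placeholder assumption. The epsilon-greedy strategy with sample-average estimates, along with the names `pull`, `run_epsilon_greedy`, `steps`, `epsilon` and `seed`, is chosen for illustration and is not necessarily the method the original code uses.

```python
import numpy as np

# True mean reward of each arm, copied from the text above.
qa_star = np.array([0.2, -0.3, 1.5, 0.5, 1.2, -1.6, -0.2, -1, 1.1, -0.6])
# ASSUMPTION: the variance values are truncated in the source, so a unit
# variance is used for every arm purely as a placeholder.
var_qa = np.ones_like(qa_star)


def pull(arm, rng):
    """Draw one reward for `arm` from its normal distribution."""
    # rng.normal expects a standard deviation, hence the square root.
    return rng.normal(qa_star[arm], np.sqrt(var_qa[arm]))


def run_epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    """Play the bandit with epsilon-greedy and sample-average estimates."""
    rng = np.random.default_rng(seed)
    k = len(qa_star)
    q_est = np.zeros(k)              # current estimate of each arm's mean
    counts = np.zeros(k, dtype=int)  # number of pulls per arm
    rewards = np.zeros(steps)        # reward received at each step

    for t in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))    # explore: uniformly random arm
        else:
            arm = int(np.argmax(q_est))   # exploit: best estimate so far
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental form of the sample average.
        q_est[arm] += (reward - q_est[arm]) / counts[arm]
        rewards[t] = reward

    return q_est, counts, rewards


if __name__ == "__main__":
    q_est, counts, rewards = run_epsilon_greedy()
    print("estimated means:", np.round(q_est, 2))
    print("pull counts:", counts)
    print("average reward:", rewards.mean())
```

With a small `epsilon` the agent spends most of its pulls on whichever arm currently looks best, while the occasional random pull keeps the other estimates from going stale. Swapping in the real variances only requires replacing the placeholder `var_qa` array.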