Reinforcement Learning by David Silver, Notes 5: Model-Free Control

(Optimise the value function of an unknown MDP)
On-policy learning: learn about policy π from experience sampled from π.
Off-policy learning: learn about policy π from experience sampled from a different behaviour policy μ.
On-
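The on-policy / off-policy distinction can be illustrated with a minimal sketch. It assumes a tabular action-value function and uses the SARSA and Q-learning update targets as stand-ins for the two cases; neither algorithm is named in the excerpt above, and the Q-table, reward, and discount values are hypothetical.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstraps from the action the policy actually took."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstraps from the greedy action, regardless of
    which action the behaviour policy goes on to take."""
    return reward + gamma * max(q[next_state])


# Hypothetical Q-table with 2 states and 2 actions (illustrative values only).
q = [[0.0, 0.0], [1.0, 3.0]]

# Suppose the behaviour policy explored and picked action 0 in state 1.
on_target = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
off_target = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)

# With epsilon = 0 the behaviour policy is purely greedy.
greedy_action = epsilon_greedy(q[1], epsilon=0.0, rng=random.Random(0))
```

In this sketch the two targets differ only when the sampled next action is not the greedy one, which is exactly the case where the experience comes from a policy other than the one being evaluated.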