The awkward Bellman optimality equation in RL

This note starts from the blog post "2017 Fall CS294 Lecture 6: Actor-critic introduction", which includes a screenshot of a page from the book Reinforcement Learning: An Introduction (Sutton 1998). On that page, Vπ(s) is the state-value function for policy π, and Qπ(s,a) is the action-value
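For reference, the two value functions named above and the Bellman optimality equations from the title can be written out as follows. This is a sketch in the standard textbook notation, not a reproduction of the screenshot: the discount factor \gamma, the return G_t, the reward R_t, and the transition model p(s', r | s, a) are assumptions introduced here and do not appear in the text above.

```latex
% Assumed notation: G_t is the discounted return, \gamma \in [0, 1) the discount factor.
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% State-value and action-value functions for a policy \pi.
V^{\pi}(s)    = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s,\ A_t = a \right]

% Bellman optimality equations for the optimal value functions V^{*} and Q^{*}.
V^{*}(s)    = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V^{*}(s') \right]
Q^{*}(s, a) = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} Q^{*}(s', a') \right]
```

The two optimality equations are linked by V^{*}(s) = \max_{a} Q^{*}(s, a), so either one can be recovered from the other.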