The awkward Bellman optimality equation in RL

Date: 2020-12-24


Working from the page of the book Reinforcement Learning: An Introduction (Sutton, 1998) that appears as a screenshot in the blog post "2017 Fall CS294 Lecture 6: Actor-critic introduction", for V^π(s): the state-value function for policy π, and Q^π(s,a): the action-value