Silver-Slides Chapter 2 - Reinforcement Learning: Markov Decision Processes (MDPs)

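Markov Processes. An MDP describes a fully observable environment for reinforcement learning. Almost all reinforcement learning problems can be described as MDPs: optimal control primarily deals with continuous MDPs. Partially observable problems can be converted into MDPs. Bandits are MDPs with one st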
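As a rough illustration of the Markov setting described above, here is a minimal sketch of a Markov process in which the next state depends only on the current state. The state names, the transition matrix `P`, and the helpers `step_distribution` and `is_stochastic` are all hypothetical, invented for this example and not taken from the slides.

```python
# Hypothetical two-state Markov process. The states and probabilities
# are made up for illustration and do not come from the slides.
STATES = ["sunny", "rainy"]

# P[i][j] is the probability of moving from STATES[i] to STATES[j].
P = [
    [0.9, 0.1],  # transitions out of "sunny"
    [0.5, 0.5],  # transitions out of "rainy"
]


def is_stochastic(matrix, tol=1e-9):
    """Check that every row is a probability distribution over next states."""
    return all(
        min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in matrix
    )


def step_distribution(dist, matrix):
    """Push a distribution over states one step forward (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Start in "sunny" with certainty, then advance two steps.
dist = [1.0, 0.0]
for _ in range(2):
    dist = step_distribution(dist, P)

for name, prob in zip(STATES, dist):
    print(f"{name}: {prob:.2f}")
```

Because the environment is fully observable and Markov, the current state (or a distribution over it) is all that is needed to predict what comes next; no history has to be stored.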