Reinforcement Learning (4): Actor-Critic Methods

Main idea: Policy Network (Actor); Value Network (Critic); an intuitive comparison. Train the Neural Networks, concrete steps: update value network q using TD; update policy network π using policy gradient. Actor-Critic Method. Summary of
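The two update steps named in the outline (a TD update for the critic q, then a policy gradient update for the actor π) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the article's own implementation: both "networks" are replaced by small tables (`theta` for a softmax policy, `w` for the q-values), the TD target is the SARSA-style `r + gamma * q(s', a')`, and the actor's gradient is weighted by the critic's q(s, a). All function names, the learning rates, and the one-state toy task in `train_toy` are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, s, rng):
    """Actor: sample a ~ pi(. | s; theta) from a tabular softmax policy."""
    probs = softmax(theta[s])
    return rng.choices(range(len(probs)), weights=probs)[0]


def actor_critic_step(theta, w, s, a, r, s_next, a_next, done,
                      gamma=0.9, lr_critic=0.1, lr_actor=0.1):
    """One update on (s, a, r, s_next); a_next is sampled from the actor.

    Updates theta and w in place and returns the TD error.
    """
    q_sa = w[s][a]
    # TD target: bootstrap from the critic unless the episode has ended.
    target = r if done else r + gamma * w[s_next][a_next]
    td_error = target - q_sa
    # Critic: move q(s, a) toward the TD target.
    w[s][a] += lr_critic * td_error
    # Actor: gradient ascent on log pi(a | s), weighted by the critic's q(s, a).
    probs = softmax(theta[s])
    for b in range(len(probs)):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * q_sa * grad_log_pi
    return td_error


def train_toy(steps=200, seed=0):
    """Hypothetical one-state task: action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]  # actor parameters (action preferences)
    w = [[0.0, 0.0]]      # critic parameters (q-value table)
    for _ in range(steps):
        a = sample_action(theta, 0, rng)
        r = float(a == 1)
        # Every episode ends after one step, so s_next / a_next are unused.
        actor_critic_step(theta, w, 0, a, r, 0, 0, done=True)
    return softmax(theta[0])
```

Calling `train_toy()` returns the learned action probabilities for the single state; in this toy task the probability of the rewarded action should rise above 0.5. With real neural networks the same two steps would typically be written as a loss on the TD error for the critic and a loss on `-q * log pi(a | s)` for the actor.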