Reinforcement Learning (2): Value-Based

Date: 2021-01-02

Tags: reinforcement learning

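Recall the action-value function, and what "value-based" refers to. In general, however, this Q* cannot be obtained, so a convolutional network is proposed to approximate it: the Deep Q-Network (DQN).

Sections: Approximate the Q Function; Deep Q Network (DQN); Apply DQN to Play Game; Temporal Difference (TD) Learning; A small example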
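
To make the TD learning idea concrete, here is a minimal sketch of the one-step TD target that DQN-style training commonly uses. It is not the example from the original article, which is cut off in this excerpt. The function names `td_target` and `td_update`, the discount `gamma`, and the step size `lr` are hypothetical, and a plain list of numbers stands in for the network's Q-value outputs.

```python
def td_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step TD target: r + gamma * max_a Q(s', a).

    next_q_values is a list of estimated Q-values for the next state,
    one per action (assumed here to come from the approximating network).
    """
    if done:
        # No future return after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def td_update(q_value, target, lr=0.1):
    """Move the current estimate a fraction lr toward the TD target."""
    td_error = q_value - target
    return q_value - lr * td_error


if __name__ == "__main__":
    # Hypothetical numbers: reward 1.0, two actions in the next state.
    y = td_target(1.0, [0.5, 2.0], gamma=0.9)
    print("TD target:", y)
    print("updated estimate:", td_update(2.0, y, lr=0.5))
```

In a real DQN the update would be a gradient step on the squared TD error with respect to the network weights; the scalar update above only illustrates the direction of the correction.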
