Deep Reinforcement Learning by David Silver (4): Model-Free Prediction

Date: 2019-12-11

Tags: deep reinforcement learning, David Silver, model-free prediction

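This lecture covers three topics: Monte-Carlo Learning, Temporal-Difference Learning, and TD(λ). Lecture 03 dealt with MDPs whose environment is known, meaning that after taking an action we know which state we reach and what reward we receive. In most real-world situations, however, the state transitions and rewards are unknown. This setting is called model-free, i.e. the environment model is not available. This lecture focuses on prediction: estimating the value function of an MDP whose environment is unknown. The next lecture covers control.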
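To make "estimating the value function without a model" concrete, here is a minimal sketch of Monte-Carlo and TD(0) prediction. The environment is an assumption chosen for illustration and is not taken from the text above: a five-state random walk with terminal states at both ends and a reward of 1 only when exiting on the right. The names `step`, `run_episode`, `mc_prediction`, and `td0_prediction` are hypothetical helpers. The learner only sees sampled transitions and never reads the transition probabilities.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal (assumed toy problem)
TRUE_V = [i / 6 for i in range(1, 6)]  # known analytic values, used only for comparison


def step(state, rng):
    """Sample one transition. The learner sees only the sample, not the model."""
    nxt = state + rng.choice((-1, 1))
    reward = 1.0 if nxt == N_STATES + 1 else 0.0
    done = nxt in (0, N_STATES + 1)
    return nxt, reward, done


def run_episode(rng, start=3):
    """Follow the random policy until termination; return (state, reward, next_state) triples."""
    state, done, trajectory = start, False, []
    while not done:
        nxt, reward, done = step(state, rng)
        trajectory.append((state, reward, nxt))
        state = nxt
    return trajectory


def mc_prediction(episodes, rng, gamma=1.0):
    """Every-visit Monte-Carlo: average the full return observed after each visit."""
    V = [0.0] * (N_STATES + 2)
    counts = [0] * (N_STATES + 2)
    for _ in range(episodes):
        G = 0.0
        # Walk the episode backwards so G accumulates the return from each step onward.
        for state, reward, _ in reversed(run_episode(rng)):
            G = reward + gamma * G
            counts[state] += 1
            V[state] += (G - V[state]) / counts[state]  # incremental mean
    return V[1:-1]


def td0_prediction(episodes, rng, alpha=0.01, gamma=1.0):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = [0.0] * (N_STATES + 2)  # terminal entries are never updated and stay 0
    for _ in range(episodes):
        # Transitions are replayed in order; this matches online updates here
        # because the random policy does not depend on V.
        for state, reward, nxt in run_episode(rng):
            V[state] += alpha * (reward + gamma * V[nxt] - V[state])
    return V[1:-1]


if __name__ == "__main__":
    rng = random.Random(0)
    print("true:", [round(v, 3) for v in TRUE_V])
    print("MC:  ", [round(v, 3) for v in mc_prediction(5000, rng)])
    print("TD0: ", [round(v, 3) for v in td0_prediction(5000, rng)])
```

The difference between the two is the update target: Monte-Carlo waits until the episode ends and uses the actual return, while TD(0) updates after every step using the current estimate of the next state's value.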