Deep Reinforcement Learning, David Silver (4): Model-Free Prediction

Date: 2020-12-31


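This lecture mainly covers: Monte-Carlo Learning, Temporal-Difference Learning, and TD(λ). Lecture 03 dealt with an MDP whose environment is known, meaning that after taking an action we know which state we reach and what reward we receive. In most real-world cases, however, the state transitions and rewards are unknown. This setting is called model-free, i.e. the environment model is unknown. This lecture discusses prediction, which estimates the value function of an MDP with an unknown environment; the next lecture covers control. Mo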
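To make the idea of model-free prediction concrete, here is a minimal sketch that estimates a value function purely from sampled episodes, once with first-visit Monte-Carlo and once with TD(0). The environment is an assumption chosen for illustration and does not come from the notes above: a five-state random walk that starts in the middle, moves left or right with equal probability, and pays reward 1 only when it exits on the right. The names `run_episode`, `mc_prediction`, and `td0_prediction`, the step size `alpha`, and the episode counts are likewise hypothetical.

```python
import random

N_STATES = 5   # assumed toy environment: non-terminal states 0..4
GAMMA = 1.0    # undiscounted episodic task


def run_episode(rng):
    """Sample one episode as a list of (state, reward, next_state).

    next_state is None when the episode terminates. The learner never
    looks at the transition probabilities, only at these samples.
    """
    state = N_STATES // 2
    transitions = []
    while state is not None:
        nxt = state + rng.choice((-1, 1))
        if nxt < 0:                 # fell off the left end: reward 0
            reward, nxt = 0.0, None
        elif nxt >= N_STATES:       # fell off the right end: reward 1
            reward, nxt = 1.0, None
        else:
            reward = 0.0
        transitions.append((state, reward, nxt))
        state = nxt
    return transitions


def mc_prediction(episodes, seed=0):
    """First-visit Monte-Carlo: average the full return after each first visit."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES
    counts = [0] * N_STATES
    for _ in range(episodes):
        transitions = run_episode(rng)
        # Work backwards to get the return G_t following every time step.
        g = 0.0
        returns = []
        for state, reward, _nxt in reversed(transitions):
            g = reward + GAMMA * g
            returns.append((state, g))
        returns.reverse()
        seen = set()
        for state, g in returns:
            if state in seen:
                continue            # only the first visit in an episode counts
            seen.add(state)
            counts[state] += 1
            values[state] += (g - values[state]) / counts[state]  # running mean
    return values


def td0_prediction(episodes, alpha=0.01, seed=0):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES
    for _ in range(episodes):
        # The policy is fixed, so replaying the sampled transitions in order
        # gives the same updates as learning online, step by step.
        for state, reward, nxt in run_episode(rng):
            v_next = 0.0 if nxt is None else values[nxt]
            values[state] += alpha * (reward + GAMMA * v_next - values[state])
    return values
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so calling `mc_prediction(5000)` and `td0_prediction(5000)` and comparing against those numbers gives a quick sanity check. The Monte-Carlo version has to wait for an episode to finish before updating, while the TD(0) version updates after every step from its own current estimate of the next state.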
