Reinforcement Learning: An Introduction, On-policy Prediction with Function Approximation (personal notes)

Date: 2021-01-08


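Chapter 9 On-policy Prediction with Approximation

As discussed earlier, a main goal of reinforcement learning is to learn the value function, i.e. a mapping from states (or state-action pairs) to expected return. (A mapping from states to actions is the policy, not the value function.) The methods so far were all tabular methods, which use a table/array to record the action or value corresponding to each state. The drawback is that this requires a very large amount of memory, so we consider using a parameterized function to represent the value fu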
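To make the memory argument concrete, here is a minimal sketch contrasting a tabular value store with a parameterized one. It is only an illustration under assumptions that are not in the note: the state count `N_STATES`, the group count `N_GROUPS`, the choice of state aggregation as the feature map, and the helper names `features`, `v_hat` and `sgd_update` are all hypothetical.

```python
from typing import List

N_STATES = 1000  # hypothetical number of states
N_GROUPS = 10    # hypothetical number of aggregated groups (one weight each)


def features(state: int) -> List[float]:
    """One-hot feature vector marking which group the state falls into."""
    x = [0.0] * N_GROUPS
    x[state * N_GROUPS // N_STATES] = 1.0
    return x


def v_hat(state: int, w: List[float]) -> float:
    """Approximate value: dot product of the weights and the features."""
    return sum(wi * xi for wi, xi in zip(w, features(state)))


def sgd_update(w: List[float], state: int, target: float, alpha: float) -> None:
    """Move v_hat(state) toward target; for a linear v_hat the gradient is x(state)."""
    error = target - v_hat(state, w)
    for i, xi in enumerate(features(state)):
        w[i] += alpha * error * xi


tabular_values = [0.0] * N_STATES  # tabular: one entry per state
weights = [0.0] * N_GROUPS         # parameterized: one entry per group

# A single update for state 42 also changes the estimate of every state
# that shares its group (states 0..99), which a table would not do.
sgd_update(weights, state=42, target=1.0, alpha=0.5)
```

With these assumed sizes the table stores 1000 numbers while the weight vector stores 10. The price is generalization: states in the same group are forced to share one estimate.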
