Reinforcement Learning Exercise 3.24

Exercise 3.24: Figure 3.5 gives the optimal value of the best state of the gridworld as 24.4, to one decimal place. Use your knowledge of the optimal policy and (3.8) to express this value symbolically.
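
One way to approach this, sketched under assumptions taken from the book's gridworld example rather than from the text above: the best state is A, every action in A yields reward +10 and moves the agent to A', A' lies four cells below A, all other moves on the return path yield reward 0, and the discount rate is gamma = 0.9. Under the optimal policy the agent then returns to A every 5 steps, so the return from (3.8) is a geometric series:

v*(A) = 10 + gamma^5 * 10 + gamma^10 * 10 + ... = 10 / (1 - gamma^5)

The short Python check below evaluates this expression and compares it with a truncated sum of the series. The constant and function names are illustrative only.

```python
# Assumptions (from the book's gridworld example, not stated in the exercise text):
GAMMA = 0.9        # discount rate
REWARD_A = 10.0    # reward for any action taken in state A
CYCLE_LENGTH = 5   # 1 step from A to A', then 4 steps back up to A


def optimal_value_closed_form(gamma=GAMMA, reward=REWARD_A, cycle=CYCLE_LENGTH):
    """Closed form of the geometric series: reward / (1 - gamma**cycle)."""
    return reward / (1.0 - gamma ** cycle)


def optimal_value_truncated_sum(n_cycles=200, gamma=GAMMA, reward=REWARD_A,
                                cycle=CYCLE_LENGTH):
    """Sum the discounted +10 rewards received at steps 0, 5, 10, ..."""
    return sum(reward * gamma ** (cycle * k) for k in range(n_cycles))


if __name__ == "__main__":
    print(round(optimal_value_closed_form(), 3))    # 24.419
    print(round(optimal_value_truncated_sum(), 3))  # 24.419
```

With these assumptions the value is about 24.419, which agrees with the 24.4 quoted from Figure 3.5.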