Reinforcement Learning Exercise 3.24

Date: 2020-12-24


Exercise 3.24 Figure 3.5 gives the optimal value of the best state of the gridworld as 24.4, to one decimal place. Use your knowledge of the optimal policy and (3.8) to express this value symbolically
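
One way to approach this, sketched under assumptions recalled from the book's gridworld example rather than stated in this post: the best state is A, every action taken in A yields a reward of +10 and moves the agent to A', the shortest way back from A' to A takes four steps with zero reward, and the discount rate is gamma = 0.9. Under the optimal policy the agent therefore repeats a 5-step cycle, collecting +10 once per cycle. Applying the discounted return (3.8) gives v*(A) = 10 + gamma^5 * 10 + gamma^10 * 10 + ..., a geometric series that sums to v*(A) = 10 / (1 - gamma^5). The Python sketch below evaluates that expression and cross-checks it against a truncated sum; the function and constant names are illustrative only.

```python
# Assumed gridworld parameters (from the book's example, not from this post).
GAMMA = 0.9        # discount rate
REWARD_A = 10.0    # reward for any action taken in state A
CYCLE_LENGTH = 5   # steps per loop: A -> A' (1 step), then 4 steps back to A


def optimal_value_closed_form(gamma=GAMMA, reward=REWARD_A, cycle=CYCLE_LENGTH):
    """Closed form of the geometric series: reward / (1 - gamma**cycle)."""
    return reward / (1.0 - gamma ** cycle)


def optimal_value_by_summation(gamma=GAMMA, reward=REWARD_A,
                               cycle=CYCLE_LENGTH, n_cycles=1000):
    """Truncated discounted return: one reward at the start of each cycle."""
    return sum(reward * gamma ** (cycle * k) for k in range(n_cycles))


if __name__ == "__main__":
    print(round(optimal_value_closed_form(), 3))   # 24.419
    print(round(optimal_value_by_summation(), 3))  # 24.419
```

Rounded to one decimal place this is 24.4, which agrees with the value quoted from Figure 3.5.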