Policy Iteration & Value Iteration

Date: 2021-01-02

Tags: Policy Iteration, Value Iteration


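A convergence pitfall: when several policies share the same v(s), a loop that stops only once the greedy policy stops changing may never terminate, because it can keep switching among equally good policies. (That stopping test belongs to policy iteration; value iteration, which stops once the value updates become small, does not have this problem.)

In Policy Iteration algorithms, you start with a random policy, then find the value function of that policy (policy evaluation step), then find a new (improved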
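The two algorithms and the tie problem can be sketched in a few lines of tabular Python. This is a minimal illustration under stated assumptions, not the original author's code: the MDP encoding `P[s][a] = [(probability, next_state, reward), ...]`, states numbered `0..n-1`, action `0` existing in every state, and all function names are hypothetical. The improvement step switches an action only when the new one is strictly better, which is one common way to keep policy iteration from cycling between equally good policies. Value iteration needs no such guard because it stops on the size of the value update.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward).
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: List[float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: MDP, policy: Dict[int, int], gamma: float, theta: float = 1e-10) -> List[float]:
    """Iterative policy evaluation: sweep until the largest update falls below theta."""
    V = [0.0] * len(P)  # assumes states are numbered 0..n-1
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def policy_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-9):
    policy = {s: 0 for s in P}  # arbitrary start: action 0 everywhere (assumed to exist)
    while True:
        V = evaluate_policy(P, policy, gamma)
        stable = True
        for s in P:
            best_a = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            # Switch only on a strict improvement, so ties between equally good
            # actions cannot make the policy flip back and forth forever.
            if q_value(P, V, s, best_a, gamma) > q_value(P, V, s, policy[s], gamma) + tol:
                policy[s] = best_a
                stable = False
        if stable:
            return policy, V


def value_iteration(P: MDP, gamma: float = 0.9, theta: float = 1e-10):
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: take the best one-step lookahead.
            v_new = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # stops on value change, not on policy change
            break
    # Extract a greedy policy from the converged values (ties broken by first action).
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return policy, V


# Hypothetical 2-state example: in state 0, actions 1 and 2 are equally good.
P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)], 2: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 0.0)]},
}
print(policy_iteration(P))
print(value_iteration(P))
```

In the example, actions 1 and 2 in state 0 tie; with a "change whenever `max` returns a different action" rule, an implementation whose tie-breaking is not deterministic could keep alternating between them, while the strict-improvement check above settles on whichever good action it reaches first.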
