RL Policy Gradient Methods (4): Asynchronous Advantage Actor-Critic (A3C)

Date: 2020-12-30

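This column summarizes policy gradient methods in the order used by https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html.

Contents: Principle Analysis, Algorithm Implementation, Overall Procedure, Code Implementation

A3C: [ paper | code ]

Principle Analysis

In A3C, the critic learns the value function while multiple actors run in parallel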
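To make the idea of one critic and several parallel actors concrete, here is a minimal sketch, not the implementation from the paper or the linked code. It assumes a hypothetical two-armed bandit (`ARM_REWARDS`), a hypothetical `GlobalParams` container, and Python threads as workers. The lock around the shared parameters is a simplification for readability. Each worker copies the global parameters, acts, computes an advantage as reward minus the critic's estimate, and pushes its update back to the global parameters.

```python
import math
import random
import threading

# Hypothetical toy problem (not from the source): a two-armed bandit with fixed payouts.
ARM_REWARDS = [0.2, 1.0]


class GlobalParams:
    """Parameters shared by all workers: policy logits (actor) and one scalar value (critic)."""

    def __init__(self):
        self.logits = [0.0, 0.0]
        self.value = 0.0
        self.lock = threading.Lock()  # simplification chosen for this sketch


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def worker(shared, seed, steps, lr=0.1):
    rng = random.Random(seed)
    for _ in range(steps):
        # 1. Sync: copy the current global parameters into local variables.
        with shared.lock:
            logits = list(shared.logits)
            value = shared.value
        # 2. Act using the local copy of the policy.
        probs = softmax(logits)
        action = 0 if rng.random() < probs[0] else 1
        reward = ARM_REWARDS[action]
        # 3. Advantage: how much better the reward was than the critic's estimate.
        advantage = reward - value
        # 4. Actor update direction: d log pi(a) / d logit_i = 1[i == a] - probs[i].
        actor_step = [((1.0 if i == action else 0.0) - probs[i]) * advantage
                      for i in range(2)]
        # 5. Push the update to the global parameters. The critic step is
        #    gradient descent on 0.5 * (reward - value) ** 2.
        with shared.lock:
            for i in range(2):
                shared.logits[i] += lr * actor_step[i]
            shared.value += lr * advantage


def train(num_workers=4, steps=2000):
    shared = GlobalParams()
    threads = [threading.Thread(target=worker, args=(shared, seed, steps))
               for seed in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    result = train()
    print("policy:", softmax(result.logits), "value:", result.value)
```

Because workers read parameters at one moment and write at another, an update may be computed from slightly stale values. That is the asynchronous part of the idea, and this toy problem tolerates it well.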
