Generative Adversarial Imitation Learning: A Brief Paper Analysis

Date: 2021-01-02


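"Generative Adversarial Imitation Learning" (2016)

1. Key concepts:
(1) The occupancy measure ρ_π(s,a).
(2) The cost function C(s,a) and the cumulative cost incurred under policy π.
(3) The causal entropy of a policy.
(4) The apprenticeship learning objective.
(5) Policy updates are performed with TRPO, which constrains the gap between the policies before and after each update.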
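The formulas for these concepts did not survive in the text above. As a reference sketch, the definitions below follow the standard forms in the GAIL paper as best they can be restated here; they are not recovered from this post. The symbols γ (discount factor), π_E (expert policy), \mathcal{C} (the class of candidate cost functions), and δ (trust-region size) are assumptions that do not appear in the summary. The paper writes the cost as a lowercase c; it is written C here to match the summary.

```latex
% (1) Occupancy measure: discounted state-action visitation under policy \pi
\rho_\pi(s,a) = \pi(a \mid s) \sum_{t=0}^{\infty} \gamma^t \, P(s_t = s \mid \pi)

% (2) Expected cumulative cost under \pi, also expressible through \rho_\pi
\mathbb{E}_\pi[C(s,a)]
  = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^t \, C(s_t, a_t) \right]
  = \sum_{s,a} \rho_\pi(s,a) \, C(s,a)

% (3) Causal entropy of the policy
H(\pi) = \mathbb{E}_\pi\!\left[ -\log \pi(a \mid s) \right]

% (4) Apprenticeship learning: match the expert under the worst-case cost
\min_{\pi} \; \max_{C \in \mathcal{C}} \;
  \mathbb{E}_\pi[C(s,a)] - \mathbb{E}_{\pi_E}[C(s,a)]

% (5) TRPO-style trust region: the updated policy stays close to the old one
\bar{D}_{\mathrm{KL}}\!\left( \pi_{\mathrm{old}} \,\|\, \pi_{\mathrm{new}} \right) \le \delta
```

Item (2) is what ties the pieces together: because the expected cost is linear in ρ_π, matching the expert's cost under every C in \mathcal{C} amounts to matching the expert's occupancy measure.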
