DRL Notes Series, Part 1

Date: 2021-01-22

Tags: DRL, algorithms


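Basic concepts

trial and error
DRL = RL + deep learning

on-policy: all training data is generated by the current agent interacting with the env. Training does not use old data, i.e. data produced by earlier versions of the agent.
Disadvantage: these algorithms are weaker on sample efficiency.
Advantage: these algorithms directly opti...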
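The on-policy idea above (train only on data from the current agent, never on old data) can be sketched with a toy example. Everything here is an assumption for illustration and not from the original notes: the two-armed bandit environment, the softmax policy, the REINFORCE-style update, and all function and parameter names. The point is only the data flow: each batch is sampled from the current policy, used for one update, and then thrown away.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def collect_batch(prefs, arm_means, batch_size, rng, policy_version):
    """Sample fresh (action, reward, version) tuples with the CURRENT policy."""
    probs = softmax(prefs)
    batch = []
    for _ in range(batch_size):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 0.1)  # hypothetical noisy bandit
        batch.append((action, reward, policy_version))
    return batch


def on_policy_train(arm_means, iterations=200, batch_size=32, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)
    for version in range(iterations):
        # 1. Interact with the env using the current policy only.
        batch = collect_batch(prefs, arm_means, batch_size, rng, version)
        # On-policy invariant: no sample comes from an older policy.
        assert all(v == version for _, _, v in batch)

        # 2. One policy-gradient step on this batch (mean reward as baseline).
        probs = softmax(prefs)
        baseline = sum(r for _, r, _ in batch) / len(batch)
        grads = [0.0] * len(prefs)
        for action, reward, _ in batch:
            advantage = reward - baseline
            for a in range(len(prefs)):
                indicator = 1.0 if a == action else 0.0
                grads[a] += advantage * (indicator - probs[a])
        prefs = [p + lr * g / len(batch) for p, g in zip(prefs, grads)]

        # 3. The batch is discarded here; the next iteration starts from scratch.
    return softmax(prefs)


final_probs = on_policy_train([0.2, 0.8])
```

Because every batch is used once and dropped, each update needs new interaction with the environment, which is where the weaker sample efficiency noted above comes from. An off-policy method would instead keep old samples in a replay buffer and reuse them.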
