Reinforcement Learning: An Introduction, Chapter 2: Multi-armed Bandits

Table of Contents
Abstract
2.1 A k-armed Bandit Problem
2.2 Action-value Methods
2.3 The 10-armed Testbed
2.4 Incremental Implementation
2.5 Tracking a Nonstationary Problem
2.6 Optimistic Initial Values
2.7 Upper