Reinforcement Learning: Chapter 2, Multi-armed Bandits

Date: 2020-07-25

Tags: reinforcement learning, chapter, multi-armed bandits


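1. Preface

The biggest difference between reinforcement learning and other learning methods is that reinforcement learning uses training information that evaluates the actions taken rather than instructs by giving correct actions.

1.1 A k-armed Bandit Problem

Suppose you are faced with K different options, and each choice ...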
