Mastering the game of Go without human knowledge (AlphaGo Zero)

Date: 2020-12-21

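AlphaGo's tree search incorporated deep neural networks that were trained by supervised learning from human expert knowledge and by reinforcement learning from self-play. AlphaGo Zero relies on reinforcement learning alone: a single neural network is trained to predict both move selections and values. This network improves the strength of the tree search, which yields higher-quality move selection and stronger self-play in the next iteration; the more accurate tree search, in turn, improves the network.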
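
The feedback loop described above (the network guides the search, the search produces better move choices, and the network is trained toward them) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the "game" is a single position with three moves of fixed value, the "network" is just a probability vector, `puct_search` uses a simplified PUCT-style selection rule, `train_step` replaces a gradient update with plain interpolation, and the value prediction is omitted entirely. All function names are hypothetical.

```python
import math


def puct_search(prior, values, n_sims=200, c_puct=1.0):
    """Toy one-level search guided by a prior; returns normalized visit counts.

    `values[a]` is the fixed payoff of move `a` (an assumption standing in
    for the outcome of a deeper search or a value estimate).
    """
    k = len(prior)
    visits = [0] * k
    total_value = [0.0] * k
    for _ in range(n_sims):
        n_total = sum(visits)

        def score(a):
            # Mean observed value plus an exploration bonus weighted by the prior.
            q = total_value[a] / visits[a] if visits[a] else 0.0
            u = c_puct * prior[a] * math.sqrt(n_total + 1) / (1 + visits[a])
            return q + u

        best = max(range(k), key=score)
        visits[best] += 1
        total_value[best] += values[best]
    return [v / n_sims for v in visits]


def train_step(prior, target, lr=0.5):
    """Move the policy toward the search result (stand-in for a gradient step)."""
    mixed = [(1 - lr) * p + lr * t for p, t in zip(prior, target)]
    total = sum(mixed)
    return [m / total for m in mixed]


def policy_iteration(values, n_iters=10):
    """Alternate search and training, recording the policy after each round."""
    prior = [1.0 / len(values)] * len(values)  # start from a uniform policy
    history = [prior]
    for _ in range(n_iters):
        target = puct_search(prior, values)  # search improves on the policy
        prior = train_step(prior, target)    # policy is trained toward the search
        history.append(prior)
    return history


if __name__ == "__main__":
    for i, policy in enumerate(policy_iteration([0.1, 0.9, 0.3])):
        print(i, [round(p, 3) for p in policy])
```

In this toy setting the policy starts uniform and shifts its probability mass toward the highest-value move over successive rounds, because each search result is sharper than the policy that guided it.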
