A 2048 A.I. Based on the Monte Carlo Method

Date: 2019-11-12


There is a Stack Overflow discussion about 2048 A.I.: http://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048

The top-voted answer is based on a Min-Max-Tree with alpha-beta pruning, and its heuristic function is very well designed.

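In fact, an A.I. can be written without designing a heuristic function at all. The method I used is a classic algorithm from the field of Go A.I.: Monte Carlo position evaluation + UCT search.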

For an introduction to the algorithm, see a blog post I wrote a few years ago: http://www.cnblogs.com/qswang/archive/2011/08/28/2360489.html

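In short, it comes down to two points:

Evaluate the score of a given position by playing random games from it;
When descending from a parent node of the game tree to a child node, weigh both the child's historical score and the number of times it has been tried.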
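To make the second point concrete, here is a minimal sketch of the usual UCB1 rule that UCT applies when choosing a child. It is not taken from fool2048: ChildStats, Ucb1, SelectChild, and the exploration parameter are hypothetical names, and the exact formula behind MaxUcbMove may differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical per-child statistics (illustrative names, not from fool2048).
struct ChildStats {
  double average_profit;  // mean score of the games played through this child
  int visited_times;      // number of games played through this child
};

// UCB1 value: the historical score plus an exploration bonus that shrinks as
// the child is tried more often. Untried children get +infinity so that each
// one is tried at least once.
double Ucb1(const ChildStats& child, int parent_visits, double exploration) {
  assert(parent_visits > 0);
  if (child.visited_times == 0) {
    return std::numeric_limits<double>::infinity();
  }
  return child.average_profit +
      exploration * std::sqrt(std::log(static_cast<double>(parent_visits)) /
                              child.visited_times);
}

// Returns the index of the child with the highest UCB1 value.
int SelectChild(const std::vector<ChildStats>& children, double exploration) {
  assert(!children.empty());
  int parent_visits = 0;
  for (const ChildStats& c : children) parent_visits += c.visited_times;
  if (parent_visits == 0) return 0;  // nothing has been tried yet

  int best = 0;
  double best_value = -std::numeric_limits<double>::infinity();
  for (int i = 0; i < static_cast<int>(children.size()); ++i) {
    double value = Ucb1(children[i], parent_visits, exploration);
    if (value > best_value) {
      best_value = value;
      best = i;
    }
  }
  return best;
}
```

With exploration set to 0 the rule degenerates to always picking the best average. The constant has to be scaled to the range of the scores, so a value that suits win rates in [0, 1] would likely be too small for scores in the hundreds.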

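For 2048, I made one change to the algorithm: I replaced the Min-Max-Tree with a Random-Max-Tree. New numbers are added at random rather than by a rational opponent, so I suspected that a Min-Max-Tree would lean toward an overly conservative strategy and would not dare to pursue bigger results.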
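A small numeric sketch of why the two trees can disagree. The helper names and the example scores are made up for illustration, and the outcomes are assumed to be equally likely, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Value of a move when the tile placement is treated as an adversary (Min
// node): only the worst outcome counts.
double MinNodeValue(const std::vector<double>& outcomes) {
  assert(!outcomes.empty());
  return *std::min_element(outcomes.begin(), outcomes.end());
}

// Value of a move when the tile placement is treated as chance (Random node)
// with equally likely outcomes: the mean of the outcomes.
double RandomNodeValue(const std::vector<double>& outcomes) {
  assert(!outcomes.empty());
  double sum = 0.0;
  for (double v : outcomes) sum += v;
  return sum / outcomes.size();
}
```

Take a risky move whose outcomes score {100, 100, 10} and a safe move whose outcomes all score 40. The Min node rates the risky move at 10 and so prefers the safe one, while the Random node rates it at 70 and prefers the risky one. The NewProfit function shown later samples one random placement per descent instead of averaging explicitly; over many visits the running average plays the role of this mean.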

The UCT search code:

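Orientation UctPlayer::NextMove(const FullBoard& full_board) const {
  int mc_count = 0;
  // Keep searching until the budget of Monte Carlo games is used up.
  while (mc_count < kMonteCarloGameCount) {
    // Pick the root move with the highest UCB value, play it on a copy of
    // the board, then update the records along the path below it.
    Orientation orientation = MaxUcbMove(full_board);
    FullBoard current_node;
    current_node.Copy(full_board);
    current_node.PlayMovingMove(orientation);
    NewProfit(&current_node, &mc_count);
  }

  return BestChild(full_board);
}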

The NewProfit function updates the records along the path from this node down to a leaf node. It is implemented recursively:

float UctPlayer::NewProfit(board::FullBoard *node,
    int* mc_count) const {
  float result;
  HashKey hash_key = node->ZobristHash();
  auto iterator = transposition_table_.find(hash_key);
  if (iterator == transposition_table_.end()) {
    // First visit: evaluate the position with one random game and record it.
    FullBoard copied_node;
    copied_node.Copy(*node);
    MonteCarloGame game(std::move(copied_node));

    if (!HasGameEnded(*node)) game.Run();

    result = GetProfit(game.GetFullBoard());
    ++(*mc_count);
    NodeRecord node_record(1, result);
    transposition_table_.insert(make_pair(hash_key, node_record));
  } else {
    NodeRecord *node_record = &(iterator->second);
    int visited_times = node_record->VisitedTimes();
    if (HasGameEnded(*node)) {
      // Finished game: the profit cannot change, so reuse the stored value.
      ++(*mc_count);
      result = node_record->AverageProfit();
    } else {
      // Known position: add a random number, follow the move with the
      // highest UCB value, and recurse toward a leaf.
      AddingNumberRandomlyPlayer player;
      AddingNumberMove move = player.NextMove(*node);
      node->PlayAddingNumberMove(move);
      Orientation max_ucb_move = MaxUcbMove(*node);
      node->PlayMovingMove(max_ucb_move);
      result = NewProfit(node, mc_count);
      // Fold the new result into the running average of this node.
      float previous_profit = node_record->AverageProfit();
      float average_profit = (previous_profit * visited_times + result) /
          (visited_times + 1);
      node_record->SetAverageProfit(average_profit);
    }

    node_record->SetVisitedTimes(visited_times + 1);
  }

  return result;
}
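The bookkeeping in NewProfit boils down to a running average stored per position hash. The sketch below isolates that step. Stats, Table, and Update are hypothetical stand-ins, std::unordered_map is assumed as the table type, and a plain 64-bit integer stands in for HashKey; none of these are confirmed by the source.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for NodeRecord.
struct Stats {
  int visited_times = 0;
  float average_profit = 0.0f;
};

// Hypothetical stand-in for the transposition table, keyed by position hash.
using Table = std::unordered_map<std::uint64_t, Stats>;

// Folds one new game result into the record stored under hash_key, creating
// the record on first use.
void Update(Table* table, std::uint64_t hash_key, float result) {
  assert(table != nullptr);
  Stats& stats = (*table)[hash_key];
  stats.average_profit =
      (stats.average_profit * stats.visited_times + result) /
      (stats.visited_times + 1);
  ++stats.visited_times;
}
```

Feeding the results 10, 20, and 30 into the same key leaves an average of 20 after three visits. Keying by hash means that two move sequences reaching the same board share one record.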

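At first I used the largest number at the end of the game as the score, but I found that once the board reached 512, the Monte Carlo games no longer produced any larger number, so the nodes became indistinguishable. I therefore switched to using the number of moves as the score, which improved things greatly.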
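For illustration, here is a self-contained sketch of a random game scored by its move count. It does not use the fool2048 classes: Board, SlideRowLeft, Move, AddRandomTile, and RandomPlayoutMoveCount are hypothetical, the 90%/10% split between 2 and 4 is the rule of the standard game rather than something read from the project, and counting moves from the given position (not from the start of the game) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <random>
#include <vector>

using Board = std::array<std::array<int, 4>, 4>;  // 0 means an empty cell

// Slides and merges one row to the left; returns true if the row changed.
bool SlideRowLeft(std::array<int, 4>* row) {
  std::array<int, 4> out = {{0, 0, 0, 0}};
  int write = 0;
  bool can_merge = false;  // whether out[write - 1] may still be merged
  for (int value : *row) {
    if (value == 0) continue;
    if (can_merge && out[write - 1] == value) {
      out[write - 1] *= 2;
      can_merge = false;  // a tile merges at most once per move
    } else {
      out[write++] = value;
      can_merge = true;
    }
  }
  bool changed = (out != *row);
  *row = out;
  return changed;
}

// Rotates the board by 90 degrees; four rotations give the original board.
Board Rotate(const Board& b) {
  Board r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) r[j][3 - i] = b[i][j];
  return r;
}

// Applies move number direction (0..3); returns true if the board changed.
bool Move(Board* board, int direction) {
  Board b = *board;
  for (int k = 0; k < direction; ++k) b = Rotate(b);
  bool changed = false;
  for (auto& row : b) changed |= SlideRowLeft(&row);
  for (int k = direction; k < 4; ++k) b = Rotate(b);  // rotate back
  *board = b;
  return changed;
}

// Puts a 2 (90%) or a 4 (10%) on a uniformly chosen empty cell, if any.
void AddRandomTile(Board* board, std::mt19937* rng) {
  std::vector<int*> empty;
  for (auto& row : *board)
    for (int& cell : row)
      if (cell == 0) empty.push_back(&cell);
  if (empty.empty()) return;
  std::uniform_int_distribution<int> pick(
      0, static_cast<int>(empty.size()) - 1);
  std::bernoulli_distribution four(0.1);
  *empty[pick(*rng)] = four(*rng) ? 4 : 2;
}

// Plays uniformly random legal moves until none is left and returns how many
// moves were played from the given position.
int RandomPlayoutMoveCount(Board board, std::mt19937* rng) {
  assert(rng != nullptr);
  int moves = 0;
  while (true) {
    std::vector<int> legal;
    for (int d = 0; d < 4; ++d) {
      Board copy = board;
      if (Move(&copy, d)) legal.push_back(d);
    }
    if (legal.empty()) return moves;
    std::uniform_int_distribution<int> pick(
        0, static_cast<int>(legal.size()) - 1);
    Move(&board, legal[pick(*rng)]);
    AddRandomTile(&board, rng);
    ++moves;
  }
}
```

Unlike the largest tile, which takes only a handful of distinct values, the move count differs between almost any two random games, so it still separates positions when random play cannot reach the next tile.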

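The program is organized into three modules: board, player, and game. board handles the board logic, player handles the logic of moving or adding numbers, and game connects board and player.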

The declaration of the Game class:

class Game {
public:
  typedef std::unique_ptr<player::AddingNumberPlayer>
      AddingNumberPlayerUniquePtr;
  typedef std::unique_ptr<player::MovingPlayer> MovingPlayerUniquePtr;

  Game(Game &&game) = default;

  virtual ~Game();

  const board::FullBoard& GetFullBoard() const {
    return full_board_;
  }

  void Run();

protected:
  Game(board::FullBoard &&full_board,
      AddingNumberPlayerUniquePtr &&adding_number_player,
      MovingPlayerUniquePtr &&moving_player);

  virtual void BeforeAddNumber() const {
  }

  virtual void BeforeMove() const {
  }

private:
  board::FullBoard full_board_;
  AddingNumberPlayerUniquePtr adding_number_player_unique_ptr_;
  MovingPlayerUniquePtr moving_player_unique_ptr_;

  DISALLOW_COPY_AND_ASSIGN(Game);
};

The implementation of Run:

void Game::Run() {
  while (!HasGameEnded(full_board_)) {
    if (full_board_.LastForce() == Force::kMoving) {
      BeforeAddNumber();

      AddingNumberMove move =
          adding_number_player_unique_ptr_->NextMove(full_board_);
      full_board_.PlayAddingNumberMove(move);
    } else {
      BeforeMove();

      Orientation orientation =
          moving_player_unique_ptr_->NextMove(full_board_);
      full_board_.PlayMovingMove(orientation);
    }
  }
}

This way, you can inherit from Game and implement different constructors to compose different kinds of Game. For example, the constructor of MonteCarloGame:

MonteCarloGame::MonteCarloGame(FullBoard &&full_board) :
    Game(std::move(full_board),
        Game::AddingNumberPlayerUniquePtr(new AddingNumberRandomlyPlayer),
        Game::MovingPlayerUniquePtr(new MovingRandomlyPlayer)) {}
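The same pattern in a toy setting, as a sketch only: a base class owns the loop and a protected constructor lets each subclass decide which player to plug in. Player, StepByOnePlayer, StepByTwoPlayer, CounterGame, SlowGame, and FastGame are all invented for this example and have nothing to do with 2048 rules.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical strategy interface, standing in for MovingPlayer.
class Player {
 public:
  virtual ~Player() = default;
  virtual int NextMove(int state) const = 0;
};

class StepByOnePlayer : public Player {
 public:
  int NextMove(int /*state*/) const override { return 1; }
};

class StepByTwoPlayer : public Player {
 public:
  int NextMove(int /*state*/) const override { return 2; }
};

// The base class owns the game loop; only subclasses can choose the player.
class CounterGame {
 public:
  virtual ~CounterGame() = default;

  // Advances a counter until it reaches limit; returns the number of turns.
  int Run(int limit) const {
    int state = 0;
    int turns = 0;
    while (state < limit) {
      state += player_->NextMove(state);
      ++turns;
    }
    return turns;
  }

 protected:
  explicit CounterGame(std::unique_ptr<Player> player)
      : player_(std::move(player)) {
    assert(player_ != nullptr);
  }

 private:
  std::unique_ptr<Player> player_;
};

// Each concrete game is just a constructor that picks its player.
class SlowGame : public CounterGame {
 public:
  SlowGame() : CounterGame(std::unique_ptr<Player>(new StepByOnePlayer)) {}
};

class FastGame : public CounterGame {
 public:
  FastGame() : CounterGame(std::unique_ptr<Player>(new StepByTwoPlayer)) {}
};
```

SlowGame().Run(10) takes 10 turns and FastGame().Run(10) takes 5, although both share the same Run loop.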

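A new 2048 game starts with two numbers already on the board, so a new game should be easy to build. By default the two numbers should be added at random. The builder class can be written like this: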

template<class G>
class NewGameBuilder {
public:
  NewGameBuilder();
  ~NewGameBuilder() = default;

  NewGameBuilder& SetLastForce(board::Force last_force);

  NewGameBuilder& SetAddingNumberPlayer(game::Game::AddingNumberPlayerUniquePtr
      &&initialization_player);

  G Build() const;

private:
  game::Game::AddingNumberPlayerUniquePtr initialization_player_;
};

template<class G>
NewGameBuilder<G>::NewGameBuilder() :
    initialization_player_(game::Game::AddingNumberPlayerUniquePtr(
    new player::AddingNumberRandomlyPlayer)) {
}

template<class G>
NewGameBuilder<G>& NewGameBuilder<G>::SetAddingNumberPlayer(
    game::Game::AddingNumberPlayerUniquePtr &&initialization_player) {
  initialization_player_ = std::move(initialization_player);
  return *this;
}

template<class G>
G NewGameBuilder<G>::Build() const {
  board::FullBoard full_board;

  for (int i = 0; i < 2; ++i) {
    board::AddingNumberMove move = initialization_player_->NextMove(full_board);
    full_board.PlayAddingNumberMove(move);
  }

  return G(std::move(full_board));
}

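Long ago, efficient C++ code was discouraged from returning non-heap objects by value from functions. Now that we have rvalue references, this is much more convenient.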
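A minimal sketch of what this buys us, using a made-up move-only type. MoveOnlyGame and BuildGame are hypothetical; the real Game is move-only in a similar way because it owns its players through std::unique_ptr and disallows copying.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical move-only type: it owns heap data through std::unique_ptr and
// cannot be copied, only moved.
class MoveOnlyGame {
 public:
  explicit MoveOnlyGame(std::vector<int> tiles)
      : tiles_(new std::vector<int>(std::move(tiles))) {}

  MoveOnlyGame(MoveOnlyGame&&) = default;
  MoveOnlyGame(const MoveOnlyGame&) = delete;
  MoveOnlyGame& operator=(const MoveOnlyGame&) = delete;

  const std::vector<int>& tiles() const {
    assert(tiles_ != nullptr);
    return *tiles_;
  }

 private:
  std::unique_ptr<std::vector<int>> tiles_;
};

// Returns the game by value. The compiler either elides the move or calls
// the move constructor, so the 16 tiles are never copied.
MoveOnlyGame BuildGame() {
  std::vector<int> tiles(16, 0);
  tiles[0] = 2;
  tiles[5] = 2;
  return MoveOnlyGame(std::move(tiles));
}
```

A caller simply writes MoveOnlyGame game = BuildGame(); and gets an object that owns the tiles, which mirrors how Build() returns G(std::move(full_board)).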

The main function:

int main() {
  InitLogConfig();
  AutoGame game = NewGameBuilder<AutoGame>().Build();
  game.Run();
}

[Screenshot: a run of ./fool2048]

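This A.I. does not move as regularly as A.I.s built on hand-crafted heuristic functions, and it does not pin the largest number in a corner, but it still ends up with fairly good results, and the game is more fun to watch.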

Project: https://github.com/chncwang/fool2048

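Finally, a recruiting link: http://www.kujiale.com/about/join

My work here is mainly on site search, recommendation algorithms, and the like. Strong candidates are welcome to send a résumé to the HR mailbox.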
