top of page

MiniAlphaGo for Reversi

Reversi

Monte Carlo Tree Search

​Selection: Based on the UCB value, search for the leaf node L with the maximum return value.

        On the first Monte Carlo search, the root node root\text{root}root is returned.

        On subsequent searches, the node with the highest UCB1 value is selected and returned.

​

​Expansion: Expand the child nodes M of node L.

        The legal positions for the current player to place a piece are added as child nodes of L.
        Randomly expand one child node as M

        The remaining child nodes are deferred for future searches since their UCB1 value is infinite.

​

Simulation: Simulate the game, recording the outcome and the number of pieces owned by each player.

        Starting from node M, simulate both players making moves until a win/loss is determined.

​

Back Propagation: Backpropagate from node M.​

​        The backpropagation process maintains the UCB1 formula.
        The values are adjusted with opposing effects for different players.

​

en.wikipedia.org/wiki/Monte_Carlo_tree_search

© 2035 by Leo Zheng

bottom of page