MiniAlphaGo for Reversi
Reversi
Monte Carlo Tree Search
​Selection: Based on the UCB value, search for the leaf node L with the maximum return value.
On the first Monte Carlo search, the root node root\text{root}root is returned.
On subsequent searches, the node with the highest UCB1 value is selected and returned.
​
​Expansion: Expand the child nodes M of node L.
The legal positions for the current player to place a piece are added as child nodes of L.
Randomly expand one child node as M
The remaining child nodes are deferred for future searches since their UCB1 value is infinite.
​
Simulation: Simulate the game, recording the outcome and the number of pieces owned by each player.
Starting from node M, simulate both players making moves until a win/loss is determined.
​
Back Propagation: Backpropagate from node M.​
​ The backpropagation process maintains the UCB1 formula.
The values are adjusted with opposing effects for different players.
​