The Power of Forgetting: Improving the Last-Good-Reply Policy in Monte Carlo Go

2 Author(s)
Hendrik Baier ; Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany ; Peter D. Drake

The dominant paradigm for programs playing the game of Go is Monte Carlo tree search. This algorithm builds a search tree by playing many simulated games (playouts). Each playout consists of a sequence of moves within the tree followed by many moves beyond the tree. Moves beyond the tree are generated by a biased random sampling policy. The recently published last-good-reply policy makes moves that, in previous playouts, have been successful replies to immediately preceding moves. This paper presents a modification of this policy that not only remembers moves that recently succeeded but also immediately forgets moves that recently failed. This modification provides a large improvement in playing strength. We also show that responding to the previous two moves is superior to responding to the previous one move. Surprisingly, remembering the win rate of every reply performs much worse than simply remembering the last good reply (and indeed worse than not storing good replies at all).
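The policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LastGoodReplyTable`, its methods, the integer move encoding, and the assumption that the two players strictly alternate (player 0 moving first) are all hypothetical choices made for illustration. The sketch stores one reply per (player, previous move) pair, overwrites it when the player wins a playout, and, when forgetting is enabled, deletes it if the player used that reply in a playout they lost.

```python
from typing import Dict, List, Optional, Tuple

Move = int  # hypothetical encoding: one integer per board point


class LastGoodReplyTable:
    """Illustrative last-good-reply table keyed on one previous move."""

    def __init__(self, forgetting: bool = True) -> None:
        self.forgetting = forgetting
        # (player, opponent's previous move) -> reply that last succeeded
        self.replies: Dict[Tuple[int, Move], Move] = {}

    def suggest(self, player: int, previous_move: Move) -> Optional[Move]:
        """Return the stored reply, or None if nothing is remembered."""
        return self.replies.get((player, previous_move))

    def update(self, moves: List[Move], winner: int) -> None:
        """Update the table after a playout.

        Assumes moves alternate between player 0 and player 1,
        with player 0 making the first move in the list.
        """
        for i in range(1, len(moves)):
            player = i % 2
            key = (player, moves[i - 1])
            if player == winner:
                # Remember the reply that was part of a won playout.
                self.replies[key] = moves[i]
            elif self.forgetting and self.replies.get(key) == moves[i]:
                # Forget a stored reply that was just played in a lost playout.
                del self.replies[key]


table = LastGoodReplyTable(forgetting=True)
table.update([10, 20, 30, 40], winner=1)  # player 1 wins: 10 -> 20 is stored
table.update([10, 20, 5, 6], winner=0)    # player 1 loses after replying 20 to 10
print(table.suggest(1, 10))               # the failed reply is no longer stored
```

In an actual playout policy, a suggested reply would still have to be checked for legality on the current board, with a fallback to the default sampling policy when no legal reply is stored. Extending the key to the previous two moves, as the abstract discusses, would mean keying on `(player, moves[i - 2], moves[i - 1])` instead.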

Published in:

IEEE Transactions on Computational Intelligence and AI in Games (Volume: 2, Issue: 4)