By Topic

N-Grams and the Last-Good-Reply Policy Applied in General Game Playing

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Mandy J. W. Tak ; Games and AI Group, Department of Knowledge Engineering, Faculty of Humanities and Sciences, Maastricht University, Maastricht, The Netherlands ; Mark H. M. Winands ; Yngvi Bjornsson

The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS). The performance of MCTS depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP that we implement and test in the GGP agent CADIAPLAYER, which won the International GGP competition in both 2007 and 2008. There are two aspects to the improvements: first, we show that a simple ϵ-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER and, second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements result in a much improved performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an impressive average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP). The LGRP combined with N-grams is also tested. The LGRP has already been shown to be successful in Go programs and we demonstrate that it also has promise in GGP.

Published in:

IEEE Transactions on Computational Intelligence and AI in Games  (Volume:4 ,  Issue: 2 )