Gumbel MuZero for the Game of 2048 | IEEE Conference Publication | IEEE Xplore

Gumbel MuZero for the Game of 2048


Abstract:

In recent years, AlphaZero and MuZero have achieved remarkable success in a broad range of applications. AlphaZero masters game playing without human knowledge, while MuZero additionally learns the game rules and environment dynamics without access to a simulator during planning, which makes it applicable to complex environments. Both algorithms use Monte Carlo tree search (MCTS) during self-play, usually with hundreds of simulations per move. To handle stochasticity, Stochastic MuZero was proposed; it learns a stochastic model and uses the learned model to perform the tree search. More recently, Gumbel MuZero was proposed to guarantee policy improvement, and it can thus learn reliably with a small number of simulations. However, Gumbel MuZero uses a deterministic model, as in MuZero, which limits its performance in stochastic environments. In this paper, we propose combining Gumbel MuZero and Stochastic MuZero, the first attempt to apply Gumbel MuZero to a stochastic environment. Our experiment on the stochastic puzzle game 2048 demonstrates that the combined algorithm performs well, achieving an average score of 394,645 with only 3 simulations during training and greatly reducing the computational resources needed for training.
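
For readers unfamiliar with how Gumbel MuZero copes with so few simulations: it chooses which root actions to search by sampling them without replacement using the Gumbel-Top-k trick. The following is only a minimal sketch of that sampling step, not the paper's implementation. The function name `gumbel_top_k`, the example logits, and the mapping of indices to the four 2048 moves are all assumptions made for illustration.

```python
import math
import random


def gumbel_top_k(logits, k, rng=random):
    """Sample k distinct action indices without replacement.

    Each logit is perturbed with independent Gumbel(0, 1) noise and the
    k largest perturbed values are kept (the Gumbel-Top-k trick).
    """
    perturbed = []
    for action, logit in enumerate(logits):
        # rng.random() lies in [0, 1); clamp away from 0 so log() is defined.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, action))
    # Highest perturbed score first.
    perturbed.sort(reverse=True)
    return [action for _, action in perturbed[:k]]


# Hypothetical policy logits for the four 2048 moves (up, right, down, left).
example_logits = [0.5, 1.2, -0.3, 0.1]
# With a budget of 3 simulations, at most 3 distinct root actions are considered.
candidates = gumbel_top_k(example_logits, k=3, rng=random.Random(0))
```

Sampling without replacement means each of the few available simulations is spent on a different candidate move, which is one reason a very small simulation budget can still be informative.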
Date of Conference: 01-03 December 2022
Date Added to IEEE Xplore: 08 March 2023
Conference Location: Tainan, Taiwan
