Online Immediate Orientation in Monte Carlo Counterfactual Regret Minimization for Simultaneous Games | IEEE Conference Publication | IEEE Xplore

Abstract:

Recent research shows that simultaneous-move games can be modeled as imperfect-information problems, which captures the characteristics of simultaneous decision-making more accurately and yields more favorable strategies. Monte Carlo Counterfactual Regret Minimization (MCCFR) is regarded as an effective method for imperfect-information games. However, its convergence rate is severely affected by the number of samples and by the high variance of its estimates, which restricts its direct application to large simultaneous games. To address these challenges, we introduce an improved variant of MCCFR, namely Online Immediate Orientation in Monte Carlo Counterfactual Regret Minimization (OIO-MCCFR). OIO-MCCFR uses immediate rewards to orient the search. In addition, a control variate and a state-action baseline are employed to reduce the excessive variance of the estimates. Moreover, we prove that the new formulation admits a probabilistic bound between the unbiased regret estimate and the exact value. We evaluate OIO-MCCFR on Goofspiel at various scales, and the results show that our approach significantly outperforms vanilla MCCFR. More importantly, the experimental results also indicate that the advantage of OIO-MCCFR grows with the scale of the game.
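The variance-reduction idea mentioned in the abstract can be illustrated with a generic baseline-corrected (control variate) estimator. The following is a minimal sketch, not the paper's exact formulation: the function names, the toy action values, and the baselines are all assumptions chosen for illustration. One action is sampled, its value is importance-weighted around a per-action baseline, and every unsampled action falls back to its baseline. The estimate stays unbiased for any baseline, and its variance shrinks as the baseline approaches the true values.

```python
from typing import List


def baseline_corrected_values(sampled: int, sampled_value: float,
                              sample_probs: List[float],
                              baseline: List[float]) -> List[float]:
    """Estimate every action's value from one sampled action (illustrative).

    The sampled action receives an importance-weighted correction around its
    baseline; every other action is estimated by its baseline alone.
    """
    estimates = list(baseline)
    estimates[sampled] += (sampled_value - baseline[sampled]) / sample_probs[sampled]
    return estimates


def expected_estimate(true_values: List[float], sample_probs: List[float],
                      baseline: List[float]) -> List[float]:
    """Exact expectation of the estimator, enumerating every possible sample."""
    n = len(true_values)
    mean = [0.0] * n
    for a in range(n):
        est = baseline_corrected_values(a, true_values[a], sample_probs, baseline)
        for i in range(n):
            mean[i] += sample_probs[a] * est[i]
    return mean


def estimator_variance(true_values: List[float], sample_probs: List[float],
                       baseline: List[float]) -> float:
    """Total variance over all actions, computed exactly.

    Because the estimator is unbiased, the variance equals the expected
    squared deviation from the true values.
    """
    n = len(true_values)
    total = 0.0
    for a in range(n):
        est = baseline_corrected_values(a, true_values[a], sample_probs, baseline)
        total += sample_probs[a] * sum((est[i] - true_values[i]) ** 2 for i in range(n))
    return total


# Hypothetical toy numbers, not taken from the paper.
TRUE_VALUES = [1.0, -2.0, 0.5]
SAMPLE_PROBS = [0.5, 0.25, 0.25]
NO_BASELINE = [0.0, 0.0, 0.0]
ROUGH_BASELINE = [0.8, -1.5, 0.0]

var_plain = estimator_variance(TRUE_VALUES, SAMPLE_PROBS, NO_BASELINE)
var_baseline = estimator_variance(TRUE_VALUES, SAMPLE_PROBS, ROUGH_BASELINE)
```

With these toy numbers, the total variance drops from about 13.75 with no baseline to about 1.54 with the rough baseline, while `expected_estimate` recovers the true values in both cases.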
Date of Conference: 25-27 May 2024
Date Added to IEEE Xplore: 17 July 2024
Conference Location: Xi'an, China

I. Introduction

Simultaneous games have long been a challenging topic in artificial intelligence and are particularly significant in the study of non-cooperative games [1], [2]. Many well-known games are instances of this model, including Goofspiel, StarCraft, and pursuit-evasion games. Compared with most algorithms for solving simultaneous games, counterfactual regret minimization (CFR) accurately simulates the characteristics of simultaneous decision-making and, over time, finds the optimal solution with high probability [3], [4].
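To make the "over time" behavior concrete, the sketch below runs regret matching, the update rule that CFR applies at each information set, in self-play on rock-paper-scissors, a one-shot simultaneous game. It is an illustration only, not the algorithm studied in this paper: the payoff table, the function names, and the asymmetric starting regrets are assumptions. Each player uses exact expected payoffs, so the run is deterministic, and the exploitability of the average strategies shrinks as iterations accumulate.

```python
from typing import List

# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def action_values(opponent: List[float], row_player: bool) -> List[float]:
    """Expected payoff of each pure action against a mixed opponent strategy."""
    n = len(PAYOFF)
    if row_player:
        return [sum(PAYOFF[a][b] * opponent[b] for b in range(n)) for a in range(n)]
    return [sum(-PAYOFF[b][a] * opponent[b] for b in range(n)) for a in range(n)]


def self_play(iterations: int) -> List[List[float]]:
    """Run regret matching for both players and return their average strategies."""
    n = len(PAYOFF)
    # Arbitrary asymmetric starting regrets (an assumption), so the dynamics
    # do not simply sit at the uniform strategy from the first iteration.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in (0, 1):
            values = action_values(strategies[1 - p], row_player=(p == 0))
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(n):
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(avg: List[List[float]]) -> float:
    """Sum of both players' best-response payoffs against the average strategies."""
    return max(action_values(avg[1], True)) + max(action_values(avg[0], False))


average_strategies = self_play(10000)
gap = exploitability(average_strategies)
```

An exploitability of zero corresponds to an exact equilibrium, so a small `gap` indicates that the average strategies are close to optimal play. Sampling-based variants such as MCCFR replace the exact expected payoffs used here with sampled estimates.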
