Abstract:
Recent research shows that simultaneous-move games can be modeled as imperfect-information problems, which simulates the characteristics of simultaneous decision-making more accurately and yields more favorable strategies. Monte Carlo Counterfactual Regret Minimization (MCCFR) is regarded as an effective method for imperfect-information games. However, its convergence rate is seriously affected by the number of samples and the high variance of its estimates, which restricts its direct application to large simultaneous-move games. To address these challenges, we introduce an improved variant of MCCFR, Online Immediate Orientation in Monte Carlo Counterfactual Regret Minimization (OIO-MCCFR). OIO-MCCFR uses immediate rewards to orient the search. In addition, a control variate and a state-action baseline are employed to reduce the excessive variance of the estimates. Moreover, we prove a probabilistic bound on the difference between the unbiased regret estimate and its exact value. We evaluate OIO-MCCFR on Goofspiel at various scales, and the results show that our approach significantly outperforms vanilla MCCFR. More importantly, the experimental results also indicate that the larger the game, the greater the advantage of OIO-MCCFR.
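The abstract does not state the form of the estimator, so the following is only a minimal sketch of the general control-variate idea with a state-action baseline, as it is commonly used in sampling-based regret minimization. It is not the OIO-MCCFR algorithm itself. The function names, the toy action values, the sampling probabilities, and the baseline values are all hypothetical and chosen purely for illustration. The sketch samples one action, corrects the sampled action's baseline by the importance-weighted error, and leaves the other actions at their baseline, which keeps the estimate unbiased while reducing variance when the baseline is close to the true value.

```python
import random


def baseline_corrected_values(true_values, sample_probs, baselines, rng):
    """Sample one action and return a value estimate for every action.

    Assumed (illustrative) estimator:
      sampled action a:  b[a] + (u[a] - b[a]) / q[a]
      other actions:     b[a]
    Its expectation is u[a] for any baseline b, so it is unbiased.
    """
    actions = list(range(len(true_values)))
    sampled = rng.choices(actions, weights=sample_probs, k=1)[0]
    estimates = []
    for a in actions:
        if a == sampled:
            correction = (true_values[a] - baselines[a]) / sample_probs[a]
            estimates.append(baselines[a] + correction)
        else:
            estimates.append(baselines[a])
    return estimates


def mean_and_variance(true_values, sample_probs, baselines, action,
                      trials=20000, seed=0):
    """Empirical mean and variance of the estimate for one action."""
    rng = random.Random(seed)
    samples = [
        baseline_corrected_values(true_values, sample_probs, baselines, rng)[action]
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    variance = sum((x - mean) ** 2 for x in samples) / trials
    return mean, variance


if __name__ == "__main__":
    # Hypothetical toy values, not taken from the paper.
    values = [1.0, -0.5, 2.0]
    probs = [0.5, 0.3, 0.2]
    no_baseline = [0.0, 0.0, 0.0]       # plain importance sampling
    good_baseline = [0.9, -0.4, 1.8]    # baseline close to the true values

    for name, b in [("no baseline", no_baseline), ("baseline", good_baseline)]:
        mean, var = mean_and_variance(values, probs, b, action=2)
        print(f"{name}: mean={mean:.3f} variance={var:.3f}")
```

Under these assumptions both settings estimate the same mean for the rarely sampled action, but the run with a baseline close to the true value shows a much smaller variance, which is the effect the abstract attributes to the control variate and state-action baseline.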
Published in: 2024 36th Chinese Control and Decision Conference (CCDC)
Date of Conference: 25-27 May 2024
Date Added to IEEE Xplore: 17 July 2024