Abstract:
This paper investigates reinforcement learning algorithms for discrete-time stochastic multi-agent graphical games with multiplicative noise. The Bellman optimality equation for stochastic multi-agent graphical games is derived from the optimality principle. A Nash equilibrium is reached when each agent executes the strategy obtained from its Bellman optimality equation. To circumvent the difficulty of solving the coupled Bellman equations, a value iteration heuristic dynamic programming (HDP) algorithm is designed and its convergence is proved. To solve multi-agent graphical games online, an actor-critic implementation of the HDP algorithm is designed to approximate the Nash equilibrium solutions. The effectiveness of the algorithm is verified by two numerical simulation examples.
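To make the value iteration HDP idea concrete, the following is a minimal sketch for a single-agent simplification of the setting the abstract describes: a discrete-time linear system with multiplicative noise, x_{k+1} = (A + w_k C) x_k + (B + w_k D) u_k with E[w_k] = 0 and E[w_k^2] = sigma2, under a quadratic cost. The paper's coupled multi-agent equations are not reproduced here; the matrices A, B, C, D, Q, R and the variance sigma2 below are illustrative assumptions, and the recursion shown is the standard stochastic Riccati value iteration rather than the authors' exact algorithm.

```python
import numpy as np

# Hypothetical single-agent simplification (not the paper's coupled game):
# x_{k+1} = (A + w_k C) x_k + (B + w_k D) u_k,  E[w_k] = 0, E[w_k^2] = sigma2,
# cost J = sum_k x_k' Q x_k + u_k' R u_k. All matrices below are illustrative.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
C = 0.1 * A        # multiplicative-noise channel acting on the state
D = 0.1 * B        # multiplicative-noise channel acting on the input
Q = np.eye(2)
R = np.eye(1)
sigma2 = 1.0       # variance of the scalar multiplicative noise w_k

def hdp_value_iteration(P0=None, iters=200, tol=1e-10):
    """Value iteration on the stochastic Riccati recursion:
      P_{i+1} = Q + A'P_i A + sigma2 C'P_i C
                - (A'P_i B + sigma2 C'P_i D) K_i,
      K_i = (R + B'P_i B + sigma2 D'P_i D)^{-1} (B'P_i A + sigma2 D'P_i C).
    Starting from P_0 = 0, P_i increases monotonically to the fixed point P*
    when the system is mean-square stabilizable."""
    P = np.zeros_like(Q) if P0 is None else P0
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        S = R + B.T @ P @ B + sigma2 * D.T @ P @ D
        K = np.linalg.solve(S, B.T @ P @ A + sigma2 * D.T @ P @ C)
        P_next = Q + A.T @ P @ A + sigma2 * C.T @ P @ C \
                 - (A.T @ P @ B + sigma2 * C.T @ P @ D) @ K
        if np.linalg.norm(P_next - P) < tol:   # value function has converged
            return P_next, K
        P = P_next
    return P, K

P_star, K_star = hdp_value_iteration()
print("P* =\n", P_star)
print("feedback gain K* (u_k = -K* x_k) =", K_star)
```

In the multi-agent graphical game the recursion is coupled through the neighbors' policies, and the actor-critic version mentioned in the abstract replaces the model-based update above with function approximators trained from measured data; the sketch only illustrates the value iteration backbone.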
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers (Early Access)