Abstract:
Multi-Agent Proximal Policy Optimization (MAPPO) is a ubiquitous on-policy reinforcement learning algorithm, yet it is used significantly less than off-policy algorithms in multi-agent environments. The existing MAPPO algorithm suffers from insufficient generalization ability, adaptability, and training stability when dealing with complex tasks. In this paper, we propose an improved trust-region-guided MAPPO algorithm with a multi-time-scale hierarchical structure, designed to cope with tasks whose hierarchical structure and time scales change dynamically. The algorithm introduces a multi-time-scale hierarchical structure together with trust-region constraints and L2-norm regularization to prevent the policy instability caused by excessively large updates. Finally, experiments on Decentralized Collective Assault (DCA) show that our algorithm achieves significant improvements across multiple performance metrics, indicating better effectiveness and robustness on complex tasks.
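As an illustration of the mechanism the abstract describes, the sketch below shows one plausible way a trust-region (KL) penalty and L2-norm regularization could be added on top of the standard MAPPO clipped surrogate objective. This is not the authors' implementation; the function name and the coefficients clip_eps, kl_coef, and l2_coef are illustrative assumptions.

import torch

def mappo_trust_region_loss(new_log_probs, old_log_probs, advantages,
                            policy_params, clip_eps=0.2, kl_coef=0.5,
                            l2_coef=1e-4):
    """Clipped PPO surrogate + approximate KL trust-region penalty + L2 regularizer.

    All hyperparameter values here are assumed for illustration only.
    """
    # Importance sampling ratio between the new and old policies
    ratio = torch.exp(new_log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Standard clipped MAPPO policy loss (maximize surrogate -> minimize negative)
    policy_loss = -torch.min(surr1, surr2).mean()

    # Approximate KL divergence between old and new policies, penalizing large updates
    approx_kl = (old_log_probs - new_log_probs).mean()
    # L2 norm of policy parameters, discouraging abrupt weight growth
    l2_reg = sum(p.pow(2).sum() for p in policy_params)

    return policy_loss + kl_coef * approx_kl + l2_coef * l2_reg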
Published in: 2024 43rd Chinese Control Conference (CCC)
Date of Conference: 28-31 July 2024
Date Added to IEEE Xplore: 17 September 2024