An Improved MAPPO Algorithm Guided by a Multi-Time-Scale Hierarchical Trust Domain


Abstract:

Multi-Agent Proximal Policy Optimization (MAPPO) is a widely used on-policy reinforcement learning algorithm, yet it is adopted far less often than off-policy algorithms in multi-agent environments. The existing MAPPO algorithm suffers from insufficient generalization ability, adaptability, and training stability on complex tasks. In this paper, we propose an improved trust-domain-guided MAPPO algorithm with a multi-time-scale hierarchical structure, designed to cope with dynamically changing task hierarchies and time scales. The algorithm introduces a multi-time-scale hierarchical structure, together with trust domain constraints and L2-norm regularization, to prevent the policy instability caused by excessively large updates. Experiments on Decentralized Collective Assault (DCA) show that our algorithm achieves significant improvements across multiple performance metrics, demonstrating greater effectiveness and robustness on complex tasks.
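
To make the abstract's training objective concrete, below is a minimal PyTorch sketch of a PPO-style clipped surrogate augmented with a KL trust-region penalty and L2-norm regularization, wrapped in a two-timescale loop in which a high-level policy updates once every K low-level updates. This is our own illustration, not the paper's published implementation: all names, network sizes, and hyperparameters (clip_eps, kl_coef, l2_coef, K) are placeholder assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-level policies; the layer sizes are placeholders.
low_policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
high_policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

def regularized_ppo_loss(policy, obs, actions, advantages,
                         old_log_probs, old_probs,
                         clip_eps=0.2, kl_coef=0.5, l2_coef=1e-4):
    # Clipped PPO surrogate (the standard MAPPO objective).
    dist = torch.distributions.Categorical(logits=policy(obs))
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    # Trust-domain penalty: KL(old || new) discourages overly large updates.
    old_dist = torch.distributions.Categorical(probs=old_probs)
    kl = torch.distributions.kl_divergence(old_dist, dist).mean()
    # L2-norm regularization over the policy parameters.
    l2 = sum(p.pow(2).sum() for p in policy.parameters())
    return -surrogate.mean() + kl_coef * kl + l2_coef * l2

def snapshot(policy, obs, actions):
    # Freeze the current policy's probabilities as the "old" policy.
    with torch.no_grad():
        dist = torch.distributions.Categorical(logits=policy(obs))
        return dist.log_prob(actions), dist.probs

low_opt = torch.optim.Adam(low_policy.parameters(), lr=3e-4)
high_opt = torch.optim.Adam(high_policy.parameters(), lr=3e-4)
K = 5  # illustrative slow/fast timescale ratio

for step in range(20):
    # Placeholder batch; a real agent would collect this from the environment.
    obs = torch.randn(32, 8)
    actions = torch.randint(0, 4, (32,))
    advantages = torch.randn(32)

    # Fast timescale: the low-level policy updates every step.
    old_logp, old_p = snapshot(low_policy, obs, actions)
    low_opt.zero_grad()
    regularized_ppo_loss(low_policy, obs, actions, advantages,
                         old_logp, old_p).backward()
    low_opt.step()

    if step % K == 0:
        # Slow timescale: the high-level policy updates once every K steps.
        # (Reuses the same batch for brevity; a real hierarchy would use
        # high-level observations and advantages.)
        old_logp, old_p = snapshot(high_policy, obs, actions)
        high_opt.zero_grad()
        regularized_ppo_loss(high_policy, obs, actions, advantages,
                             old_logp, old_p).backward()
        high_opt.step()

In this sketch the KL penalty plays the role of the trust domain constraint, bounding how far each update moves the policy, while the L2 term keeps parameter magnitudes in check; the modulo-K schedule is one simple way to realize a slow/fast hierarchical timescale.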
Date of Conference: 28-31 July 2024
Date Added to IEEE Xplore: 17 September 2024
Conference Location: Kunming, China

