
Coding for Distributed Multi-Agent Reinforcement Learning



Abstract:

This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems due to system disturbances such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework that speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes, are also investigated. Simulations on several multi-robot problems demonstrate the promising performance of the proposed framework.
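To make the coded-computation idea in the abstract concrete, the sketch below is a minimal illustration, not the paper's implementation: the worker count, the Gaussian encoding matrix, and all variable names are assumptions. It shows how, with an MDS-like code over k gradient partitions spread across n > k workers, the full gradient can be recovered from any k responses, so slow or failed workers can simply be ignored.

# Hypothetical sketch of coded gradient aggregation for straggler mitigation.
# k true partial gradients are encoded into n > k coded tasks; the aggregator
# recovers their sum from ANY k workers that respond first.
import numpy as np

rng = np.random.default_rng(0)

k, n, d = 4, 6, 8                      # data partitions, workers, gradient dimension
G = rng.normal(size=(k, d))            # partial gradients (one per partition)

# Random dense encoding matrix: any k of its n rows are invertible with
# probability 1, giving MDS-like recovery (a Vandermonde matrix also works).
A = rng.normal(size=(n, k))
coded = A @ G                          # row i = linear combination computed by worker i

# Suppose only workers {0, 2, 3, 5} return before the deadline (stragglers: 1, 4).
survivors = [0, 2, 3, 5]
G_hat = np.linalg.solve(A[survivors], coded[survivors])   # decode the k partial gradients

full_gradient = G_hat.sum(axis=0)
assert np.allclose(full_gradient, G.sum(axis=0))

Replication-based and LDPC schemes trade the dense decoding step above for sparser encodings that are cheaper to decode but tolerate fewer straggler patterns.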
Date of Conference: 30 May 2021 - 05 June 2021
Date Added to IEEE Xplore: 18 October 2021
Conference Location: Xi'an, China


I. Introduction

Many real-life applications involve interaction among multiple intelligent systems, such as collaborative robot teams [1], internet-of-things devices [2], agents in cooperative or competitive games [3], and traffic management devices [4]. Reinforcement learning (RL) [5] is an effective tool to optimize the behavior of intelligent agents in such applications based on reward signals from interaction with the environment. Traditional RL algorithms, such as Q-Learning [6] and policy gradient [3], can be scaled to multiple agents by applying them to each agent independently. However, independent learning performs poorly because, from the perspective of a single agent, the environment is non-stationary due to the actions of the other agents [3], [7]. Multi-agent reinforcement learning (MARL) [6] focuses on mitigating this challenge by adding other agents’ policy parameters to the Q function [8] or by relying on importance sampling [9]. Yang et al. [10] propose a mean-field Q learning algorithm, which uses Q functions defined only with respect to an agent’s own action and those of its neighbors, rather than the actions of all agents. The multi-agent deep deterministic policy gradient (MADDPG) [3] is an extension of the deep deterministic policy gradient (DDPG) algorithm [11] to the multi-agent setting. MADDPG uses a centralized Q function that depends on the observations and actions of all agents, together with local control policies, each defined only over the observation and action of an individual agent (a sketch of this structure follows below). One key challenge faced by MARL approaches is that the training computational complexity scales with the number of agents in the environment. For large-scale MARL applications, the traditional centralized training mechanism running on a single compute node could thus be cost-prohibitive.
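As a rough illustration of MADDPG's centralized-critic, decentralized-actor structure, the following is a minimal sketch rather than the authors' implementation; the agent count, network sizes, and PyTorch framing are assumptions made for illustration.

# Minimal sketch of the MADDPG structure: a centralized critic that sees every
# agent's observation and action, and decentralized actors that each see only
# their own observation. Hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 10, 2

class Actor(nn.Module):
    """Local policy: maps one agent's observation to its own action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs_i):
        return self.net(obs_i)

class CentralCritic(nn.Module):
    """Centralized Q function: conditioned on all agents' observations and actions."""
    def __init__(self):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(n_agents)]
critic = CentralCritic()

obs = torch.randn(n_agents, obs_dim)                      # one observation per agent
acts = torch.stack([actors[i](obs[i]) for i in range(n_agents)])
q = critic(obs.reshape(1, -1), acts.reshape(1, -1))       # joint value estimate

Because the critic's input grows with the number of agents, training cost scales with team size, which motivates distributing the training workload and, in turn, the coded-computation framework proposed in this paper.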
