I. Introduction
Reinforcement learning (RL) has been widely advocated for applications involving sequential decision-making under uncertainty. A major challenge in applying RL algorithms to practical systems is that such systems usually involve more than one decision-maker, i.e., multiple agents that interact with each other. This multi-agent setting arises in a broad range of practical control systems, including the power grid [1], robotics [2], and unmanned vehicles [3]. In this work, we focus on developing RL algorithms for this setting, i.e., the problem of multi-agent reinforcement learning (MARL).