I. Introduction
With the recent development of deep reinforcement learning (RL) [1], we have witnessed many achievements in building intelligent agents to solve complex multiagent problems [2], [3]. AlphaStar achieved top professional-level performance in StarCraft II [4], and OpenAI Five defeated the world champions in Dota 2 [5]. RL has also been used to address real-world applications in simulated physical environments, e.g., coordination of autonomous vehicles [6], traffic light control [7], and formation control of unmanned aerial or ground vehicles [8].