I. Introduction
Robotics has advanced rapidly, driving demand in applications ranging from manufacturing [1] to warehousing [2] to human-populated environments [3]. However, despite the clear potential of distributed controllers that leverage cooperation among multiple mobile robots (e.g., the cooperative transport of large or heavy objects, see Fig. 1b), the vast majority of existing techniques are either limited to single-robot operation or require that each robot perceive the complete state of the environment [4]. Model-free, learning-based methods present a promising alternative to traditional model-based control: they rely less on domain knowledge, such as kinematic and dynamic models of the system, and they scale more naturally with the number of agents.