I. Introduction
The energy transition towards a more sustainable, secure and affordable energy supply (the German Energiewende [1]) with a high share of renewable energy sources (RES) increases the energy system's complexity. It creates a more decentralized energy system with many more participants. Single households may take part in the energy system not only as consumers of electricity but also as prosumers by installing rooftop photovoltaic (PV) systems. RES such as PV and wind power plants are intermittent in their energy generation. With an increasing share of RES, the challenge of predictable energy generation becomes more severe. In a future energy system with many decentralized generation units, it is desirable to balance energy supply and demand at the local level. The more complex the energy system becomes, the harder it is to control, and new control algorithms are needed to account for such a complex environment. Reinforcement Learning (RL), a proven approach for handling such dynamic, uncertain systems, has already found its way into sequential decision-making in several domains such as robotics and self-driving cars [2]. In the literature, several studies have addressed single-agent RL in the energy domain, for example to improve decision-making in microgrid control [3]. Multi-agent RL setups in the energy domain, however, are rarely found in the literature.