I. Introduction
Artificial Intelligence has long pursued the goal of designing robotic agents that can interact with the complex physical world in flexible, data-efficient, and generalizable ways [1], [2]. Model-based control methods form plans using predefined models of the world dynamics. While data-efficient, these methods require accurate dynamics models, which may not exist for complex tasks. Model-free methods, on the other hand, rely on reinforcement learning, where agents learn a control policy directly from interaction with the environment, capturing the world dynamics implicitly rather than through an explicit model [3], [4]. Although these methods can learn policies for tasks involving complex dynamics, training them is sample-inefficient, and the resulting policies typically do not generalize beyond the training scenarios.
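To make the model-based side of this contrast concrete, the Python sketch below implements a minimal random-shooting planner; it is purely illustrative and not a method from the cited works. The dynamics, cost, and plan functions are assumed toy constructs on a one-dimensional point mass. Note that the planner succeeds only because dynamics is an exact model, precisely the requirement that complex tasks may violate; a model-free agent would have no such model to query and would instead have to learn from many sampled transitions.

    import numpy as np

    def dynamics(state, action):
        # Assumed, known point-mass model: state = [position, velocity],
        # action = acceleration, unit timestep.
        pos, vel = state
        return np.array([pos + vel, vel + action])

    def cost(state):
        # Assumed quadratic cost: drive position and velocity to zero.
        return float(state @ state)

    def plan(state, horizon=10, n_candidates=500, rng=np.random.default_rng(0)):
        # Random-shooting planner: sample candidate action sequences, roll
        # each one out through the known dynamics model, and return the first
        # action of the cheapest rollout. A model-free method could not
        # perform these rollouts, since it has no explicit dynamics model.
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
        best_action, best_cost = 0.0, np.inf
        for actions in candidates:
            s, total = state.copy(), 0.0
            for a in actions:
                s = dynamics(s, a)
                total += cost(s)
            if total < best_cost:
                best_cost, best_action = total, actions[0]
        return best_action

    state = np.array([1.0, 0.0])
    for _ in range(20):
        state = dynamics(state, plan(state))
    print("final state:", state)  # should approach [0, 0], since the model is exact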