Many controlled systems must operate over a range of external conditions. In this paper, we focus on the problem of learning a policy that adapts a system's controller to the current value of these external conditions so that the system always performs well (i.e., maximizes system output). In addition, we are concerned with systems for which experiments are expensive to run, and we therefore restrict the number that can be run during training. We formally define the problem setup and the notion of an optimal control policy. We propose two algorithms that aim to find such a policy while minimizing the number of system output evaluations. We present results comparing these algorithms with various other approaches and discuss the inherent tradeoffs in the proposed algorithms. Finally, we use these methods to train both simulated and physical snake robots to automatically adapt to changing terrain, and demonstrate improved performance on test courses with changing environments.