Reinforcement learning has proven to be a powerful technique in robotics, but it is rarely employed to learn a controller in a hardware-in-the-loop setting because spurious training data could cause a robot to take an unsafe (and potentially catastrophic) action. One approach to overcoming this limitation is Guaranteed Safe Online Learning via Reachability (GSOLR), in which the controller being learned is wrapped inside another controller, based on reachability analysis, that seeks to guarantee safety against worst-case disturbances. This paper proposes a novel improvement to GSOLR, which we call Iterated Guaranteed Safe Online Learning via Reachability (IGSOLR). In IGSOLR, the worst-case disturbances are modeled in a state-dependent manner (either parametrically or nonparametrically), this model is learned online, and the safe sets are periodically recomputed, in parallel with whatever machine learning is being run online to learn how to control the system. As a result, the safety guarantees automatically become neither too liberal nor too conservative, depending only on the actual system behavior, which gives the machine learning algorithm running in parallel the widest possible latitude in performing its task while still guaranteeing system safety. In addition to explaining IGSOLR, we show how it was used in a real-world example: safely learning an altitude controller for a quadrotor helicopter. The resulting controller, learned via hardware-in-the-loop reinforcement learning, outperforms our original hand-tuned controller while still maintaining safety. To our knowledge, this is the first example in the robotics literature of an algorithm in which worst-case disturbances are learned online in order to guarantee system safety.
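The safety-wrapping idea described above can be illustrated with a minimal sketch. The following Python code is not the paper's method: the actual work computes safe sets via Hamilton-Jacobi reachability analysis, whereas this toy reduces everything to a one-dimensional altitude example with a conservative one-step check. All constants, function names, and the fallback control law are invented for illustration only; the state-dependent disturbance bound is "learned" here simply as the largest disturbance observed near the current altitude.

```python
# Illustrative sketch of a GSOLR-style safety filter on a 1-D altitude
# example. Everything here (constants, names, the fallback law) is a
# hypothetical stand-in for the paper's reachability-based controller.

DT = 0.05                  # control period in seconds (assumed)
Z_MIN, Z_MAX = 0.5, 2.5    # hard altitude limits in meters (assumed)

def worst_case_disturbance(z, history):
    """State-dependent disturbance bound learned online: the largest
    disturbance magnitude observed near altitude z, with a prior
    fallback bound when no nearby data exists."""
    near = [abs(d) for (z_obs, d) in history if abs(z_obs - z) < 0.25]
    return max(near, default=0.3)

def safe_to_apply(z, v, u, d_max):
    """Conservative one-step check standing in for reachability
    analysis: does the next state remain inside the altitude limits
    even under the worst-case disturbance in either direction?"""
    v_lo = v + (u - d_max) * DT      # worst-case downward push
    v_hi = v + (u + d_max) * DT      # worst-case upward push
    z_lo = z + v_lo * DT
    z_hi = z + v_hi * DT
    return Z_MIN <= z_lo and z_hi <= Z_MAX

def filtered_control(z, v, u_learned, history):
    """Apply the learned action only if it is provably safe against
    the learned worst-case disturbance; otherwise fall back to a
    known-safe action steering toward the middle of the safe set."""
    d_max = worst_case_disturbance(z, history)
    if safe_to_apply(z, v, u_learned, d_max):
        return u_learned
    return 2.0 * ((Z_MIN + Z_MAX) / 2 - z) - 1.5 * v
```

Because the disturbance bound is recomputed from observed data, the filter is exactly as permissive as the actual system behavior allows: near the center of the safe set the learned action passes through unchanged, while aggressive actions near the altitude limits are overridden.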