Reducing Conservativeness in Safety Guarantees by Learning Disturbances Online: Iterated Guaranteed Safe Online Learning

3 Author(s)

Reinforcement learning has proven itself to be a powerful technique in robotics; however, it has not often been employed to learn a controller in a hardware-in-the-loop environment because spurious training data could cause a robot to take an unsafe (and potentially catastrophic) action. One approach to overcoming this limitation is known as Guaranteed Safe Online Learning via Reachability (GSOLR), in which the controller being learned is wrapped inside another controller based on reachability analysis that seeks to guarantee safety against worst-case disturbances. This paper proposes a novel improvement to GSOLR, which we call Iterated Guaranteed Safe Online Learning via Reachability (IGSOLR), in which the worst-case disturbances are modeled in a state-dependent manner (either parametrically or nonparametrically), this model is learned online, and the safe sets are periodically recomputed (in parallel with whatever machine learning is being run online to learn how to control the system). As a result, the safety of the system automatically becomes neither too liberal nor too conservative, depending only on the actual system behavior. This allows the machine learning algorithm running in parallel the widest possible latitude in performing its task while still guaranteeing system safety. In addition to explaining IGSOLR, we show how it was used in a real-world example, namely that of safely learning an altitude controller for a quadrotor helicopter. The resulting controller, which was learned via hardware-in-the-loop reinforcement learning, outperforms our original hand-tuned controller while still maintaining safety. To our knowledge, this is the first example in the robotics literature of an algorithm in which worst-case disturbances are learned online in order to guarantee system safety.
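The wrapping idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-dimensional altitude model (vertical acceleration equals control plus disturbance), a constant rather than state-dependent disturbance bound, and a closed-form stopping-distance test in place of a numerical reachability computation. All names (`DisturbanceModel`, `is_safe`, `safe_control`) and all parameter values are hypothetical.

```python
class DisturbanceModel:
    """Hypothetical online estimate of the worst-case downward disturbance."""

    def __init__(self, prior_d_min, margin=0.1, min_samples=5):
        self.prior_d_min = prior_d_min  # conservative initial bound, m/s^2
        self.margin = margin            # assumed padding below observed samples
        self.min_samples = min_samples  # samples needed before trusting the data
        self.samples = []

    def observe(self, v, v_next, u, dt):
        # Disturbance sample: observed acceleration minus commanded acceleration.
        self.samples.append((v_next - v) / dt - u)

    def d_min(self):
        # Fall back to the conservative prior until enough data has arrived.
        if len(self.samples) < self.min_samples:
            return self.prior_d_min
        return min(self.samples) - self.margin


def is_safe(z, v, z_min, u_max, d_min):
    """True if full thrust can keep altitude >= z_min under disturbance d_min."""
    a = u_max + d_min  # guaranteed upward acceleration in the worst case
    if a <= 0:
        return False   # no braking authority, so nothing can be guaranteed
    if v >= 0:
        return z >= z_min
    # Descending: subtract the worst-case stopping distance v^2 / (2a).
    return z - v * v / (2 * a) >= z_min


def safe_control(z, v, u_learned, z_min, u_max, d_min, buffer=0.2):
    """Pass the learned control through unless the state nears the unsafe set."""
    if is_safe(z - buffer, v, z_min, u_max, d_min):
        return u_learned
    return u_max  # safety override: apply maximum upward thrust


model = DisturbanceModel(prior_d_min=-5.0)
# Same state and learned command, evaluated before and after learning.
before = safe_control(0.6, -2.0, 1.0, 0.0, 8.0, model.d_min())
for _ in range(5):
    model.observe(v=0.0, v_next=0.1, u=2.0, dt=0.1)  # each sample gives d = -1
after = safe_control(0.6, -2.0, 1.0, 0.0, 8.0, model.d_min())
```

In this toy setting, the conservative prior forces a safety override at the chosen state, whereas the tighter bound estimated from data lets the learned command through. That is the sense in which re-estimating the disturbance and recomputing the safe set gives the learner more latitude.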