Abstract:
Policy Gradient methods require many real-world trials. Some of these trials may endanger the robot system and cause rapid wear. Therefore, safe, or at least low-wear, exploration is a desirable property. We incorporate bounds on the probability of unwanted trials into the recent Contextual Relative Entropy Policy Search method. The proposed algorithm is evaluated on the task of autonomous flipper control for a real Search and Rescue rover platform.
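Readers new to Relative Entropy Policy Search (REPS) may find a small sketch helpful. It is not the paper's algorithm. It shows a plain episodic, non-contextual REPS reweighting step: the sample weights are chosen so that their KL divergence from the sampling distribution stays within a bound `epsilon`. It also computes the empirical probability of unwanted trials under those weights, which is the kind of quantity the proposed method bounds. The function names, the `epsilon` value, and the unwanted-trial mask are hypothetical. The contextual variant and the paper's actual probability constraint are not reproduced here.

```python
import numpy as np


def reps_dual(eta, returns, epsilon):
    """Episodic REPS dual g(eta) = eta*eps + eta*log(mean(exp(R/eta))), computed stably."""
    r_max = returns.max()
    return eta * epsilon + r_max + eta * np.log(np.mean(np.exp((returns - r_max) / eta)))


def reps_weights(returns, epsilon, log_eta_bounds=(-6.0, 6.0), iters=100):
    """Normalized sample weights whose KL divergence from uniform equals epsilon (when active)."""
    returns = np.asarray(returns, dtype=float)
    # The dual is convex in eta, hence unimodal in log(eta): golden-section search suffices.
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = log_eta_bounds
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if reps_dual(np.exp(c), returns, epsilon) < reps_dual(np.exp(d), returns, epsilon):
            b = d
        else:
            a = c
    eta = np.exp(0.5 * (a + b))
    # Exponential reweighting of the sampled trials by their returns (temperature eta).
    w = np.exp((returns - returns.max()) / eta)
    return w / w.sum()


def kl_from_uniform(weights):
    """KL divergence of the reweighted sample distribution from the uniform one."""
    n = len(weights)
    nz = weights > 0
    return float(np.sum(weights[nz] * np.log(weights[nz] * n)))


def unwanted_probability(weights, unwanted_mask):
    """Hypothetical estimate of the probability of an unwanted trial under the new weights."""
    return float(np.sum(weights[np.asarray(unwanted_mask, dtype=bool)]))


if __name__ == "__main__":
    returns = np.linspace(0.0, 1.0, 50)   # hypothetical trial returns
    unwanted = np.arange(50) < 10         # hypothetical flags: the ten lowest-return trials
    w = reps_weights(returns, epsilon=0.5)
    print(kl_from_uniform(w), unwanted_probability(w, unwanted))
```

The KL bound only limits how far the update moves from the current samples. It does not by itself cap the probability of unwanted trials, which is why an explicit bound on that probability is a separate constraint.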
Date of Conference: 09-14 October 2016
Date Added to IEEE Xplore: 01 December 2016
Electronic ISSN: 2153-0866