Active Reward Learning from Critiques


Abstract:

Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with high expected information gain. However, this approach can still require a large number of labels to adequately reduce uncertainty and may also be unintuitive, as users are not accustomed to determining optimal actions in a single out-of-context state. To address these shortcomings, we propose a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that (1) queries the user for critiques of automatically generated trajectories, rather than asking for demonstrations or action labels, (2) utilizes trajectory segmentation to expedite the critique/labeling process, and (3) predicts the user's critiques to generate the most highly informative trajectory queries. We evaluated our algorithm in simulated domains, finding it to compare favorably to prior work and a randomized baseline.
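
To make the query-selection idea in the abstract concrete, below is a minimal sketch of a trajectory-based active Bayesian IRL loop. It is not the authors' implementation; every design choice here is an assumption for illustration: rewards linear in state features, a particle-set posterior over reward weights, binary good/bad critique labels per trajectory segment, and posterior disagreement as a stand-in for expected information gain. All function and variable names are hypothetical.

# Sketch: active Bayesian IRL from per-segment trajectory critiques.
# Assumptions (not from the paper): linear rewards, particle posterior,
# +1/-1 segment labels, disagreement-based query scoring.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 4
N_PARTICLES = 200
N_CANDIDATES = 30
SEGMENTS_PER_TRAJ = 5

# Posterior over reward weights, represented by weighted particles.
particles = rng.normal(size=(N_PARTICLES, N_FEATURES))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def segment_features(traj):
    """Mean feature vector of each segment of a trajectory."""
    return np.array([seg.mean(axis=0)
                     for seg in np.array_split(traj, SEGMENTS_PER_TRAJ)])

def predicted_critiques(seg_feats):
    """Predict per-segment critique labels (+1/-1) under every particle."""
    returns = seg_feats @ particles.T            # (segments, particles)
    return np.where(returns > 0.0, 1, -1)

def query_score(traj):
    """Score a candidate query by posterior disagreement over its critiques.

    A segment whose predicted label flips across reward hypotheses is
    informative; we sum the weighted label variance over segments.
    """
    labels = predicted_critiques(segment_features(traj))
    mean_label = labels @ weights                # weighted mean in [-1, 1]
    return np.sum(1.0 - mean_label ** 2)

def update_posterior(traj, user_labels, noise=0.1):
    """Reweight particles to favor hypotheses consistent with the critique."""
    global weights
    labels = predicted_critiques(segment_features(traj))
    agree = (labels == user_labels[:, None])     # (segments, particles)
    lik = np.prod(np.where(agree, 1.0 - noise, noise), axis=0)
    weights = weights * lik
    weights /= weights.sum()

# Active-learning loop with a simulated user whose true reward is hidden.
true_w = rng.normal(size=N_FEATURES)
for step in range(10):
    # Stand-in for automatically generated trajectories (states = features).
    candidates = [rng.normal(size=(50, N_FEATURES)) for _ in range(N_CANDIDATES)]
    query = max(candidates, key=query_score)
    seg_feats = segment_features(query)
    user_labels = np.where(seg_feats @ true_w > 0.0, 1, -1)  # simulated critique
    update_posterior(query, user_labels)

post_mean = weights @ particles
print("cosine(true, posterior mean):",
      post_mean @ true_w / (np.linalg.norm(post_mean) * np.linalg.norm(true_w)))

In actual use, the simulated critique would be replaced by a human labeling each segment, and the candidate set would come from a trajectory generator rather than random feature sequences; the variance-based query_score is only a cheap proxy for the information-gain criterion the abstract describes.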
Date of Conference: 21-25 May 2018
Date Added to IEEE Xplore: 13 September 2018
Electronic ISSN: 2577-087X
Conference Location: Brisbane, QLD, Australia
