Abstract:
Safe reinforcement learning (RL), which seeks constraint-satisfying policies, provides a promising path toward broader safety-critical applications of RL in real-world problems such as robotics. Among safe RL approaches, model-based methods further reduce training-time violations thanks to their high sample efficiency. However, the lack of safety robustness against model uncertainties remains an issue in safe model-based RL, especially for training-time safety. In this paper, we propose a distributional reachability certificate (DRC) and its Bellman equation to address model uncertainties and characterize robust persistently safe states. Furthermore, we build a safe RL framework that resolves the constraints required by the DRC and its corresponding shield policy. We also devise a line search method that maintains safety while reaching higher returns when leveraging the shield policy. Comprehensive experiments on classical benchmarks such as constrained tracking and navigation show that the proposed algorithm achieves comparable returns with far fewer constraint violations during training. Our code is available at https://github.com/ManUtdMoon/Distributional-Reachability-Policy-Optimization.

Note to Practitioners—Although RL has been shown to handle complex robotic control tasks, training an RL control policy induces frequent failures because the agent must learn safety through constraint violations. This issue hinders the adoption of RL because a large number of robot failures is too expensive to afford. This paper aims to reduce the training-time violations of RL-based control methods, enabling RL to be used in a broader range of applications. To achieve this goal, we first introduce a safety quantity describing the distribution of potential long-term constraint violations. By imposing constraints on a quantile of this safety distribution, we realize safety that is robust to model uncertainty, which is necessary...
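The quantile-constrained safety idea sketched above can be illustrated with a small example. The following Python snippet is only an assumption-laden sketch, not the paper's implementation: the helper names (rollout_violation, quantile_certificate, is_certified_safe) and the specific violation measure are hypothetical. It rolls a candidate action sequence through an ensemble of learned dynamics models, collects a distribution of long-horizon constraint-violation values induced by model uncertainty, and certifies safety only if an upper quantile of that distribution is non-positive.

```python
import numpy as np

def rollout_violation(model, state, actions, constraint_fn, gamma=0.99):
    """Worst discounted constraint value along one simulated trajectory.

    constraint_fn(s) <= 0 means state s satisfies the constraint;
    positive values measure how badly it is violated.
    """
    worst = constraint_fn(state)
    discount = 1.0
    for a in actions:
        state = model(state, a)            # one-step prediction of the next state
        discount *= gamma
        worst = max(worst, discount * constraint_fn(state))
    return worst

def quantile_certificate(models, state, actions, constraint_fn, q=0.9):
    """Quantile of the violation distribution induced by an ensemble of models."""
    samples = [rollout_violation(m, state, actions, constraint_fn) for m in models]
    return np.quantile(samples, q)

def is_certified_safe(models, state, actions, constraint_fn, q=0.9):
    """Accept the plan only if the q-quantile of violations is non-positive,
    i.e. it stays safe under most sampled models; otherwise a shield/backup
    policy would be invoked instead."""
    return quantile_certificate(models, state, actions, constraint_fn, q) <= 0.0
```

The design choice illustrated here is that robustness comes from constraining an upper quantile of the violation distribution rather than its mean, so a plan is rejected whenever a sufficiently large fraction of the sampled models predicts a future constraint violation.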
Published in: IEEE Transactions on Automation Science and Engineering (Volume: 21, Issue: 3, July 2024)
IEEE Keywords (Index Terms): Reachable, Model-based Reinforcement Learning, Training Time, Model Uncertainty, Sampling Efficiency, Safe Conditions, Bellman Equation, State Constraints, Constraint Violation, Line Search, Safety Policies, Potential Violations, Learning Models, Optimization Problem, Probabilistic Model, Optimal Policy, Cost Value, Model Predictive Control, Reward Function, Constrained Optimization Problem, Model-free Reinforcement Learning, Reinforcement Learning Algorithm, Model-based Algorithm, Different Levels Of Uncertainty, Episode Length, True Dynamics, Safety Constraints, Robust Safety, Constraint Satisfaction, Robot Navigation