I. Introduction
Reinforcement learning (RL) has achieved success across a range of automated control tasks such as robotic locomotion [1], navigation [2], and transportation management [3], [4]. However, the trial-and-error nature of RL hinders its application to many real-world tasks, since the large number of failures incurred by unconstrained policies threatens the safety of users and systems. Safe RL [5] has therefore been proposed to impose constraints on agents and enhance the safety of policies both after convergence and during the training process. The training-time safety problem, also called safe exploration (i.e., reducing the number of constraint violations during learning), is widely considered challenging and significant, especially when the dynamics of the environment are unknown. In this paper, we focus not only on the safe RL problem of satisfying constraints after convergence, but also take a step toward safe exploration.