
Safe Reinforcement Learning With Stability Guarantee for Motion Planning of Autonomous Vehicles


Abstract:

Reinforcement learning with safety constraints is promising for autonomous vehicles, for which failures can result in disastrous losses. In general, a safe policy is trained by constrained optimization algorithms, in which the average constraint return, as a function of states and actions, must stay below a predefined bound. However, most existing safe learning-based algorithms capture states via multiple high-precision sensors, which complicates the hardware system and increases power consumption. This article focuses on safe motion planning with a stability guarantee for autonomous vehicles of limited size and power. To this end, a risk-identification method and a Lyapunov function are integrated with the well-known soft actor–critic (SAC) algorithm. By borrowing the concept of Lyapunov functions from control theory, the learned policy can theoretically guarantee that the state trajectory always stays in a safe area. A novel risk-sensitive learning-based algorithm with a stability guarantee is proposed to train policies for the motion planning of autonomous vehicles. The learned policy is implemented on a differential-drive vehicle in a simulation environment. The experimental results show that the proposed algorithm achieves a higher success rate than SAC.
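For concreteness, the safety constraint and the Lyapunov condition described above can be stated in a standard constrained-MDP form. The notation below is a common formulation consistent with the abstract's description and is an assumption on our part, not the paper's exact definitions:

% Constrained objective: maximize expected return subject to the
% average (discounted) constraint return staying below a bound d.
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d

% Lyapunov-style condition: a candidate function L decreases in
% expectation along closed-loop trajectories, which keeps the state
% trajectory inside the safe set.
\mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s, a)}\big[L(s')\big] - L(s) \le -\alpha\, L(s),
\qquad \alpha \in (0, 1]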
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 32, Issue: 12, December 2021)
Page(s): 5435 - 5444
Date of Publication: 09 July 2021


PubMed ID: 34242172



I. Introduction

Classical motion planning algorithms, such as the artificial potential field [1], rapidly exploring random trees (RRT) [2], and RRT* [3], have been successfully applied in many fields, including autonomous vehicles and manipulators. Such nonlearning algorithms, however, face difficulties with high-dimensional motion planning problems. A recent research trend is to apply machine learning to motion planning, particularly reinforcement learning (RL) [4], which has achieved great advances in robotics, such as robotic manipulators [5]–[7] and autonomous vehicles [8]–[14]. Numerous successful examples of RL-based motion planning for autonomous vehicles have been reported, both in simulation and in real-world applications. Most of the abovementioned studies train policies with methods based on the deep deterministic policy gradient (DDPG) [15]. However, the soft actor–critic (SAC) [16] algorithm, which achieves competitive performance among existing RL methods, is rarely applied to the motion planning of autonomous vehicles.
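To make the SAC connection concrete, the following is a minimal, hypothetical sketch (PyTorch-style Python) of how a Lyapunov-type safety term might be folded into an SAC actor loss. The names policy, q_critic, lyapunov_critic, alpha, and beta are illustrative assumptions, not the authors' implementation:

import torch

def actor_loss(policy, q_critic, lyapunov_critic,
               states: torch.Tensor,
               alpha: float = 0.2, beta: float = 1.0) -> torch.Tensor:
    """SAC-style actor loss with an added Lyapunov safety penalty.

    Assumed interfaces (hypothetical):
      policy(states)                   -> (actions, log_probs)
      q_critic(states, actions)        -> Q-value estimates
      lyapunov_critic(states, actions) -> Lyapunov-value estimates
                                          (lower = safer, by convention here)
    """
    actions, log_probs = policy(states)

    # Standard SAC actor objective: minimize entropy-scaled log-prob
    # minus the Q-value (i.e., maximize the soft value).
    sac_term = (alpha * log_probs - q_critic(states, actions)).mean()

    # Hypothetical safety term: discourage actions with large Lyapunov
    # values so the state trajectory tends to stay in the safe region.
    safety_term = beta * lyapunov_critic(states, actions).mean()

    return sac_term + safety_term

In practice, a weight like beta would typically be tuned or adapted online (e.g., via a Lagrange multiplier) to trade off task return against safety; the paper's actual update rule may differ.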

