I. Introduction
Classical motion planning algorithms, such as the artificial potential field [1], rapidly exploring random trees (RRT) [2], and RRT* [3], have been successfully applied in many fields, including autonomous vehicles and manipulators. However, such non-learning algorithms struggle with high-dimensional motion planning problems. A recent research trend is to apply machine learning to motion planning, particularly reinforcement learning (RL) [4], which has achieved great advances in robotics, including robotic manipulators [5]–[7] and autonomous vehicles [8]–[14]. Numerous successful applications of RL to motion planning for autonomous vehicles have been reported, both in simulation experiments and in real-world deployments. Most of the abovementioned studies train policies with methods based on the deep deterministic policy gradient (DDPG) [15]. However, the soft actor–critic (SAC) algorithm [16], which performs strongly among existing RL methods, has rarely been applied to the motion planning of autonomous vehicles.