I. Introduction
Recently, reinforcement learning (RL) has been widely investigated for autonomous systems, including autonomous driving [1], [2], drones [3], [4], and robots [5], [6]. The main reason for this interest is the marked improvement in decision-making performance brought about by deep reinforcement learning (DRL). DRL uses deep neural networks (DNNs) to approximate high-complexity environments, overcoming the low control performance that classical RL algorithms suffer because of their limited state dimension. DRL has achieved human-level or even better control performance in many complex environments. In [5], a four-legged walking robot was trained through DRL; the trained robot could adapt to sudden environment changes, including slope variations and new obstacles. In [6], a robotic curling team demonstrated human-level performance, winning three of four matches against expert teams in actual curling play. In the recent DARPA Air Combat Evolution program, agents trained with state-of-the-art DRL algorithms such as SAC [7], TD3 [8], and PPO [9] beat human pilots.