Impact Statement:
Reinforcement learning (RL) has demonstrated impressive performance in many complex games such as Go and StarCraft. However, existing RL algorithms suffer from prohibitive computational complexity, poor generalization ability, and low robustness, which hinders their practical application in the real world. Thus, it is essential to develop more effective RL algorithms for real-life applications such as autonomous driving and smart manufacturing. To that end, one critical design challenge is to improve the exploration mechanism of RL to realize efficient policy learning. This work proposes a simple yet effective method that significantly improves the exploration ability of RL algorithms and can be readily applied to real-life applications. For instance, it will facilitate the development of more powerful autonomous driving systems that can adapt to more complex and challenging environments. Finally, this work is also expected to inspire more subsequent...
Abstract:
One of the most critical challenges in deep reinforcement learning is maintaining the agent's long-term exploration capability. To tackle this problem, recent work has proposed providing the agent with intrinsic rewards that encourage exploration. However, most existing intrinsic reward-based methods fail to provide sustainable exploration incentives, a problem known as vanishing rewards. In addition, these conventional methods require complex models and additional memory in their learning procedures, resulting in high computational complexity and low robustness. In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards. It is shown that the proposed method generalizes the existing state entropy maximization methods. In particular, a k-nearest neighbor estimator is introduced for entropy estimation, while a k-value search method is designed to guarantee the estimation accur...
Published in: IEEE Transactions on Artificial Intelligence (Volume: 4, Issue: 5, October 2023)
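To make the mechanism described in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how a per-state exploration bonus can be derived from a k-nearest-neighbor estimate of the Rényi entropy over a batch of state embeddings. The function names, the default values of k and the order alpha, and the dropping of the estimator's constant factors are illustrative assumptions; the sketch further assumes alpha in (0, 1).

import numpy as np

def knn_distances(x: np.ndarray, k: int) -> np.ndarray:
    """Euclidean distance from each row of x to its k-th nearest neighbor."""
    diffs = x[:, None, :] - x[None, :, :]      # (N, N, d) pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)     # (N, N) pairwise distances
    dists.sort(axis=1)                         # row i ascending; index 0 is self (distance 0)
    return dists[:, k]                         # k-th nearest neighbor, self excluded

def renyi_intrinsic_rewards(x: np.ndarray, k: int = 3, alpha: float = 0.5,
                            eps: float = 1e-8) -> np.ndarray:
    """Per-state exploration bonus from a k-NN Renyi entropy estimate.

    Illustrative assumption: each bonus is the per-sample term of an
    order-alpha k-NN entropy estimator, so a larger distance to the k-th
    nearest neighbor (a sparsely visited region) earns a larger reward.
    """
    n, d = x.shape
    rho = knn_distances(x, k)                  # k-th NN distance per state
    # The per-sample term is proportional to (rho^d)^(1 - alpha); constant
    # factors (unit-ball volume, gamma-function terms) are dropped since
    # only the relative scale of the bonus matters to policy optimization.
    return rho ** (d * (1.0 - alpha)) + eps

# Usage: intrinsic rewards for a batch of random state embeddings.
states = np.random.randn(256, 16)              # N states, d-dimensional embeddings
bonus = renyi_intrinsic_rewards(states, k=3, alpha=0.5)
print(bonus.shape, bonus.mean())

Note that as alpha approaches 1 the Rényi entropy recovers the Shannon entropy, which is consistent with the abstract's claim that the proposed module generalizes existing state entropy maximization methods.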