This paper presents a modified R-learning algorithm derived from the traditional average reward reinforcement learning algorithm. Reinforcement learning problems constitute an important class of learning and control problems faced by artificial intelligence systems. The general framework of reinforcement learning takes two forms: discounted reward reinforcement learning and average reward reinforcement learning. R-learning is a model-free average reward reinforcement learning algorithm. Compared with the conventional R-learning algorithm, the version examined in detail here adds a directing reward function with a punitive mechanism and an exploration strategy based on the roulette technique. As a result of this design, the agent can gain more information in every learning step. The improved R-learning is applied to the RoboCup simulation league (2D) and compared with Q-learning; the empirical results show that learning efficiency is increased.
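For orientation, the two ingredients named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the update shown is one common textbook form of conventional R-learning, the function names `roulette_select` and `r_learning_step`, the step sizes `alpha` and `beta`, and the weight-shifting scheme used to keep roulette weights positive are all assumptions made here. The directing reward function with its punitive mechanism is not specified in the abstract, so it is not reproduced; the reward `r` is simply passed in.

```python
import random


def roulette_select(values, rng=random, eps=1e-6):
    """Pick an action index with probability proportional to a positive weight.

    Assumption: weights are the action values shifted so the smallest
    becomes eps, because raw R-values may be negative.
    """
    low = min(values)
    weights = [v - low + eps for v in values]
    total = sum(weights)
    pick = rng.uniform(0.0, total)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if pick <= cumulative:
            return index
    return len(values) - 1  # guard against floating-point round-off


def r_learning_step(R, rho, s, a, r, s_next, alpha=0.1, beta=0.1):
    """Apply one conventional R-learning update in place and return the new rho.

    R is a table of relative action values indexed as R[state][action];
    rho is the running estimate of the average reward per step.
    """
    best_here = max(R[s])
    best_next = max(R[s_next])
    was_greedy = R[s][a] == best_here
    # Relative value update: the reward is measured against the average reward.
    R[s][a] += beta * (r - rho + best_next - R[s][a])
    # The average reward estimate is updated only after greedy actions.
    if was_greedy:
        rho += alpha * (r - rho + best_next - best_here)
    return rho
```

In a learning loop, an agent would call `roulette_select(R[s])` to choose an action, observe the reward and next state, and then call `r_learning_step` with them. Unlike Q-learning, no discount factor appears: values are expressed relative to the average reward `rho`.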