In ordinary reinforcement learning methods, a single agent learns to achieve a goal over many episodes. Because the agent learns essentially by trial and error, acquiring an optimal policy takes considerable computation time, especially for complicated learning problems. Meanwhile, for optimization problems, population-based methods such as particle swarm optimization are recognized as able to rapidly find the global optimum of multimodal functions over wide solution spaces. We recently proposed swarm reinforcement learning methods, in which multiple agents are prepared and learn not only from their own experiences but also by exchanging information with one another. In these methods, the design of the information-exchange mechanism is important. In this paper, we propose a swarm reinforcement learning method based on ant colony optimization, an optimization method inspired by the behavior of real ants that use trail pheromones, in order to acquire the optimal policy rapidly even for complicated reinforcement learning problems. In the proposed method, the agents exchange information through Pheromone-Q values, which we define so that they play the same role as trail pheromones. The proposed method is applied to shortest path problems, and its performance is demonstrated through numerical experiments.