This paper presents a modified, distributed Q-learning algorithm, termed sequential Q-learning with Kalman filtering (SQKF), for decision making in multirobot cooperation. The SQKF algorithm developed here has the following characteristics. 1) The learning process is arranged sequentially (i.e., the robots make decisions in a predefined sequence rather than simultaneously) so as to promote cooperation among robots and reduce their Q-learning spaces. 2) A robot does not update its Q-values with the observed global reward; instead, it employs a specific Kalman filter to extract its actual local reward from the global reward and updates its Q-table with this local reward. The new SQKF algorithm is intended to solve two problems in multirobot Q-learning: credit assignment and behavior conflicts. The detailed procedure of the SQKF algorithm is presented, and its application is illustrated using a prototype multirobot experimental system. The experimental results show that the algorithm outperforms both the conventional single-agent Q-learning algorithm and the team Q-learning algorithm in the multirobot domain.
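The two characteristics above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a scalar Kalman filter per robot (with hypothetical noise variances `q` and `r`), a generic tabular Q-update, and a hypothetical `env_step` hook that returns the shared global reward. Other robots' contributions to the global reward are treated as observation noise, so each filter tracks an estimate of that robot's own local reward.

```python
import random

class ScalarKalman:
    """1-D Kalman filter that estimates a robot's local reward from the
    noisy global reward (other robots' contributions treated as noise)."""
    def __init__(self, q=0.1, r=1.0):
        self.x = 0.0   # local-reward estimate
        self.p = 1.0   # estimate variance
        self.q = q     # process-noise variance (assumed value)
        self.r = r     # observation-noise variance (assumed value)

    def update(self, z):
        self.p += self.q                   # predict step
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with observation z
        self.p *= (1.0 - k)
        return self.x

def sequential_q_step(robots, env_step, alpha=0.5, gamma=0.9, eps=0.1):
    """One round of sequential Q-learning: robots choose actions in a fixed
    order (not simultaneously), then each filters the shared global reward
    into a local-reward estimate before updating its own Q-table."""
    actions = []
    for rb in robots:                      # predefined sequence of robots
        qs = rb['Q'].setdefault(rb['state'], [0.0] * rb['n_actions'])
        a = (random.randrange(rb['n_actions']) if random.random() < eps
             else max(range(rb['n_actions']), key=qs.__getitem__))
        actions.append(a)
    global_reward, next_states = env_step(actions)   # hypothetical hook
    for rb, a, s2 in zip(robots, actions, next_states):
        local = rb['kf'].update(global_reward)       # extracted local reward
        qs = rb['Q'][rb['state']]
        q_next = max(rb['Q'].setdefault(s2, [0.0] * rb['n_actions']))
        qs[a] += alpha * (local + gamma * q_next - qs[a])
        rb['state'] = s2
```

Because each robot updates with its filtered local reward rather than the raw global reward, credit is assigned to the robot that earned it, and the sequential action order lets later robots implicitly condition on earlier robots' choices.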