Recent attempts to let monolithic reinforcement-learning agents synthesize coordinated behavior scale poorly to more complicated multi-agent learning problems, in which multiple learning agents play different roles and work together to accomplish a common goal. These agents must receive and respond to sensory information from their partners as well as from the physical environment itself, so their state spaces tend to grow exponentially with the number of partners. As an illustrative problem that suffers from this kind of combinatorial explosion, we consider a modified version of the pursuit problem and show how a collection of modular Q-learning hunter agents successfully synthesize the coordinated decision policies needed to capture a randomly fleeing prey agent effectively, by specializing their functionality and acquiring herding behavior.
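The contrast between a monolithic learner and a modular one can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method described in the paper: the abstract does not say how the hunter's modules are decomposed or how their outputs are combined. Here we assume, hypothetically, that each hunter has one tabular Q-learning module per partner, that each module observes only the prey and that one partner, and that the hunter chooses actions by summing the modules' Q-values (a "greatest mass" style of combination). The names `QModule`, `ModularHunter`, `table_sizes`, and the five-action move set are all illustrative.

```python
import random
from collections import defaultdict

# Hypothetical move set for a grid-world hunter.
ACTIONS = ["N", "S", "E", "W", "stay"]


def table_sizes(cells, partners):
    """Compare Q-table state counts (assumed observation model).

    monolithic: one table indexed jointly by the prey and every partner.
    modular: one table per partner, each indexed by the prey and that partner.
    """
    monolithic = cells ** (partners + 1)
    modular = partners * cells ** 2
    return monolithic, modular


class QModule:
    """Tabular Q-learner over a small local state."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha
        self.gamma = gamma

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning target: r + gamma * max_b Q(s', b).
        best_next = max(self.q[(s_next, b)] for b in ACTIONS)
        td_target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])


class ModularHunter:
    """A hunter whose decision combines several small Q-modules."""

    def __init__(self, n_modules, epsilon=0.1, seed=0):
        self.modules = [QModule() for _ in range(n_modules)]
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def combined_q(self, local_states, a):
        # Assumed combination rule: sum each module's Q-value for action a.
        return sum(m.q[(s, a)] for m, s in zip(self.modules, local_states))

    def act(self, local_states):
        # Epsilon-greedy over the summed Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.combined_q(local_states, a))

    def learn(self, local_states, a, r, next_local_states):
        # Every module sees the same action and reward but its own local state.
        for m, s, s2 in zip(self.modules, local_states, next_local_states):
            m.update(s, a, r, s2)
```

For example, with 25 possible relative positions per observed agent and 3 partners, `table_sizes(25, 3)` gives 390,625 joint states for a monolithic table against 1,875 states spread across three modules. This is the kind of growth the abstract refers to, though the actual grid and observation model used in the paper may differ.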
Published in: Intelligent Robots and Systems '96, IROS 96, Proceedings of the 1996 IEEE/RSJ International Conference on (Volume 3)
Date of Conference: 4-8 Nov 1996