By Dar'ya Guarnera; Giuseppe Claudio Guarnera; Brian A. Barsky
Morgan & Claypool Books
To remain viable and to reproduce, an animal must continuously choose the right behavior among several alternatives (e.g., obtaining food, obtaining water, avoiding predators) at the right time. In robotics, the same problem arises when a complex behavior has to be synthesized from elementary behaviors. We review the behavior coordination methods proposed so far within the reinforcement learning framework, discuss their limitations, and propose a new coordination method based on restless bandit theory. Restless bandit allocation indexes extend the Gittins indexes and are borrowed from the field of optimal scheduling; they concern problems in which limited resources must be shared among several ongoing projects. The performance of the proposed method is illustrated on the postman robot problem and compared with that of Hierarchical Q-learning (Lin, 1993).
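
To make the idea of an allocation index concrete, the following minimal sketch shows the general shape of an index-based arbitration rule: each behavior is treated as a project, each project is assigned a number that depends only on its own current state, and the behavior with the largest number is activated. This is an illustration of index policies in general, not the coordination method proposed here. The behavior names, state names, and index values are hypothetical placeholders, and the sketch does not show how the indexes would actually be computed.

```python
from typing import Dict, Hashable

# Hypothetical index tables: INDEX_TABLES[behavior][state] -> allocation index.
# The values are placeholders chosen for illustration only; a real system
# would derive them from a model of each behavior.
INDEX_TABLES: Dict[str, Dict[Hashable, float]] = {
    "recharge": {"battery_low": 0.9, "battery_ok": 0.1},
    "deliver_mail": {"mail_waiting": 0.6, "no_mail": 0.0},
}


def select_behavior(
    states: Dict[str, Hashable],
    index_tables: Dict[str, Dict[Hashable, float]] = INDEX_TABLES,
) -> str:
    """Return the behavior whose current state has the largest index.

    `states` maps each behavior name to that behavior's current state.
    Ties are broken in favor of the behavior listed first in `states`.
    """
    return max(states, key=lambda name: index_tables[name][states[name]])


if __name__ == "__main__":
    current = {"recharge": "battery_low", "deliver_mail": "mail_waiting"}
    print(select_behavior(current))  # prints "recharge" (0.9 > 0.6)
```

The point of the design is that arbitration reduces to a table lookup and a comparison at run time: each behavior's index depends only on that behavior's own state, so the behaviors can be modeled separately.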