We introduce a supervised reinforcement learning (SRL) architecture for robot control problems with high-dimensional state spaces, and propose two new SRL algorithms based on this architecture. In our algorithms, a behavior model learned from examples is used to dynamically reduce the set of actions available from each state during the early stages of the reinforcement learning (RL) process. Restricting the agent to these action subsets focuses exploration on relevant parts of the action space and avoids the selection of irrelevant actions. Once the agent has exploited the information provided by the behavior model, it continues improving its value function without any help, selecting subsequent actions from the complete action space. Our experimental results show that this approach can dramatically speed up the learning process.
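To make the idea concrete, below is a minimal, hypothetical sketch of the mechanism the abstract describes: a Q-learning agent on a toy chain environment whose exploration is restricted to a behavior model's suggested action subset during early episodes, then released to the full action space. The environment, the `behavior_model` function, and all parameter values are illustrative assumptions, not the paper's actual algorithms.

```python
import random

N_STATES = 10
ACTIONS = [-1, 0, 1]  # full action space: move left, stay, move right

def behavior_model(state):
    """Hypothetical behavior model learned from examples: it returns a
    reduced subset of actions (here, the action moving toward the goal)."""
    return [1] if state < N_STATES - 1 else [0]

def step(state, action):
    """Toy chain environment: reward 1 on reaching the rightmost state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=200, guided_episodes=50, alpha=0.5, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for ep in range(episodes):
        state, done = 0, False
        while not done:
            # Early episodes: explore only within the behavior model's subset;
            # afterwards, select from the complete action space.
            actions = behavior_model(state) if ep < guided_episodes else ACTIONS
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Standard Q-learning update, always over the FULL action space.
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

random.seed(0)
q = q_learning()
```

During the guided phase the agent never wastes steps on the irrelevant left/stay actions, which is the source of the speed-up the abstract reports; after that phase, the value function keeps improving unconstrained.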