This paper proposes a novel method of learning a user's preferred reward modalities for human-robot interaction by solving a cooperative training task. A learning algorithm based on a combination of adaptable pre-trained hidden Markov models and a computational model of classical conditioning is outlined. In a training task whose desired outcome is known to both an AIBO pet robot and its human instructor, the robot can freely explore human reward behavior. By this method, the robot is able to learn situated, user-specific reward behavior in different modalities, such as gestures, speech, and physical interaction, using the robot's built-in sensors. After the training phase, the learned reward behavior can serve as a basis for reinforcement learning of more complex tasks. A preliminary experimental study is presented, which investigates the effects of restricting the available reward modalities when teaching a pet robot. The results suggest that users give more reward when they can provide it freely than in a scenario where reward modalities are restricted. Moreover, the experiments showed that even when a restriction on possible reward modalities is introduced, users tend to give reward that does not conform to the restriction.
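The abstract does not specify which computational model of classical conditioning is used. As a minimal sketch of how such a model can turn co-occurring sensor cues into learned reward predictors, the following assumes a standard Rescorla-Wagner update; all names, parameters, and the stimulus labels are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a Rescorla-Wagner-style update, a common computational
# model of classical conditioning. Associative strengths V[s] track
# how well each observed stimulus (e.g. a detected gesture or speech
# cue) predicts reward; each trial nudges the strengths of the
# stimuli present toward the delivered reward magnitude.

def rescorla_wagner_update(V, present, reward, alpha=0.1, beta=1.0):
    """One conditioning trial.

    V       -- dict mapping stimulus name to associative strength
    present -- stimuli observed on this trial
    reward  -- reward magnitude (the asymptote lambda)
    alpha   -- stimulus salience (assumed uniform here)
    beta    -- learning rate for the reward
    """
    total = sum(V[s] for s in present)   # combined prediction
    error = reward - total               # prediction error
    for s in present:
        V[s] += alpha * beta * error     # shared-error update
    return V

# Usage: a "pat" gesture repeatedly paired with reward gains strength,
# while an unpaired speech cue stays at zero.
V = {"pat": 0.0, "speech_good": 0.0}
for _ in range(20):
    V = rescorla_wagner_update(V, present=["pat"], reward=1.0)
```

With a single stimulus, the strength converges geometrically toward the reward magnitude (after 20 trials at alpha=0.1, V["pat"] equals 1 - 0.9**20), mirroring how repeated pairings of a user behavior with known task outcomes could bootstrap a user-specific reward signal.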