This paper studies the impact of interfaces that allow nonexpert users to efficiently and intuitively teach a robot to recognize new visual objects. We present challenges that need to be addressed for real-world deployment of robots capable of learning new visual objects in interaction with everyday users. We argue that in addition to robust machine learning and computer vision methods, well-designed interfaces are crucial for learning efficiency. In particular, we argue that interfaces can be key in helping nonexpert users to collect good learning examples and, thus, improve the performance of the overall learning system. Then, we present four alternative human-robot interfaces: three are based on the use of a mediating artifact (smartphone, wiimote, wiimote and laser), and one is based on natural human gestures (with a Wizard-of-Oz recognition system). These interfaces mainly vary in the kind of feedback provided to users, allowing them to understand more or less easily what the robot is perceiving and, thus, guiding the way they provide training examples. We then evaluate the impact of these interfaces, in terms of learning efficiency, usability, and user experience, through a real-world, large-scale user study. In this experiment, we asked participants to teach a robot 12 different new visual objects in the context of a robotic game. This game takes place in a home-like environment and was designed to motivate and engage users in an interaction where using the system was meaningful. We then discuss results that show significant differences among interfaces. In particular, we show that interfaces such as the smartphone interface allow nonexpert users to intuitively provide much better training examples to the robot, almost as good as those provided by expert users who are trained for this task and are aware of the different visual perception and machine learning issues.
We also show that artifact-mediated teaching is significantly more efficient for robot learning than teaching through a gesture-based human-like interaction, and equally good in terms of usability and user experience.