Inspired by infant development, we present a system that learns associations between acoustic labels and visual representations in interaction with its tutor. The system is integrated with a humanoid robot. Except for a few trigger phrases used to start learning, all acoustic representations are learned online and in interaction. Similarly, in the visual domain the clusters are not predefined but fully learned online. In contrast to other interactive systems, interaction with the acoustic environment relies solely on the two microphones mounted on the robot's head. In this paper we give an overview of all key elements of the system and focus on the challenges arising from the headset-free learning of speech labels. In particular, we present a mechanism for auditory attention that integrates bottom-up and top-down information to segment the acoustic stream. The performance of the system is evaluated through offline tests of individual components and an analysis of the online behavior.
Date of Conference: Sept. 27 – Oct. 2, 2009