When adults talk to infants, they speak differently than they do to other adults. This kind of infant-directed speech (IDS) typically highlights target words using focal stress and utterance-final position. Speech directed to infants also often refers to objects, people, and events in the world surrounding the infant. As a result, the sound sequences the infant hears are very likely to co-occur with actual objects or events in the infant's visual field. In this work we present a model that learns word-like structures from multimodal information sources without any pre-programmed linguistic knowledge by taking advantage of these characteristics of IDS. The model is implemented on a humanoid robot platform and is able to extract word-like patterns and associate them with objects in the visual surroundings.