We introduce a computational model of sensor fusion based on the topographic representations of a "two-microphone and one camera" configuration. Our aim is to implement a robust multimodal attention mechanism in artificial systems. In our approach, we draw on neurophysiological findings to discuss the biological plausibility of the coding and extraction of spatial features, while also meeting the demands and constraints of applications in the field of human-robot interaction. In contrast to the common technique of processing the modalities separately and finally combining multiple localization hypotheses, we integrate auditory and visual data at an early level. This can be viewed as focusing attention, or directing the gaze, onto salient objects. Our computational model is inspired by findings about the inferior colliculus in the auditory pathway and the visual and multimodal sections of the superior colliculus. Accordingly, it includes: a) an auditory map based on interaural time delays, b) a visual map based on spatio-temporal intensity differences, and c) a bimodal map where multisensory response enhancement is performed and motor commands can be derived. After introducing a modified Amari neural field architecture in the bimodal model, we place emphasis on a novel method of evaluation and parameter optimization based on biology-inspired specifications and real-world experiments.
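To illustrate the bimodal-map idea, the following is a minimal sketch of a one-dimensional Amari neural field receiving coincident auditory and visual localization cues. All parameter values, the Heaviside firing-rate function, and the difference-of-Gaussians kernel are illustrative assumptions, not the paper's "modified Amari" architecture; the sketch only shows how overlapping inputs drive a single activity peak that could serve as an attention or gaze target.

```python
import numpy as np

def mexican_hat(n=101, sigma_e=3.0, sigma_i=9.0, a_e=2.0, a_i=1.0):
    """Difference-of-Gaussians lateral-interaction kernel:
    local excitation surrounded by broader inhibition (assumed shape)."""
    x = np.arange(n) - n // 2
    return (a_e * np.exp(-x**2 / (2 * sigma_e**2))
            - a_i * np.exp(-x**2 / (2 * sigma_i**2)))

def amari_step(u, stimulus, w, h=-1.0, dt=0.05, tau=1.0):
    """One Euler step of tau * du/dt = -u + h + (w * f(u)) + stimulus."""
    f = (u > 0).astype(float)                 # Heaviside firing-rate function
    lateral = np.convolve(f, w, mode="same")  # lateral excitation/inhibition
    return u + (dt / tau) * (-u + h + lateral + stimulus)

# Two Gaussian inputs at the same field position, standing in for an
# auditory (ITD-based) and a visual (intensity-difference) localization cue.
n = 200
pos = np.arange(n)
auditory = 1.5 * np.exp(-(pos - 80) ** 2 / (2 * 5.0 ** 2))
visual   = 1.5 * np.exp(-(pos - 80) ** 2 / (2 * 5.0 ** 2))
stimulus = auditory + visual

u = np.full(n, -1.0)        # start at the resting level h
w = mexican_hat()
for _ in range(400):        # relax the field toward a stable activity peak
    u = amari_step(u, stimulus, w)

peak = int(np.argmax(u))    # selected (attended) position in the bimodal map
```

With both cues centered at position 80, the field settles on a single supra-threshold peak there; a motor command (e.g. a gaze shift) could be read out from that peak position.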
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (Volume 4)
Date of Conference: 25-29 July 2004