In this paper, we evaluate adaptive sound localization algorithms for robotic heads. To this end, we built a three-degree-of-freedom head with two microphones encased in artificial pinnae (outer ears). The geometry of the head and pinnae induces temporal differences in the sound recorded at each microphone. These differences vary in a complex manner with the frequency of the sound, the location of the source, and the orientation of the robot. To learn the relationship between these auditory differences and the location of a sound source, we applied machine learning methods to a database of recordings covering different audio source locations and robot head orientations. Our approach achieves a mean error of 2.5 degrees in azimuth and 11 degrees in elevation when estimating the position of an audio source. These results highlight the benefit of a two-stage regression model that exploits the properties of the artificial pinnae for elevation estimation. In this work, the algorithms were trained using ground truth data provided by a motion capture system. We are currently generalizing the approach so that the training signal is provided online by a real-time face detection and speech detection system.
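As an illustration of the temporal differences mentioned above, the following minimal sketch estimates the delay between two microphone channels by brute-force cross-correlation. It is not the method evaluated in this paper, which does not specify its feature extraction here; the function name `estimate_delay` and its parameters are hypothetical, and the signals are assumed to be equal-length lists of samples.

```python
import math


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    Hypothetical helper for illustration only. A positive result means
    the right channel lags the left channel by that many samples.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                # Correlate left[i] with the right channel shifted by `lag`.
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy input: a single impulse that reaches the right microphone 3 samples later.
left = [0.0] * 20
right = [0.0] * 20
left[5] = 1.0
right[8] = 1.0
delay_samples = estimate_delay(left, right, max_lag=10)
```

Dividing the estimated lag by the sampling rate gives the delay in seconds. A learned model such as the one described above would presumably take features of this kind, possibly computed per frequency band, together with the head orientation as input.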