Interaction between humans involves a plethora of sensory information, both in the form of explicit communication as well as more subtle unconsciously perceived signals. In order to enable natural human-robot interaction, robots will have to acquire the skills to detect and meaningfully integrate information from multiple modalities. In this article, we focus on sound localization in the context of a multi-sensory humanoid robot that combines audio and video information to yield natural and intuitive responses to human behavior, such as directed eye-head movements towards natural stimuli. We highlight four common sound source localization algorithms and compare their performance and advantages for real-time interaction. We also briefly introduce an integrated distributed control framework called DVC, where additional modalities such as speech recognition, visual tracking, or object recognition can easily be integrated. We further describe the way the sound localization module has been integrated in our humanoid robot, CB.