
Probabilistic integration of audiovisual information to localize sound source in human-robot interaction

Authors:
Chen, B.; Meguro, M.; Kaneko, M. (Dept. of Electron. Eng., Univ. of Electro-Commun., Tokyo, Japan)

This paper proposes a method to estimate a sound source position by fusing auditory and visual information with a Bayesian network in human-robot interaction. We first integrate multi-channel audio signals and a depth image of the environment to generate a likelihood map for sound source localization. However, this integration, denoted "MICs", does not always localize the sound source correctly. To correct such localization failures, we integrate the likelihood values generated by "MICs" with the skin-color distribution in the image, according to the result of classifying the audio signal into speech/non-speech categories. The audio classifier is based on a support vector machine (SVM), and the skin-color distribution is modeled with a Gaussian mixture model (GMM). With the evidence given by MICs, the SVM, and the GMM, we infer whether each pixel in the image corresponds to the sound source using the trained Bayesian network. Finally, experimental results are presented to show the effectiveness of the proposed method.
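The abstract gives no implementation detail beyond this fusion scheme. As an illustration only, the sketch below shows one way such per-pixel evidence fusion could look, assuming a simple naive-Bayes combination of the MICs likelihood map, the GMM skin-color likelihood, and the SVM speech/non-speech decision. All names and the two-state source model here are hypothetical; the structure of the authors' trained Bayesian network is not specified in the abstract.

    import numpy as np

    # Hypothetical sketch: a binary "sound source at this pixel" variable S,
    # fused with conditionally independent evidences (a naive-Bayes
    # simplification made here; the paper's actual network may differ).

    def fuse_evidence(mic_likelihood, skin_likelihood, is_speech,
                      prior_source=0.05):
        """Return per-pixel posterior P(source | evidence).

        mic_likelihood  : HxW array, likelihood map from the audio/depth
                          integration ("MICs"), values in [0, 1].
        skin_likelihood : HxW array, skin-color likelihood from a GMM, in [0, 1].
        is_speech       : bool, SVM speech/non-speech decision; the skin-color
                          cue is only informative when the sound is speech.
        """
        # Likelihood of the evidence under S=1 (source) and S=0 (no source).
        l1 = mic_likelihood
        l0 = 1.0 - mic_likelihood
        if is_speech:
            # Fold in the skin-color cue only for speech sounds.
            l1 = l1 * skin_likelihood
            l0 = l0 * (1.0 - skin_likelihood)
        # Bayes' rule with a scalar prior on "this pixel is the source".
        num = prior_source * l1
        den = num + (1.0 - prior_source) * l0
        return num / np.maximum(den, 1e-12)

    # Toy usage: a 4x4 image where audio and skin cues agree on one pixel.
    mic = np.full((4, 4), 0.1); mic[1, 2] = 0.9
    skin = np.full((4, 4), 0.2); skin[1, 2] = 0.8
    posterior = fuse_evidence(mic, skin, is_speech=True)
    print(posterior.argmax())  # flat index of the most likely source pixel

Gating the skin-color cue on the SVM's speech decision mirrors the abstract's idea that visual evidence of a face is only relevant when the sound is classified as speech.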

Published in:

The 12th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN 2003), Proceedings

Date of Conference:

31 Oct.-2 Nov. 2003