This paper presents a data association method that uses audio and video data together to localize targets in a cluttered environment and to detect who is speaking to the robot. A particle filter is applied to efficiently find an optimal association between targets and measurements. The state variables consist of position and speaking status. To update the speaking state, we first evaluate the incoming sound signal based on the cross-correlation between microphones and then calculate the likelihood of the audio information. Visual measurements are used to find an optimal association between targets and visual observations. The number of targets with which the robot should interact is updated based on the vision and audio information. Several sets of experimental data were collected beforehand and used in computer simulations to verify the performance of the proposed data association method for the speaker selection problem in a cluttered environment.
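The audio step described above can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: a two-microphone setup, a brute-force time-domain cross-correlation, a Gaussian likelihood with a hypothetical spread `sigma`, and the function names `cross_correlation_delay` and `audio_likelihood`. It shows how a measured inter-microphone delay might be scored against the delay expected for a candidate target position.

```python
import math


def cross_correlation_delay(mic_a, mic_b, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means the sound reaches mic_b later than mic_a.
    Brute-force search over [-max_lag, max_lag]; illustrative only.
    """
    n = min(len(mic_a), len(mic_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def audio_likelihood(measured_lag, expected_lag, sigma=1.0):
    """Unnormalized Gaussian likelihood of the measured delay.

    expected_lag is the delay a hypothesized speaker position would
    produce; sigma is an assumed measurement spread in samples.
    """
    z = (measured_lag - expected_lag) / sigma
    return math.exp(-0.5 * z * z)


# Hypothetical signals: the same pulse, arriving 2 samples later at mic_b.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]

lag = cross_correlation_delay(mic_a, mic_b, max_lag=4)
print(lag)  # 2

# Candidate targets whose geometry predicts different delays.
for expected in (0, 2):
    print(expected, audio_likelihood(lag, expected))
```

In a particle filter of the kind the abstract mentions, a likelihood like this could be used to reweight the particles that hypothesize a given target as the active speaker. That coupling is not specified in the abstract and is only suggested here.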