This paper proposes a motion estimation method that fuses audio and video sensor data. The audio system consists of three microphones arranged on a Y-shaped structure, which is mounted on a pan-tilt camera. The camera forms the video system. Together, the audio and video systems enable the 3D position of the sound source to be estimated. From these position estimates, a motion model consisting of the translational velocity and acceleration of the source is in turn estimated using a Kalman filter. The motion model allows the sound source to be tracked in real time. This fusion estimation system has many potential applications, such as video conferencing and security monitoring for intruders. Simulation results show that the motion estimation is satisfactory.
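The Kalman filter step described above can be illustrated with a minimal sketch. The abstract does not specify the state model, so several things here are assumptions: a constant-acceleration model applied independently to each axis, a direct position measurement per axis, and hypothetical noise variances (`process_var`, `meas_var`, `init_var`). The names `AxisKalman` and `track_3d` are illustrative and do not come from the paper.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


class AxisKalman:
    """Kalman filter for one axis with state [position, velocity, acceleration].

    Assumed model (not taken from the paper): constant acceleration between
    samples, with process noise entering through the acceleration only.
    """

    def __init__(self, dt, process_var=1e-2, meas_var=1e-4, init_var=1e3):
        # State transition over one sample period dt.
        self.F = [[1.0, dt, 0.5 * dt * dt],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]]
        # Hypothetical process noise: acceleration may drift between samples.
        self.Q = [[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, process_var]]
        self.R = meas_var  # variance of the position measurement
        self.x = [0.0, 0.0, 0.0]
        # Large initial covariance: the starting state is treated as unknown.
        self.P = [[init_var if i == j else 0.0 for j in range(3)]
                  for i in range(3)]

    def step(self, z):
        """Predict one sample ahead, then correct with measured position z."""
        F = self.F
        # Predict: x = F x, P = F P F^T + Q.
        self.x = [sum(F[i][k] * self.x[k] for k in range(3)) for i in range(3)]
        fpft = mat_mul(mat_mul(F, self.P), transpose(F))
        self.P = [[fpft[i][j] + self.Q[i][j] for j in range(3)]
                  for i in range(3)]
        # Update with H = [1, 0, 0], so the innovation covariance is a scalar.
        s = self.P[0][0] + self.R
        gain = [self.P[i][0] / s for i in range(3)]
        residual = z - self.x[0]
        self.x = [self.x[i] + gain[i] * residual for i in range(3)]
        top_row = list(self.P[0])
        self.P = [[self.P[i][j] - gain[i] * top_row[j] for j in range(3)]
                  for i in range(3)]
        return tuple(self.x)


def track_3d(positions, dt):
    """Run one filter per axis over a sequence of (x, y, z) position fixes."""
    filters = [AxisKalman(dt) for _ in range(3)]
    estimates = []
    for point in positions:
        estimates.append([f.step(c) for f, c in zip(filters, point)])
    return estimates
```

Each entry returned by `track_3d` holds three `(position, velocity, acceleration)` tuples, one per axis. Treating the axes independently keeps every update scalar and avoids a matrix inverse; a coupled 9-state filter would be the natural alternative if the position errors along different axes were correlated.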