Abstract:
An important aspect of audiovisual speaker localization is the appropriate fusion of acoustic and visual observations based on their time-varying reliability. To address this challenge, this study proposes a framework that incorporates dynamic stream weights into the well-known Kalman filter. The concept of dynamic stream weights has recently been investigated in the context of audiovisual automatic speech recognition, where it was successfully applied to weight audiovisual observations according to their reliability. This study extends that approach to linear dynamical systems and additionally introduces a closed-form solution for computing oracle dynamic stream weights from observation sequences with known state trajectories. The proposed approach is evaluated on audiovisual recordings from a humanoid robot in reverberant environments. The results indicate that incorporating dynamic stream weights allows for efficient data fusion on a per-frame basis and yields superior performance over conventional Kalman-filter-based state estimation.
Date of Conference: 17-20 September 2018
Date Added to IEEE Xplore: 04 November 2018