Bayesian networks are statistical models that extend the framework of hidden Markov models (HMMs) and allow for the analysis of multimodal signals such as audio-visual speech. Our recent results demonstrate the use of the coupled HMM in audio-visual speech recognition and speaker identification. The improved performance of this model is due to its low complexity and its ability to describe both the asynchrony between the audio and visual states and their natural dependency over time. Audio-visual speaker identification accuracy is further enhanced by a late decision approach that integrates the audio-visual speech likelihood with the face likelihood computed using an embedded Bayesian network.
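
The late decision step described above can be illustrated with a minimal sketch. It assumes that each enrolled speaker already has two log-likelihood scores, one from the audio-visual speech model and one from the face model, and that the two are combined by a weighted sum before the best-scoring speaker is chosen. The function name `late_fusion_identify`, the `av_weight` parameter, and the weighted-sum rule itself are illustrative assumptions; the abstract does not state the exact integration rule used.

```python
from typing import Dict


def late_fusion_identify(
    av_loglik: Dict[str, float],
    face_loglik: Dict[str, float],
    av_weight: float = 0.5,
) -> str:
    """Return the speaker with the highest fused log-likelihood.

    av_loglik   : per-speaker log-likelihood of the audio-visual speech
                  (assumed to come from a coupled HMM, computed elsewhere).
    face_loglik : per-speaker log-likelihood of the face image
                  (assumed to come from an embedded Bayesian network).
    av_weight   : hypothetical weight in [0, 1] given to the speech stream;
                  the face stream receives 1 - av_weight.
    """
    if not 0.0 <= av_weight <= 1.0:
        raise ValueError("av_weight must lie in [0, 1]")
    if not av_loglik:
        raise ValueError("at least one candidate speaker is required")
    if av_loglik.keys() != face_loglik.keys():
        raise ValueError("both score tables must cover the same speakers")

    # Weighted sum of log-likelihoods, i.e. a weighted product of likelihoods.
    fused = {
        speaker: av_weight * av_loglik[speaker]
        + (1.0 - av_weight) * face_loglik[speaker]
        for speaker in av_loglik
    }
    # The identified speaker is the one that maximizes the fused score.
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up scores for two hypothetical speakers, for illustration only.
    av_scores = {"alice": -10.0, "bob": -12.0}
    face_scores = {"alice": -8.0, "bob": -3.0}
    print(late_fusion_identify(av_scores, face_scores, av_weight=0.5))
```

With `av_weight=1.0` the decision relies on the speech stream alone, and with `av_weight=0.0` on the face stream alone; in practice such a weight would be tuned on held-out data.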