Audio-visual speech modeling using coupled hidden Markov models
Chu, S.M.; Huang, T.S.
Acoustics, Speech, and Signal Processing, 2002. Proceedings. (ICASSP '02). IEEE International Conference on
Volume 2, 2002, Page(s): 2009-2012
Summary: In this work we consider the bimodal fusion problem in audio-visual
speech recognition. A novel sensory fusion architecture based on
coupled hidden Markov models (CHMMs) is presented. CHMMs are directed
graphical models of stochastic processes and are a special type of
dynamic Bayesian network. The proposed fusion architecture allows us to
address the statistical modeling and the fusion of audio-visual speech
in a unified framework. Furthermore, the architecture is capable of
capturing the asynchronous and temporal inter-modal dependencies between
the two information channels. We describe a model transformation
strategy to facilitate inference and learning in CHMMs. Results from
audio-visual speech recognition experiments confirm the superior
capability of the proposed fusion architecture.
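
The summary does not say which model transformation the authors use. As an illustration only, the sketch below assumes one common approach: collapsing a two-chain CHMM into an ordinary HMM over the Cartesian product of audio and visual states, so that the standard forward algorithm applies. All names, state counts, symbols, and probabilities here are hypothetical and are not taken from the paper.

```python
from itertools import product

def product_transition(trans_a, trans_v):
    """Collapse coupled transitions into one product-state transition table.

    trans_a[(a, v)][a2] = P(audio state a2 at t | audio a, visual v at t-1)
    trans_v[(a, v)][v2] = P(visual state v2 at t | audio a, visual v at t-1)
    Given the previous joint state, the two chains move independently,
    so the joint transition is the product of the two factors.
    """
    return {
        prev: {
            (a2, v2): pa * pv
            for a2, pa in trans_a[prev].items()
            for v2, pv in trans_v[prev].items()
        }
        for prev in trans_a
    }

def forward_likelihood(init, joint, emit_a, emit_v, obs):
    """Forward algorithm on the product-state HMM.

    obs is a list of (audio_symbol, visual_symbol) pairs; each stream
    is emitted by its own chain, so emission probabilities multiply.
    """
    oa, ov = obs[0]
    alpha = {s: p * emit_a[s[0]][oa] * emit_v[s[1]][ov] for s, p in init.items()}
    for oa, ov in obs[1:]:
        alpha = {
            s2: sum(alpha[s1] * joint[s1][s2] for s1 in alpha)
            * emit_a[s2[0]][oa] * emit_v[s2[1]][ov]
            for s2 in init
        }
    return sum(alpha.values())

# Hypothetical toy model: two audio states and two visual states.
states = list(product(range(2), range(2)))
trans_a = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.6, 1: 0.4},
           (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}
trans_v = {(0, 0): {0: 0.7, 1: 0.3}, (0, 1): {0: 0.2, 1: 0.8},
           (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9}}
emit_a = {0: {"x": 0.9, "y": 0.1}, 1: {"x": 0.2, "y": 0.8}}
emit_v = {0: {"p": 0.7, "q": 0.3}, 1: {"p": 0.4, "q": 0.6}}
init = {s: (1.0 if s == (0, 0) else 0.0) for s in states}

joint = product_transition(trans_a, trans_v)
likelihood = forward_likelihood(init, joint, emit_a, emit_v,
                                [("x", "p"), ("y", "p"), ("y", "q")])
print(likelihood)
```

Because each chain's next state depends on the previous states of both chains, the two streams may occupy different states at the same frame, which is how a model of this kind can represent asynchrony between audio and video. The cost of this particular transformation is that the product state space grows multiplicatively with the number of states per chain.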