Audio-visual continuous speech recognition using MPEG-4 compliant visual features

4 Author(s)
P. S. Aleksic (Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA); J. J. Williams; Zhilin Wu; A. K. Katsaggelos

We use facial animation parameters (FAPs), supported by the MPEG-4 standard for the visual representation of speech, to significantly improve automatic speech recognition (ASR). We describe a robust and automatic algorithm for extracting FAPs from visual data that requires no hand labeling or extensive training procedures. Multi-stream hidden Markov models (HMMs) are used to integrate audio and visual information. ASR experiments are performed under both clean and noisy audio conditions using a relatively large vocabulary (approximately 1000 words). The proposed system reduces the word error rate (WER) by 20% to 23% relative to the audio-only ASR WER at various SNRs with additive white Gaussian noise, and by 19% relative to the audio-only ASR WER under clean audio conditions.
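The reductions quoted above are relative, not absolute: each is the drop in WER expressed as a fraction of the audio-only WER. A minimal sketch of that arithmetic follows. The function name `relative_wer_reduction` and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(wer_audio_only: float, wer_audio_visual: float) -> float:
    """Return the WER reduction as a fraction of the audio-only WER.

    Both arguments are word error rates on the same scale
    (for example 0.50 for 50%).
    """
    if wer_audio_only <= 0:
        raise ValueError("audio-only WER must be positive")
    return (wer_audio_only - wer_audio_visual) / wer_audio_only


# Hypothetical values, NOT taken from the paper: an audio-only WER of 50%
# and an audio-visual WER of 40% is a 10-point absolute reduction,
# which corresponds to a relative reduction of about 20%.
example = relative_wer_reduction(0.50, 0.40)
print(f"relative WER reduction: {example:.1%}")
```

Under this definition, the same relative reduction corresponds to a smaller absolute gain when the audio-only WER is already low, which is worth keeping in mind when comparing the clean and noisy figures.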

Published in:

Proceedings of the 2002 International Conference on Image Processing (Volume 1)

Date of Conference: