Real-time lip tracking and bimodal continuous speech recognition
Chan, M.T.; Zhang, Y.; Huang, T.S.
Multimedia Signal Processing, 1998 IEEE Second Workshop on
7-9 Dec 1998, Page(s): 65-70
Digital Object Identifier 10.1109/MMSP.1998.738914
Summary: We investigate using a bimodal approach to speech recognition by
incorporating additional visual features derived from lip movement of
the speaker. A reference contour model is used to track the lip outline
of the speaker. By using color, constraining the deformation in an
affine subspace, and by incorporating an outlier rejection mechanism,
our system is robust and runs in real time. To address the model
initialization issue, a fast lip localization algorithm is also
incorporated. A sample of continuous bimodal speech data based on a
confined vocabulary (useful for our application area) was synchronously
captured for training and testing. Using the hidden Markov modeling
framework, we trained our bimodal context-dependent sub-word-based
recognizer in a few different ways. The experiments show that the
bimodal recognizer compares favorably to the acoustic-only counterpart.
The results also indicate that it is advantageous to include first
derivatives of the visual features. Furthermore, the 2-stream modeling
scheme appears to be preferable to the 1-stream case for bimodal speech
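
Illustration (not from the paper): the summary mentions first derivatives of the visual features and a 2-stream modeling scheme without giving formulas. The sketch below shows one common way such quantities are computed in HMM-based recognizers. It assumes the standard regression formula for delta features and a weighted sum of per-stream log-likelihoods; the function names, the window size, and the stream weight are hypothetical choices, and the authors' actual formulation may differ.

```python
from typing import List, Sequence


def delta_features(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Approximate first time derivatives of per-frame feature vectors.

    Uses the regression formula
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    with the first/last frame repeated at the sequence edges.
    The window size N is an assumed parameter, not taken from the paper.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas: List[List[float]] = []
    for t in range(n_frames):
        row: List[float] = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][d]   # clamp at last frame
                behind = frames[max(t - n, 0)][d]             # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def two_stream_log_likelihood(log_b_audio: float, log_b_visual: float,
                              audio_weight: float = 0.5) -> float:
    """Combine per-stream state log-likelihoods with stream weights.

    Assumed form: log b(o) = w_a * log b_a(o_a) + (1 - w_a) * log b_v(o_v).
    The weight value is hypothetical; in practice it would be tuned.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual
```

In a 1-stream setup the acoustic and visual vectors (with their deltas) would instead be concatenated into a single observation and scored by one density, which leaves no explicit weight for trading off the two modalities.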