Lip movement synthesis in audio-visual speech recognition system



Abstract:

This paper describes a technique for synthesizing audio-visual speech based on hidden Markov models (HMMs). The "lip movement synthesis" experiment uses VC++ and the HTK toolkit. In the training stage, Japanese words are converted to text from image sequences using an HMM. Experimental results show that the synthetic lip image sequence is smooth and realistic. This research was supported by NSFC, China (Grant No. 60374032).
Date of Conference: 30 October 2005 - 01 November 2005
Date Added to IEEE Xplore: 27 February 2006
Print ISBN: 0-7803-9361-9
Conference Location: Wuhan, China
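
The training stage described in the abstract, where whole-word HMMs are fitted to lip-image feature sequences, can be illustrated as follows. This is a minimal sketch in Python with hmmlearn standing in for the paper's VC++/HTK implementation; the feature dimension, state count, and synthetic data are assumptions rather than the authors' settings.

    # Minimal sketch: one Gaussian HMM per word, trained on lip-image
    # feature sequences. hmmlearn stands in for HTK; all settings here
    # (3 visual features, 5 states, random data) are assumptions.
    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(0)
    # Hypothetical training sequences for one word: each row is a video
    # frame, each column a lip feature (e.g. mouth width, height, area).
    sequences = [rng.normal(size=(20 + i, 3)) for i in range(5)]
    X = np.concatenate(sequences)          # all frames stacked
    lengths = [len(s) for s in sequences]  # frames per sequence

    word_model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                                 n_iter=20, random_state=0)
    word_model.fit(X, lengths)

    # Recognition scores an unseen sequence against every word model and
    # picks the word whose HMM yields the highest log-likelihood.
    test_sequence = rng.normal(size=(22, 3))
    print("log-likelihood:", word_model.score(test_sequence))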

I. Introduction

Research on speech recognition technology has focused mainly on the human voice. The dramatic increase in direct person-to-person communication triggered by international trade and human migration since the 1970s has motivated basic research in spoken language technology. Furthermore, information technologies such as the Internet, broadband networks, and the popularization of powerful personal computers have, from the end of the 1990s into the 2000s, increased our ability to access documents written in foreign languages and to hold face-to-face conversations between people with different mother tongues. To enable smooth verbal communication between people of different languages, it is necessary to translate their utterances based on an understanding of the speakers' intentions, their cultural backgrounds, and the context of the dialog. Our ultimate goal is to develop a speech recognition control system that handles all of these features. An audio-visual speech recognition control system [1]–[5] will be developed using Mel-frequency cepstral coefficients (MFCCs) and an image-based method in order to achieve speaker independence.
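
As a rough illustration of the audio-visual front end mentioned above, the sketch below computes MFCCs from an utterance and concatenates them frame by frame with visual lip features. It assumes librosa for the audio side; the file name, the feature sizes, and fusion by simple concatenation are illustrative assumptions, not the method specified in this paper.

    # Sketch of an audio-visual feature front end: MFCCs plus per-frame
    # visual lip features. librosa, the file name, and the feature sizes
    # are assumptions for illustration.
    import numpy as np
    import librosa

    # Load a hypothetical utterance and compute 13 MFCCs per frame.
    audio, sr = librosa.load("utterance.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T  # (frames, 13)

    # Placeholder visual features from the lip region, resampled to the
    # audio frame rate (3 features per frame is an assumed choice).
    visual = np.zeros((mfcc.shape[0], 3))

    # Early fusion: one joint observation vector per frame, usable as
    # input to word HMMs like the one sketched earlier.
    av = np.hstack([mfcc, visual])  # (frames, 16)
    print(av.shape)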

References
1. D. R. Hill, A. Pearce and B. Wyvill, "Animating speech: an automated approach using speech synthesised by rules," The Visual Computer, vol. 3, pp. 277-289, 1988.
2. K. Waters and T. M. Levergood, "DECface: an automatic lip-synchronization algorithm for synthetic faces," Technical Report CRL 93/4, DEC Cambridge Research Laboratory, Cambridge, MA, Sep. 1993.
3. N. M. Brooke and S. D. Scott, "Computer graphics animations of talking faces based on stochastic models," Proc. IEEE ISSIPNN, pp. 73-76, Apr. 1994.
4. B. Le Goff and C. Benoît, "A text-to-audiovisual speech synthesizer for French," Proc. ICSLP 96, pp. 2163-2166, Oct. 1996.
5. J. Beskow, K. Elenius and S. McGlashan, "Olga - a dialogue system with an animated talking agent," Proc. Eurospeech 97, pp. 1651-1654, Sep. 1997.
6. J. Luettin, N. Thacker and S. Beet, "Speechreading using shape and intensity information," Proc. ICSLP, pp. 58-61, 1996.
7. J. Luettin, "Towards speaker independent continuous speechreading," Proc. Eurospeech, pp. 1991-1994, 1997.
8. G. Potamianos and A. Potamianos, "Speaker adaptation for audio-visual speech recognition," Proc. Eurospeech, pp. 1291-1294, 1999.
9. J. R. Movellan, "Visual speech recognition with stochastic networks," in G. Tesauro, D. Touretzky and T. Leen (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995.
10. O. Vanegas, A. Tanaka, K. Tokuda and T. Kitamura, "HMM-based visual speech recognition using intensity and location normalization," Proc. ICSLP, pp. 289-292, 1998.
11. Y. Nankaku, K. Tokuda and T. Kitamura, "Intensity- and location-normalized training for HMM-based visual speech recognition," Proc. Eurospeech, pp. 1287-1290, 1999.
