Lip movement synthesis in audio-visual speech recognition system



Abstract:

This paper describes a technique for synthesizing audio-visual speech based on hidden Markov models (HMMs). The "lip movement synthesis" experiment uses VC++ and the HTK toolkit. In the training stage, Japanese words are converted to text from image sequences using an HMM. Experimental results show that the synthetic lip image sequence is smooth and realistic. This research was supported by NSFC, China (Grant No. 60374032).
Date of Conference: 30 October 2005 - 01 November 2005
Date Added to IEEE Xplore: 27 February 2006
Print ISBN: 0-7803-9361-9
Conference Location: Wuhan, China
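
The training stage described in the abstract, where whole-word HMMs are fitted to lip-image feature sequences, can be illustrated as follows. This is a minimal sketch in Python with hmmlearn standing in for the paper's VC++/HTK implementation; the feature dimension, state count, and synthetic data are assumptions rather than the authors' settings.

    # Minimal sketch: one Gaussian HMM per word, trained on lip-image
    # feature sequences. hmmlearn stands in for HTK; all settings here
    # (3 visual features, 5 states, random data) are assumptions.
    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(0)
    # Hypothetical training sequences for one word: each row is a video
    # frame, each column a lip feature (e.g. mouth width, height, area).
    sequences = [rng.normal(size=(20 + i, 3)) for i in range(5)]
    X = np.concatenate(sequences)          # all frames stacked
    lengths = [len(s) for s in sequences]  # frames per sequence

    word_model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                                 n_iter=20, random_state=0)
    word_model.fit(X, lengths)

    # Recognition scores an unseen sequence against every word model and
    # picks the word whose HMM yields the highest log-likelihood.
    test_sequence = rng.normal(size=(22, 3))
    print("log-likelihood:", word_model.score(test_sequence))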

I. Introduction

Research on speech recognition technology has focused mainly on the human voice. The dramatic increase in direct person-to-person communication triggered by international trade and human migration since the 1970s has motivated basic research in spoken language technology. Furthermore, information technologies such as the Internet, broadband networks, and the popularization of powerful personal computers have, from the end of the 1990s into the 2000s, increased our ability to access documents written in foreign languages and to hold face-to-face conversations between people with different mother tongues. To enable smooth verbal communication between people of different languages, it is necessary to translate their utterances based on an understanding of the speakers' intentions, their cultural backgrounds, and the context of the dialog. Our ultimate goal is to develop a speech recognition control system that handles all of these features. An audio-visual speech recognition control system [1]–[5] will be developed using Mel-frequency cepstral coefficients (MFCCs) and an image-based method in order to achieve speaker independence.
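
As a rough illustration of the audio-visual front end mentioned above, the sketch below computes MFCCs from an utterance and concatenates them frame by frame with visual lip features. It assumes librosa for the audio side; the file name, the feature sizes, and fusion by simple concatenation are illustrative assumptions, not the method specified in this paper.

    # Sketch of an audio-visual feature front end: MFCCs plus per-frame
    # visual lip features. librosa, the file name, and the feature sizes
    # are assumptions for illustration.
    import numpy as np
    import librosa

    # Load a hypothetical utterance and compute 13 MFCCs per frame.
    audio, sr = librosa.load("utterance.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T  # (frames, 13)

    # Placeholder visual features from the lip region, resampled to the
    # audio frame rate (3 features per frame is an assumed choice).
    visual = np.zeros((mfcc.shape[0], 3))

    # Early fusion: one joint observation vector per frame, usable as
    # input to word HMMs like the one sketched earlier.
    av = np.hstack([mfcc, visual])  # (frames, 16)
    print(av.shape)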

References
1. D. R. Hill, A. Pearce and B. Wyvill, "Animating speech: an automated approach using speech synthesised by rules," The Visual Computer, vol. 3, pp. 277-289, 1988.
2. K. Waters and T. M. Levergood, "DECface: an automatic lip-synchronization algorithm for synthetic faces," Technical Report CRL 93/4, DEC Cambridge Research Laboratory, Cambridge, MA, Sep. 1993.
3. N. M. Brooke and S. D. Scott, "Computer graphics animations of talking faces based on stochastic models," Proc. IEEE ISSIPNN, pp. 73-76, Apr. 1994.
4. B. Le Goff and C. Benoît, "A text-to-audiovisual speech synthesizer for French," Proc. ICSLP 96, pp. 2163-2166, Oct. 1996.
5. J. Beskow, K. Elenius and S. McGlashan, "Olga - a dialogue system with an animated talking agent," Proc. Eurospeech 97, pp. 1651-1654, Sep. 1997.
6. J. Luettin, N. Thacker and S. Beet, "Speechreading using shape and intensity information," Proc. ICSLP, pp. 58-61, 1996.
7. J. Luettin, "Towards speaker independent continuous speechreading," Proc. Eurospeech, pp. 1991-1994, 1997.
8. G. Potamianos and A. Potamianos, "Speaker adaptation for audio-visual speech recognition," Proc. Eurospeech, pp. 1291-1294, 1999.
9. J. R. Movellan, "Visual speech recognition with stochastic networks," in G. Tesauro, D. Touretzky and T. Leen (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995.
10. O. Vanegas, A. Tanaka, K. Tokuda and T. Kitamura, "HMM-based visual speech recognition using intensity and location normalization," Proc. ICSLP, pp. 289-292, 1998.
11. Y. Nankaku, K. Tokuda and T. Kitamura, "Intensity- and location-normalized training for HMM-based visual speech recognition," Proc. Eurospeech, pp. 1287-1290, 1999.
