Speech-to-video synthesis using facial animation parameters

Authors:

Aleksic, P.S.; Katsaggelos, A.K. (Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA)

Abstract:

The presence of visual information in addition to audio can improve speech understanding in noisy environments. This additional information can be especially useful for people with impaired hearing who are able to speechread. This paper focuses on the problem of synthesizing the facial animation parameters (FAPs), supported by the MPEG-4 standard for the visual representation of speech, from a narrowband acoustic speech (telephone) signal. A correlation hidden Markov model (CHMM) system for performing visual speech synthesis is proposed. The CHMM system integrates an independently trained acoustic HMM (AHMM) and a visual HMM (VHMM) in order to realize speech-to-video synthesis. Objective experiments are performed by analyzing the synthesized FAPs and computing time alignment errors. Time alignment errors are reduced by 40.5% compared to the conventional temporal scaling method.
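The abstract does not spell out how the CHMM couples the acoustic and visual models, so the following is only a minimal, hypothetical NumPy sketch of the general decode-then-map idea behind this kind of speech-to-video synthesis: Viterbi-align acoustic features to AHMM states, then emit a paired FAP mean vector per state as the synthesized visual trajectory. Every quantity here (transition matrix, stand-in acoustic likelihoods, the state-to-FAP pairing) is an illustrative assumption, not the authors' trained model.

import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely HMM state path; log_B has shape (T, n_states)."""
    T, n = log_B.shape
    delta = np.zeros((T, n))           # best log-score ending in each state
    psi = np.zeros((T, n), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from state i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

rng = np.random.default_rng(0)
n_states, n_faps, T = 3, 6, 20

# Toy AHMM parameters (assumed, not from the paper).
log_pi = np.log(np.full(n_states, 1.0 / n_states))
A = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
log_A = np.log(A)
# Stand-in per-frame acoustic state log-likelihoods, shape (T, n_states).
log_B = np.log(rng.dirichlet(np.ones(n_states), size=T))

# Assumed pairing: one mean FAP vector per acoustic state, taken from a VHMM.
fap_means = rng.normal(size=(n_states, n_faps))

states = viterbi(log_pi, log_A, log_B)
synthesized_faps = fap_means[states]   # (T, n_faps) FAP trajectory
print(synthesized_faps.shape)

In the paper's actual system, the time alignment of the synthesized FAP stream against ground truth is what the 40.5% error-reduction figure measures; the sketch above only illustrates where such a state-level alignment would come from.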

Published in:

Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), Vol. 3

Date of Conference:

14-17 Sept. 2003