
Audio-Visual Speech Synthesis Based on Chinese Visual Triphone

4 Author(s)
Hui Zhao; Yue-bing Chen; Ya-min Shen; Chao-jing Tang (College of Electronic Science and Engineering, National University of Defense Technology, Changsha, China)

A new audio-visual speech synthesis approach based on the Chinese visual triphone is proposed. The Chinese visual triphone model is constructed using a new clustering method that combines an artificial immune system with fuzzy C-means (FCM) clustering. In the analysis stage, guided by the phonetic transcription of the training data, visual triphone segments are selected from the video sequence and the corresponding lip feature vectors are extracted. In the synthesis stage, a Viterbi search selects the best visual triphone segments by finding the path with the minimum total cost. Following the concatenation principles, the mouth animation is generated and stitched into the background video. Experimental results show that the synthesized video is natural-looking and satisfactory.
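The abstract gives no implementation details, but the synthesis step it describes is a standard Viterbi unit-selection search over candidate segments. The sketch below is a minimal illustration under assumed interfaces: the `target_cost` and `concat_cost` functions, the candidate lists, and the lip-feature representation are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

def viterbi_unit_selection(candidates, target_cost, concat_cost):
    """Pick one visual-triphone segment per target position so that the
    summed target + concatenation cost along the path is minimal.

    candidates  : list of lists; candidates[t] holds the database segments
                  (e.g. lip feature vectors) matching target triphone t
    target_cost : (t, seg) -> float, fit of seg at position t (assumed)
    concat_cost : (prev, seg) -> float, smoothness of joining prev -> seg
                  (assumed)
    Returns the index of the chosen candidate at each position.
    """
    T = len(candidates)
    # cost[t][j]: best accumulated cost ending in candidate j at position t
    cost = [np.array([target_cost(0, s) for s in candidates[0]])]
    back = []
    for t in range(1, T):
        cur = np.empty(len(candidates[t]))
        ptr = np.empty(len(candidates[t]), dtype=int)
        for j, seg in enumerate(candidates[t]):
            # try every predecessor segment, keep the cheapest transition
            trans = cost[-1] + np.array(
                [concat_cost(p, seg) for p in candidates[t - 1]])
            ptr[j] = int(np.argmin(trans))
            cur[j] = trans[ptr[j]] + target_cost(t, seg)
        cost.append(cur)
        back.append(ptr)
    # backtrack from the cheapest final candidate to recover the path
    path = [int(np.argmin(cost[-1]))]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return list(reversed(path))

# Toy usage with Euclidean costs on 2-D "lip features" (illustrative only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cands = [list(rng.normal(size=(4, 2))) for _ in range(5)]
    targets = [rng.normal(size=2) for _ in range(5)]
    tc = lambda t, s: float(np.linalg.norm(targets[t] - s))
    cc = lambda p, s: float(np.linalg.norm(p - s))
    print(viterbi_unit_selection(cands, tc, cc))
```

In the paper's setting, the two cost terms would presumably be computed from the extracted lip feature vectors, with the concatenation cost penalizing visual discontinuity at segment boundaries; the exact cost definitions are not given in the abstract.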

Published in:

2009 2nd International Congress on Image and Signal Processing (CISP 2009)

Date of Conference:

17-19 Oct. 2009