A new audio-visual speech synthesis approach based on Chinese visual triphones is proposed. The Chinese visual triphone model is constructed using a new clustering method that combines an artificial immune system (AIS) with fuzzy c-means (FCM) clustering. In the analysis stage, visual triphone segments are selected from the video sequence according to the phonetic transcription of the training data, and the corresponding lip feature vectors are extracted. In the synthesis stage, the Viterbi search algorithm selects the best visual triphone segments by finding the path with the minimum total cost. The mouth animation is then generated according to the concatenation principles and stitched into the background video. Experimental results show that the synthesized video looks natural and is of satisfactory quality.
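The abstract does not give the cost function used in the Viterbi search. The following is a minimal sketch assuming a common unit-selection formulation: each target position has a list of candidate segments, and the path cost is the sum of a per-segment target cost plus a concatenation cost between adjacent segments. The names `viterbi_select`, `target_cost`, and `concat_cost` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a candidate unit, e.g. a visual triphone segment (assumed type)


def viterbi_select(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    concat_cost: Callable[[U, U], float],
) -> Tuple[List[int], float]:
    """Pick one candidate per position, minimizing total target + concatenation cost.

    Hypothetical helper: returns the chosen candidate index at each position
    and the total cost of that path.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # cost[j]: cheapest cumulative cost of any path ending at candidate j
    cost = [target_cost(0, u) for u in candidates[0]]
    backptrs: List[List[int]] = []  # backptrs[i-1][j]: best predecessor of candidate j at position i

    for i in range(1, len(candidates)):
        prev_units = candidates[i - 1]
        new_cost: List[float] = []
        back: List[int] = []
        for u in candidates[i]:
            # Choose the predecessor minimizing accumulated cost + join cost to u
            best_k = min(
                range(len(prev_units)),
                key=lambda k: cost[k] + concat_cost(prev_units[k], u),
            )
            back.append(best_k)
            new_cost.append(
                cost[best_k] + concat_cost(prev_units[best_k], u) + target_cost(i, u)
            )
        cost = new_cost
        backptrs.append(back)

    # Start from the cheapest final candidate and follow back-pointers
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for back in reversed(backptrs):
        j = back[j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy data: each unit is a single integer "lip opening" feature (illustrative only)
    targets = [2, 8, 5]
    candidates = [[1, 3], [7, 9, 4], [5, 6]]
    path, total = viterbi_select(
        candidates,
        target_cost=lambda i, u: abs(u - targets[i]),
        concat_cost=lambda a, b: abs(a - b),
    )
    print(path, total)  # [1, 2, 0] 7  -> units 3, 4, 5
```

In the toy run, the search trades a worse target match at the middle position (4 instead of 7 or 9) for smoother joins, which is the kind of balance between unit fit and transition smoothness that a minimum-cost path is meant to capture. Real lip feature vectors would replace the integers, with a distance such as Euclidean distance in both cost terms.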