Character speech animation is traditionally considered important but tedious work, especially when lip synchronization (lip-sync) is taken into account. Although several methods have been proposed to ease the burden on artists creating facial and speech animation, almost none are fast and efficient. In this paper, we introduce a framework for synthesizing lip-sync character speech animation in real time from a given speech sequence and its corresponding text. The framework first trains a dominated animeme model (DAM) for each kind of phoneme by learning the character's animation control signal through an expectation-maximization (EM)-style optimization approach. Each DAM is further decomposed into a polynomial-fitted animeme model and a corresponding dominance function, taking coarticulation into account. Finally, given a novel speech sequence and its corresponding text, the character's animation control signal can be synthesized in real time with the trained DAMs. The synthesized lip-sync animation can even preserve exaggerated characteristics of the character's facial geometry. Moreover, because our method runs in real time, it can be used in many applications, such as lip-sync animation prototyping, multilingual animation reproduction, avatar speech, and mass animation production. Furthermore, the synthesized animation control signal can be imported into 3-D packages for further adjustment, so our method can be easily integrated into existing production pipelines.
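To make the synthesis step more concrete, the following is a minimal sketch of how a control signal might be evaluated from per-phoneme animemes and dominance functions. It is not the formulation used in this paper: the Gaussian-shaped dominance function, the normalized weighted average, and all names (`PhonemeModel`, `Segment`, `synthesize`) are assumptions introduced only for illustration, and the training of the models by EM-style optimization is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhonemeModel:
    """Hypothetical container for one trained model (illustrative names only)."""
    coeffs: Sequence[float]  # polynomial animeme coefficients, lowest order first
    spread: float            # width of the ASSUMED bell-shaped dominance function


@dataclass
class Segment:
    """One phoneme occurrence aligned to the speech, spanning [start, end) seconds."""
    model: PhonemeModel
    start: float
    end: float


def animeme(model: PhonemeModel, u: float) -> float:
    """Evaluate the polynomial animeme at normalized time u with Horner's rule."""
    value = 0.0
    for c in reversed(model.coeffs):
        value = value * u + c
    return value


def dominance(model: PhonemeModel, u: float) -> float:
    """ASSUMPTION: a Gaussian bell centered on the segment midpoint (u = 0.5)."""
    return math.exp(-((u - 0.5) / model.spread) ** 2)


def synthesize(segments: List[Segment], t: float) -> float:
    """Blend all segments at time t as a dominance-weighted average.

    Dominance weights stay nonzero outside a segment's own interval, so
    neighboring phonemes influence each other (a simple coarticulation model).
    """
    num = 0.0
    den = 0.0
    for seg in segments:
        u = (t - seg.start) / (seg.end - seg.start)
        w = dominance(seg.model, u)
        # Clamp u so the polynomial is never evaluated outside [0, 1].
        num += w * animeme(seg.model, min(max(u, 0.0), 1.0))
        den += w
    # Far from every segment the weights can underflow to zero.
    return num / den if den > 0.0 else 0.0
```

Because evaluating one time sample costs only a few polynomial and exponential evaluations per nearby phoneme, a scheme of this kind is cheap enough to run per frame, which is consistent with the real-time use described above.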