Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar

5 Author(s): Hao Tang (Univ. of Illinois at Urbana-Champaign, Champaign, IL); Yuxiao Hu; Yun Fu; Hasegawa-Johnson, M.

In this paper, we propose a complete pipeline of efficient and low-cost techniques to construct a realistic 3D text-driven emotive audio-visual avatar from a single 2D frontal-view face image of any person on the fly. This real-time conversion is achieved in three steps. First, a personalized 3D face model is built from the 2D face image using a fully automatic 3D face shape and texture reconstruction framework. Second, using standard MPEG-4 FAPs (Facial Animation Parameters), the face model is animated by the viseme and expression channels, complemented by a visual prosody channel that controls head, eye, and eyelid movements. Finally, the facial animation is combined and synchronized with emotive synthetic speech, generated by incorporating an emotion transformer into a Festival-MBROLA text-to-neutral-speech synthesizer.
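The three-step pipeline described in the abstract could be sketched as follows. This is purely illustrative: all class and function names below are hypothetical placeholders, not the authors' code, and the bodies are stubs standing in for the reconstruction, FAP animation, and speech-synthesis stages.

```python
from dataclasses import dataclass, field

@dataclass
class FaceModel:
    """Step 1 output: personalized 3D face shape and texture (placeholder)."""
    shape: list = field(default_factory=list)
    texture: list = field(default_factory=list)

@dataclass
class AnimationTrack:
    """Step 2 output: one MPEG-4 FAP stream per animation channel (placeholder)."""
    channel: str   # "viseme", "expression", or "visual_prosody"
    faps: list = field(default_factory=list)

def reconstruct_face(image_pixels):
    # Step 1: fit 3D shape and texture to a single frontal 2D image
    # (stands in for the paper's automatic reconstruction framework).
    return FaceModel(shape=list(image_pixels), texture=list(image_pixels))

def animate(model, text):
    # Step 2: drive the model with viseme and expression channels, plus
    # a visual-prosody channel for head, eye, and eyelid movement.
    return [AnimationTrack(ch, faps=[0.0] * len(text))
            for ch in ("viseme", "expression", "visual_prosody")]

def synthesize_speech(text, emotion):
    # Step 3: neutral TTS (Festival-MBROLA in the paper) followed by an
    # emotion transformer over the neutral output (stubbed here).
    return {"text": text, "emotion": emotion, "samples": [0] * len(text)}

def text_to_avatar(image_pixels, text, emotion):
    # Full pipeline: reconstruct, animate, then synchronize with speech.
    model = reconstruct_face(image_pixels)
    tracks = animate(model, text)
    audio = synthesize_speech(text, emotion)
    return model, tracks, audio
```

The point of the sketch is the data flow: a single image yields one face model, which is then driven by three independent FAP channels that must be synchronized with the emotive audio stream.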

Published in:

2008 IEEE International Conference on Multimedia and Expo (ICME)

Date of Conference:

June 23-26, 2008