CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

Abstract:

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping into a regression task, which suffers from the regression-to-mean problem leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Also, a user study further justifies our superiority in perceptual quality. Code and video demo are available at https://doubiiu.github.io/projects/codetalker.
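The core idea above — replacing continuous regression with a code query in a finite proxy space — amounts to vector quantization against a learned codebook: each frame's motion feature is snapped to its nearest codebook entry. The following is a minimal illustrative sketch of that lookup step only (not the paper's implementation); the codebook size, feature dimension, and random initialization here are placeholder assumptions, whereas in the paper the codebook is learned by self-reconstruction over real facial motions.

```python
import numpy as np

# Hypothetical sizes; the real codebook is learned, not random.
num_codes, code_dim = 256, 64
rng = np.random.default_rng(0)
codebook = rng.normal(size=(num_codes, code_dim))

def quantize(features: np.ndarray) -> np.ndarray:
    """Map each continuous feature vector to its nearest codebook entry.

    features: (T, code_dim) array, e.g. one motion feature per frame.
    Returns the (T,) integer indices of the selected codes.
    """
    # Pairwise squared distances between frames and codebook entries,
    # computed via broadcasting: (T, 1, D) - (1, K, D) -> (T, K, D).
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

frames = rng.normal(size=(10, code_dim))   # stand-in for per-frame motion features
codes = quantize(frames)                   # discrete code indices, one per frame
quantized = codebook[codes]                # quantized motion features, shape (10, 64)
```

Because the output space is a finite set of realistic motion codes rather than an unconstrained continuum, a regressor cannot average between plausible modes, which is how the formulation sidesteps the over-smoothing described above; the autoregressive model then predicts these code indices sequentially from the speech signal.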
Date of Conference: 17-24 June 2023
Date Added to IEEE Xplore: 22 August 2023
Conference Location: Vancouver, BC, Canada

1. Introduction

3D facial animation has been an active research topic for decades, owing to its broad applications in virtual reality, film production, and games. The high correlation between speech and facial gestures (especially lip movements) makes it possible to drive facial animation with a speech signal. Early attempts mainly built complex mapping rules between phonemes and their visual counterparts, which usually yielded limited performance [53], [63]. With the advances in deep learning, recent speech-driven facial animation techniques have pushed the state of the art forward significantly. However, it remains challenging to generate human-like motions.

