
Audio-Driven Talking Head Video Generation with Diffusion Model


Abstract:

Synthesizing high-fidelity talking head videos that fit an input audio sequence is a highly anticipated technique in many applications, such as digital humans, virtual video conferencing, and human-computer interaction. Popular GAN-based methods aim to align speech audio with lip motions and head poses. However, existing methods are prone to training instability and even mode collapse, resulting in low-quality video generation. In this paper, we propose a novel audio-driven diffusion method for generating high-resolution, realistic talking head videos based on a denoising diffusion model. Specifically, a face attribute disentanglement module is proposed to disentangle eye-blinking and lip-motion features, where the lip-motion features are synchronized with audio features via a contrastive learning strategy, and the disentangled motion features are well aligned with the talking head. Furthermore, the denoising diffusion model takes the source image and the warped motion features as input to generate a high-resolution, realistic talking head with diverse head poses. Extensive evaluations using multiple metrics demonstrate that our method outperforms current techniques both qualitatively and quantitatively.
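The contrastive audio-lip synchronization described in the abstract can be sketched, for illustration, as an InfoNCE-style objective in which each audio frame's feature is pulled toward its temporally matching lip-motion feature and pushed away from features of other frames. This is a minimal, hedged sketch under assumed names and hyperparameters (`contrastive_sync_loss`, `temperature=0.1`); the paper's actual architecture and loss may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_sync_loss(audio_feats, lip_feats, temperature=0.1):
    """InfoNCE-style sync loss: the i-th audio frame should match the
    i-th lip-motion frame (positive pair) against all other frames in
    the batch (negatives). Lower loss means better audio-lip alignment."""
    n = len(audio_feats)
    loss = 0.0
    for i in range(n):
        # Scaled similarities between audio frame i and every lip frame.
        sims = [cosine(audio_feats[i], lip_feats[j]) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp for softmax cross-entropy,
        # with the matching index i as the target class.
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / n

# Toy check: temporally aligned pairs should score lower (better)
# than misaligned (shuffled) pairs.
audio = [[1.0, 0.0], [0.0, 1.0]]
lips_aligned = [[1.0, 0.0], [0.0, 1.0]]
lips_shuffled = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = contrastive_sync_loss(audio, lips_aligned)
shuffled_loss = contrastive_sync_loss(audio, lips_shuffled)
```

In a full pipeline, `audio_feats` and `lip_feats` would be embeddings produced by the audio encoder and the disentangled lip-motion branch; here they are stand-in toy vectors.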
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece

