CA-Wav2Lip: Coordinate Attention-based Speech To Lip Synthesis In The Wild | IEEE Conference Publication | IEEE Xplore

Abstract:

With the growing consumption of online visual content, there is an urgent need for video translation to reach a wider audience around the world. However, directly translated and dubbed material fails to deliver a natural audio-visual experience because the translated speech and the speaker's lip movements are often out of sync. Improving the viewing experience therefore requires an accurate system that automatically generates synchronized lip movements. To improve the accuracy and visual quality of speech-to-lip generation, this research proposes two techniques: embedding attention mechanisms in the convolution layers, and deploying the structural similarity index measure (SSIM) as a loss function in the visual quality discriminator. The proposed system and several existing ones are tested on three audiovisual datasets. The results show that the proposed methods outperform state-of-the-art speech-to-lip synthesis in both the accuracy and the visual quality of the generated audio-lip synchronization.
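
The abstract names SSIM as a loss term but does not give its form. As a rough illustration only, the sketch below computes the standard SSIM definition over a single global window on grayscale images and turns it into a loss as 1 - SSIM. The function names `global_ssim` and `ssim_loss`, the constants `k1` and `k2`, and the single-window simplification are assumptions made for this example; practical implementations usually average SSIM over local windows, and how the paper integrates the term into its visual quality discriminator is not stated here.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over one global window for two equally sized grayscale images.

    x and y are lists of rows of pixel intensities in [0, data_range].
    Assumption: a single window covering the whole image, not the
    sliding local windows used by most practical implementations.
    """
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and the same size")

    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((p - mu_x) ** 2 for p in xs) / n
    var_y = sum((q - mu_y) ** 2 for q in ys) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(xs, ys)) / n

    # Small constants that stabilise the ratio when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(generated, reference, data_range=1.0):
    """Hypothetical loss term: 0 for identical images, larger as structure diverges."""
    return 1.0 - global_ssim(generated, reference, data_range)
```

For identical frames the loss is (numerically) zero, and it grows as the generated frame departs structurally from the reference, which is the property that makes SSIM usable as a visual quality signal.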
Date of Conference: 26-30 June 2023
Date Added to IEEE Xplore: 07 August 2023
Conference Location: Nashville, TN, USA
