Audio to Deep Visual: Speaking Mouth Generation Based on 3D Sparse Landmarks


Abstract:

Having a system that automatically generates a talking mouth in sync with input speech would enhance speech communication and enable many novel applications. This article presents a new model that generates 3D talking-mouth landmarks from Chinese speech. We use sparse 3D landmarks to model mouth motion; they are easy to capture and provide sufficient lip accuracy. The 4D mouth-motion dataset was collected with our self-developed facial capture device, filling a gap in Chinese speech-driven lip datasets. Experimental results show that the generated talking landmarks achieve accurate, smooth, and natural 3D mouth movements.
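The abstract describes a pipeline that maps per-frame audio features to sequences of sparse 3D mouth landmarks. The actual model architecture is not specified here, so the following is only a minimal sketch of that data flow: the landmark count, feature dimension, stand-in linear regressor, and smoothing step are all assumptions for illustration, not the paper's method.

```python
import numpy as np

# Hypothetical sizes -- not taken from the paper, chosen only to
# illustrate the audio-to-landmark data flow.
NUM_LANDMARKS = 20      # sparse 3D mouth landmarks per frame (assumed)
AUDIO_FEAT_DIM = 13     # e.g. MFCC coefficients per audio frame (assumed)

rng = np.random.default_rng(0)
W = rng.standard_normal((AUDIO_FEAT_DIM, NUM_LANDMARKS * 3)) * 0.01
b = np.zeros(NUM_LANDMARKS * 3)

def audio_to_landmarks(audio_feats: np.ndarray) -> np.ndarray:
    """Map per-frame audio features (T, F) to 3D mouth landmarks (T, K, 3).

    A stand-in linear regressor followed by temporal smoothing; the real
    model would be a learned network trained on the 4D mouth-motion data.
    """
    T = audio_feats.shape[0]
    raw = audio_feats @ W + b  # (T, K*3)
    # Moving-average smoothing over time, a simple proxy for the
    # smooth, natural motion the paper reports.
    kernel = np.ones(3) / 3.0
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, raw)
    return smoothed.reshape(T, NUM_LANDMARKS, 3)

feats = rng.standard_normal((50, AUDIO_FEAT_DIM))  # 50 audio frames
landmarks = audio_to_landmarks(feats)
print(landmarks.shape)  # (50, 20, 3)
```

The key point the sketch captures is the shape contract: a (frames, features) audio tensor in, a (frames, landmarks, 3) coordinate tensor out, with temporal filtering to keep consecutive frames coherent.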
Date of Conference: 25-29 March 2023
Date Added to IEEE Xplore: 01 May 2023
Conference Location: Shanghai, China
