Abstract:
Existing monocular methods face significant challenges in reconstructing credible details of non-visible regions in large-pose face images. Because facial details are lost in the non-visible regions of large-pose images, existing methods lack a basis for reconstructing those details, producing unreliable results. Even when a generative model is first used to repair the image, the reconstruction process is complex and costly. To this end, we propose an end-to-end, self-supervised RGB-to-depth method that equates a small pose to a large pose in UV space to obtain labels for non-visible regions. We then infer the depth values of the non-visible regions and reconstruct a detailed 3D face. Finally, we render it into a face image and use it alongside the labels for self-supervised training of the network. During inference on large-pose images, our method reconstructs credible details of non-visible regions with a basis, rather than blindly. In addition, coarse and detailed reconstruction impose mutually exclusive requirements on training images; the requirement for coarse reconstruction is often unmet, limiting reconstruction accuracy. We propose a mutual-exclusion elimination method to resolve this conflict and improve accuracy. Extensive experiments demonstrate that our method reconstructs precise 3D faces with credible details from large-pose images. Our supplementary material is published at: https://github.com/lxy-nxu/ICASSP2025/tree/main
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025