Learnable Cost Metric-Based Multi-View Stereo for Point Cloud Reconstruction


Abstract:

3-D reconstruction is essential to defect localization. This article proposes LCM-MVSNet, a novel multi-view stereo (MVS) network with a learnable cost metric (LCM) for more accurate and complete dense point cloud reconstruction. To adapt to scene variation and improve reconstruction quality in non-Lambertian, low-textured scenes, we propose LCM to adaptively aggregate multi-view matching similarity into the 3-D cost volume by leveraging sparse point hints. The proposed LCM benefits MVS approaches in four ways: depth estimation enhancement, reconstruction quality improvement, memory footprint reduction, and computational burden alleviation. Together, these allow depth inference on high-resolution images to achieve more accurate and complete reconstruction. In addition, we improve depth estimation by enhancing shallow feature propagation via a bottom-up pathway, and we strengthen the end-to-end supervision by adapting the focal loss to reduce the ambiguity caused by sample imbalance. Extensive experiments on three benchmark datasets show that our method achieves state-of-the-art performance on the DTU and BlendedMVS datasets and exhibits strong generalization ability, with competitive performance on the Tanks and Temples benchmark. Furthermore, we deploy LCM-MVSNet in our UAV-based infrastructure defect inspection framework for infrastructure reconstruction and defect localization, demonstrating the effectiveness and efficiency of our method. More experimental results can be found in the Appendix at https://github.com/CUHK-USR-Group/TIE_Appendices/blob/main/TIE_Appendix.pdf.
Published in: IEEE Transactions on Industrial Electronics (Volume: 71, Issue: 9, September 2024)
Page(s): 11519 - 11528
Date of Publication: 14 December 2023

I. Introduction and Literature Review

Multi-view stereo (MVS) aims to recover a dense 3-D representation of a scene from calibrated 2-D images taken from multiple (more than two) views, using stereo correspondences as the main cue. This is essentially equivalent to solving for pixel correspondences across the multi-view images. Recently, learning-based MVS approaches [1], [2], [3], [4], [5], [6], [7], [8], [9] have significantly outperformed their traditional counterparts on MVS benchmarks [10], [11], [12], [13]. Deep MVS approaches decouple MVS into a two-stage process: learning-based depth map estimation, followed by depth map filtering and fusion. Compared with the handcrafted photometric measures used in traditional approaches, deep MVS approaches encode scene cues, such as reflective priors and illumination changes, into the network by adopting powerful feature extraction and cost volume representations, achieving superior reconstruction accuracy and completeness. Despite the superiority of learning-based MVS approaches, the following improvements can be made to further boost the overall reconstruction quality.
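The abstract mentions adapting the focal loss to counter sample imbalance during depth supervision. As a rough illustration only, the sketch below shows the standard focal loss, FL = -(1 - p_t)^γ log(p_t), applied to a single pixel. It assumes depth estimation is posed as classification over a small set of discrete depth hypotheses, which this excerpt does not confirm; the function names, the scores, and the value of γ are hypothetical, and this is not the adaptation used in LCM-MVSNet.

```python
import math


def softmax(logits):
    """Turn raw per-hypothesis scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target_index, gamma=2.0):
    """Standard focal loss for one pixel: -(1 - p_t)^gamma * log(p_t).

    logits       : scores over discrete depth hypotheses (assumed setup)
    target_index : index of the ground-truth depth hypothesis
    gamma        : focusing parameter; gamma = 0 reduces to cross-entropy
    """
    p_t = max(softmax(logits)[target_index], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical scores over four depth hypotheses; index 0 is the true depth.
easy = [6.0, 0.0, 0.0, 0.0]  # confident, correct prediction
hard = [0.5, 0.0, 0.0, 0.0]  # ambiguous prediction

for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, 0, gamma=0.0)
    fl = focal_loss(logits, 0, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.6f}")
```

With γ > 0, the confidently classified pixel is scaled down far more than the ambiguous one, so the many easy samples contribute less to the total loss than the few hard ones.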
