Abstract:
Recently, combining Neural Radiance Fields (NeRF) with the Segment Anything Model (SAM) has achieved impressive results in 3D semantic segmentation. However, such methods still struggle to segment objects accurately and consistently in complex scenes. To address this issue, we introduce Enhancing Segmentation in NeRF with CLIP (ES-NeRF), which improves segmentation quality through feature fusion guided by CLIP’s powerful semantic comprehension. Specifically, we propose a CLIP2SAM module, which uses the image-text features extracted by CLIP in cross-modal, multi-scale interactions to obtain CLIP’s semantic features on a rough segmentation. These features are then aligned with those extracted by SAM to achieve feature fusion and complete the segmentation. Finally, NeRF is employed to aggregate masks from different viewpoints, yielding high-quality 3D segmentation. Extensive experiments demonstrate that our method outperforms existing techniques.
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025