V-Fusion: 2D Detection-enhanced Multimodal 3D BEV Object Detection


Abstract:

Integrating information from multiple sensors enhances the performance of autonomous vehicle perception systems. However, current multimodal 3D object detection methods focus on unifying modalities into a bird’s-eye view (BEV) representation, which overlooks an inherent characteristic of the camera perspective view (PV): 2D detection performance in PV significantly surpasses that of state-of-the-art 3D detectors. In this paper, we propose V-Fusion, a multimodal BEV object detection method enhanced by high-quality 2D detection. By leveraging the 2D priors of PV, we construct 3D query proposals that complement BEV 3D queries. To address the modal discrepancy in generating 3D queries from 2D priors, we propose a depth-robust 2D-to-3D query generation strategy. Additionally, we introduce a novel geometry-constrained self-attention mechanism to enhance the interaction of BEV 3D queries, and we employ an additional set of learnable 3D queries to account for potentially missed objects. Notably, V-Fusion achieves 74.1 NDS on the challenging nuScenes dataset, outperforming SparseFusion by 1.0 NDS while offering comparable inference speed.
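
The abstract does not specify how 3D query proposals are built from 2D priors. As a rough illustration of the general idea only, the sketch below back-projects the center of a 2D box through a skew-free pinhole camera at several candidate depths, yielding one camera-frame 3D point per depth. This is standard geometry and not the paper's depth-robust strategy; the function name `lift_box_center`, the intrinsics parameters, and the candidate depth values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def lift_box_center(
    box: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depths: Sequence[float],
) -> List[Point3D]:
    """Back-project a 2D box center to camera-frame 3D points.

    box:    [x1, y1, x2, y2] in pixels (hypothetical 2D detection).
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    depths: candidate depths along the optical axis, in meters
            (an assumption; the paper's depth handling is not given).
    Returns one (X, Y, Z) point per candidate depth.
    """
    x1, y1, x2, y2 = box
    u = (x1 + x2) / 2.0  # box center, pixel column
    v = (y1 + y2) / 2.0  # box center, pixel row
    points: List[Point3D] = []
    for d in depths:
        # Pinhole model without skew: X = (u - cx) / fx * Z, Y = (v - cy) / fy * Z.
        points.append(((u - cx) / fx * d, (v - cy) / fy * d, float(d)))
    return points


# Usage: one detection, two depth hypotheses.
candidates = lift_box_center(
    box=(1000.0, 450.0, 1200.0, 650.0),
    fx=1000.0, fy=1000.0, cx=800.0, cy=450.0,
    depths=(10.0, 20.0),
)
```

Keeping several depth hypotheses per box, rather than committing to a single depth, is one simple way to tolerate depth uncertainty; mapping the resulting points into the BEV frame would additionally require the camera extrinsics.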
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
