Abstract:
The balance between accuracy and computational efficiency is crucial for the applications of deep learning-based stereo matching algorithms in real-world scenarios. Since...Show MoreMetadata
Abstract:
The balance between accuracy and computational efficiency is crucial for the applications of deep learning-based stereo matching algorithms in real-world scenarios. Since matching cost aggregation is usually the most computationally expensive component, a common practice is to construct cost volumes at a low resolution for aggregation and then directly regress a high-resolution disparity map. However, current solutions often suffer from limitations such as the loss of discriminative features caused by downsampling operations that treat all pixels equally, and spatial misalignment resulting from repeated downsampling and upsampling. To overcome these challenges, this paper presents two sampling strategies: the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), to prioritize real-time inference while ensuring accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structure information. On the other hand, the DAM employs a learnable interpolation strategy to predict transformation offsets of pixels, thereby mitigating the spatial misalignment issue. Building upon these modules, we introduce ADStereo, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks. Specifically, our ADStereo runs over 5\times faster than the current state-of-the-art CREStereo (0.054s vs. 0.29{s} ) under the same hardware while achieving comparable accuracy (1.82% vs. 1.69%) on the KITTI stereo 2015 benchmark. The codes are available at: https://github.com/cocowy1/ADStereo.
Published in: IEEE Transactions on Image Processing ( Volume: 34)