Reusable Architecture Growth for Continual Stereo Matching


Abstract:

The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As with most tasks, this requires gathering training data that covers the heterogeneous scenes encountered at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose to perform continual stereo matching where a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners. It can maintain high reusability during growth by reusing previous units while obtaining good performance. Additionally, we present a Scene Router module to adaptively select the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively in various weather, road, and city circumstances and surpasses the state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 46, Issue: 9, September 2024)
Page(s): 6167 - 6184
Date of Publication: 19 March 2024

PubMed ID: 38502627

I. Introduction

Depth serves as a realistic prerequisite for sensing the surrounding 3D scene structure in many high-level 3D vision tasks [1], [2], [3]. Image-based passive depth estimation approaches compare favorably to active sensors in terms of cost, working range, and flexibility. Among these methods, the well-posed stereo vision is preferentially chosen due to its straightforward settings, excellent accuracy, and reasonable cost.
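
For context, a rectified stereo pair yields depth by triangulation: a pixel with disparity d (in pixels) lies at depth Z = f * B / d, where f is the focal length in pixels and B is the baseline between the two cameras. The sketch below only illustrates this standard relation and is not taken from the paper; the function name `disparity_to_depth` and the sample camera values are assumptions chosen for illustration.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a 2D disparity map (pixels) to depth (metres).

    Assumes a rectified stereo pair. Pixels with non-positive
    disparity have no finite depth and are mapped to infinity.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else float("inf") for d in row]
        for row in disparity
    ]


# Hypothetical camera: 720 px focal length, 0.5 m baseline.
disparity_map = [[36.0, 72.0], [0.0, 18.0]]
depth_map = disparity_to_depth(disparity_map, focal_px=720.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points than for nearby ones.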
