This paper proposes a real-time design for accurate stereo matching on the Compute Unified Device Architecture (CUDA). We present a leading local algorithm and then accelerate it by parallel computing. High matching accuracy is achieved by cost aggregation over shape-adaptive support regions and by disparity refinement using reliable initial estimates. A novel sample-and-restore scheme is proposed to make the algorithm scalable, attaining a severalfold speedup at the expense of minor accuracy degradation. The refinement and the restoration are jointly realized by a local voting method. To accelerate the voting on CUDA, a graphics processing unit (GPU)-oriented bitwise fast voting method is proposed, which is faster than the traditional histogram-based approach by two orders of magnitude. The whole algorithm is parallelized on CUDA at a fine granularity, efficiently exploiting the computing resources of GPUs. Our design is among the fastest stereo matching methods on GPUs. Evaluated on the Middlebury stereo benchmark, the proposed design produces the most accurate results among the real-time methods. Its speed, accuracy, and scalability make our design well suited to practical applications such as robotics systems and multiview teleconferencing.
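The abstract does not spell out how bitwise voting works, so the following is only an illustrative sketch under one assumed reading: each bit of the winning disparity is decided by a majority count over the votes in a pixel's support region, in contrast to building a full histogram over all disparity levels. The function names `histogram_vote` and `bitwise_vote`, the `num_bits` parameter, and the tie-breaking rule are hypothetical, and the code is sequential Python rather than the CUDA implementation the paper describes.

```python
from collections import Counter


def histogram_vote(votes):
    """Baseline: return the most frequent disparity (ties go to the smallest)."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(d for d, c in counts.items() if c == best)


def bitwise_vote(votes, num_bits=8):
    """Assumed per-bit majority rule: bit l of the result is set when
    more than half of the votes have bit l set."""
    n = len(votes)
    result = 0
    for bit in range(num_bits):
        # Count how many votes have this bit set.
        ones = sum((d >> bit) & 1 for d in votes)
        if 2 * ones > n:
            result |= 1 << bit
    return result


# Disparity votes collected from a hypothetical support region.
region_votes = [5, 5, 5, 7, 2]
print(histogram_vote(region_votes), bitwise_vote(region_votes))
```

Under this assumption, the two functions agree whenever a single disparity holds a strict majority of the votes. Without such a majority the per-bit result is only an approximation and may not equal any individual vote. The per-bit loop runs over `num_bits` counters instead of one bin per disparity level, which is the kind of saving a GPU-oriented bitwise scheme could exploit.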
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 21, Issue: 7)
Date of Publication: July 2011