In this paper, we propose a novel system for estimating depth maps of outdoor scenes from a video sequence. Our approach exploits more temporal information than traditional depth reconstruction methods. We first perform Structure from Motion (SfM) on consecutive video frames using SIFT feature point correspondences, which provides camera information, including 3D translation and rotation, for every frame. We then compute the constrained optical flow between selected frames, which lets us solve an over-constrained linear system to estimate the depth of every pixel in each frame. In addition, mean shift image segmentation is incorporated to aggregate the depth estimates. The resulting initial depth map serves as the data term of our pixel-based and region-based Markov Random Field (MRF) formulation for depth map estimation. The proposed MRF formulation not only imposes adaptive smoothness constraints but also incorporates sky detection into the final depth map estimation. By minimizing the associated MRF energy function for each frame, we obtain refined depth maps that preserve detail and are temporally consistent.
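The abstract does not spell out the form of the over-constrained linear system, so the following is only a minimal sketch under stated assumptions, not the paper's formulation. It assumes that each of K selected frames contributes one scalar linear constraint a_k * d = b_k on the inverse depth d of a pixel, where a_k stands for a flow coefficient derived from the known camera motion and b_k for the observed flow residual. The function name `inverse_depth_least_squares` and the array layout are hypothetical.

```python
import numpy as np


def inverse_depth_least_squares(a, b, eps=1e-12):
    """Per-pixel least-squares solution of a_k * d = b_k, k = 1..K.

    a, b : arrays of shape (K, H, W), one constraint per frame and pixel
           (assumed layout, not taken from the paper).
    Returns an (H, W) array holding the d that minimizes
    sum_k (a_k * d - b_k)^2 independently at each pixel.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normal equations for a single unknown: d = sum(a*b) / sum(a*a).
    numerator = np.sum(a * b, axis=0)
    denominator = np.sum(a * a, axis=0)
    # Guard against pixels whose coefficients are all zero.
    return numerator / np.maximum(denominator, eps)


# Synthetic check: constraints generated from a known inverse-depth map.
rng = np.random.default_rng(0)
d_true = rng.uniform(0.1, 2.0, size=(4, 5))
a = rng.normal(size=(6, 4, 5))            # 6 frames, 4x5 image
b = a * d_true + 0.01 * rng.normal(size=a.shape)
d_est = inverse_depth_least_squares(a, b)
print(np.max(np.abs(d_est - d_true)))
```

With more frames than unknowns, noise in individual constraints is averaged out, which is the usual motivation for solving an over-constrained system instead of inverting a single constraint. An estimate of this kind would correspond only to the initial depth map, before any segmentation-based aggregation or MRF refinement.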