1. Introduction
Acquiring depth information of real scenes is a non-trivial task for many applications, such as semantic labeling, pose estimation [1], and 3D modeling [2]. While high-quality texture information can be easily captured by popular color cameras, acquiring depth information remains challenging in real conditions. Traditional methods of depth acquisition mainly rely on stereo matching techniques [4] or specialized depth-sensing apparatus [5]. Stereo matching computes depth through image correspondence matching and triangulation from two-view images captured by calibrated binocular camera systems, whereas depth sensors, e.g., Time-of-Flight cameras and the Microsoft Kinect, use active sensing mechanisms to acquire scene depth directly (postprocessing techniques [6] are often employed to obtain a high-quality depth map). These methods can achieve relatively satisfactory results, but they depend heavily on the capturing apparatus. Hence, it is essential to develop a method that estimates scene depth by exploiting monocular cues in scenarios where direct depth sensing is unavailable or impossible. It is worth noting that, in the absence of geometric assumptions about the scene, depth estimation from a single color image is severely ill-posed due to the inherent ambiguity of mapping a color measurement to a depth value (Fig. 1).
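To make the triangulation step above concrete, the following is a minimal sketch of classical two-view depth recovery using OpenCV's block matcher; the focal length f, baseline B, and input file names are hypothetical placeholders rather than values from any cited system, and a calibrated, rectified stereo pair is assumed.

```python
import cv2
import numpy as np

# Placeholder calibration values (assumptions, not from any cited system):
# f = focal length in pixels, B = stereo baseline in meters.
f, B = 700.0, 0.1

# Assumes a rectified grayscale stereo pair on disk (hypothetical file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching searches for per-pixel correspondences along epipolar lines.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by 16; convert to pixels.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Triangulation: depth is inversely proportional to disparity, Z = f * B / d.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```

Because depth varies inversely with disparity, small matching errors at distant (low-disparity) points produce large depth errors, which is one reason such pipelines remain sensitive to scene texture and calibration quality.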
Fig. 1. Depth estimation example. (a) Color image; (b) ground-truth (GT) depth map; results obtained by (c) Laina et al. [3] and (d) ours.