
Focal-WNet: An Architecture Unifying Convolution and Attention for Depth Estimation


Abstract:

Extracting depth information from a single RGB image is a fundamental and challenging task in computer vision with wide-ranging applications. Because only a single view is available, the task cannot be solved with traditional techniques such as multi-view geometry and instead requires learning-based methods. Existing methods based on convolutional neural networks produce inconsistent and blurry results because they lack long-range dependencies. Motivated by the recent success of Transformer networks in computer vision, which can process information both locally and globally, we leverage this idea to propose a novel architecture named Focal-WNet. The architecture consists of two separate encoders and a single decoder, and its main aim is to learn most monocular depth cues, such as relative scale, contrast differences, and texture gradients. We incorporate focal self-attention instead of vanilla self-attention to reduce the computational complexity of the network. Alongside the focal transformer layers, we use a convolutional encoder to learn depth cues that a transformer alone cannot capture, since cues such as occlusion require a local receptive field and are easier for a convolutional network to learn. Extensive experiments show that the proposed Focal-WNet achieves competitive results on two challenging datasets. Our code and pre-trained weights are available at https://github.com/Goubeast/Focal-WNet
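
As a rough illustration of the two-encoder, one-decoder layout the abstract describes, the following is a minimal PyTorch sketch. The layer widths, the patch embedding, the plain nn.TransformerEncoder standing in for the focal self-attention blocks, and the concatenation-based feature fusion are all assumptions made for illustration; they are not the authors' implementation, which is available in the linked repository.

# Minimal sketch of a dual-encoder / single-decoder depth network.
# Illustrative only: a vanilla transformer encoder is used as a
# placeholder for the focal self-attention layers of Focal-WNet.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Convolutional branch: local cues (e.g. occlusion boundaries)."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):  # (B, 3, H, W) -> (B, C, H/4, W/4)
        return self.net(x)

class AttnEncoder(nn.Module):
    """Attention branch: global context over patch tokens."""
    def __init__(self, dim=64, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        f = self.embed(x)                 # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)  # (B, h*w, C) token sequence
        t = self.encoder(t)
        return t.transpose(1, 2).reshape(b, c, h, w)

class DepthNet(nn.Module):
    """Fuses both branches and decodes a single-channel depth map."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv_enc = ConvEncoder(dim)
        self.attn_enc = AttnEncoder(dim)
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, 1, 3, padding=1),
        )

    def forward(self, x):
        fused = torch.cat([self.conv_enc(x), self.attn_enc(x)], dim=1)
        return self.decoder(fused)        # (B, 1, H, W) depth prediction

if __name__ == "__main__":
    model = DepthNet()
    depth = model(torch.randn(1, 3, 64, 64))
    print(depth.shape)                    # torch.Size([1, 1, 64, 64])

Running the script prints torch.Size([1, 1, 64, 64]): the sketch maps a 3-channel RGB image to a single-channel depth map at the input resolution, with the two encoder outputs concatenated channel-wise before decoding.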
Date of Conference: 07-09 April 2022
Date Added to IEEE Xplore: 18 July 2022
Conference Location: Mumbai, India
