Skip to Main Content
This paper presents a method for learning overcomplete dictionaries of atoms composed of two modalities that describe a 3D scene: 1) image intensity and 2) scene depth. We propose a novel joint basis pursuit (JBP) algorithm that finds related sparse features in two modalities using conic programming and we integrate it into a two-step dictionary learning algorithm. The JBP differs from related convex algorithms because it finds joint sparsity models with different atoms and different coefficient values for intensity and depth. This is crucial for recovering generative models where the same sparse underlying causes (3D features) give rise to different signals (intensity and depth). We give a bound for recovery error of sparse coefficients obtained by JBP, and show numerically that JBP is superior to the group lasso algorithm. When applied to the Middlebury depth-intensity database, our learning algorithm converges to a set of related features, such as pairs of depth and intensity edges or image textures and depth slants. Finally, we show that JBP outperforms state of the art methods on depth inpainting for time-of-flight and Microsoft Kinect 3D data.