
A Point Set Generation Network for 3D Object Reconstruction from a Single Image



Abstract:

Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the groundtruth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the groundtruth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image-based 3D reconstruction benchmarks, but also shows strong performance for 3D shape completion and a promising ability to make multiple plausible predictions.
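Since the output is an unordered set of coordinates rather than a grid, the training loss must be invariant to point ordering; the full paper adopts the Chamfer distance and the Earth Mover's distance for this purpose. Below is a minimal NumPy sketch of the Chamfer distance between two point sets, given only for illustration (it is not the authors' implementation):

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets of shapes (N, 3) and (M, 3).

    For each point in one set, take the squared distance to its nearest
    neighbor in the other set; average each direction and sum the two.
    The result does not depend on the ordering of points within a set.
    """
    diff = pred[:, None, :] - gt[None, :, :]   # (N, M, 3) pairwise offsets
    d2 = np.sum(diff ** 2, axis=-1)            # (N, M) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# A prediction near the target scores lower than an unrelated point set.
rng = np.random.default_rng(0)
gt = rng.random((1024, 3))
near = gt + 0.01 * rng.standard_normal((1024, 3))
far = rng.random((1024, 3))
assert chamfer_distance(near, gt) < chamfer_distance(far, gt)
```

This brute-force version builds the full N×M distance matrix; practical training code would typically use a batched or nearest-neighbor-accelerated variant.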
Date of Conference: 21-26 July 2017
Date Added to IEEE Xplore: 09 November 2017
Print ISSN: 1063-6919
Conference Location: Honolulu, HI, USA

1. Introduction

As we try to duplicate the successes of current deep convolutional architectures in the 3D domain, we face a fundamental representational issue. Extant deep net architectures for both discriminative and generative learning in the signal domain are well suited to data that is regularly sampled, such as images, audio, or video. However, most common 3D geometry representations, such as 2D meshes or point clouds, are not regular structures and do not easily fit into architectures that exploit such regularity for weight sharing, etc. That is why the majority of extant works on using deep nets for 3D data resort to either volumetric grids or collections of images (2D views of the geometry). Such representations, however, lead to difficult trade-offs between sampling resolution and net efficiency. Furthermore, they enshrine quantization artifacts that obscure natural invariances of the data under rigid motions, etc.
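To make this trade-off concrete, the sketch below (a minimal NumPy illustration, not code from the paper) contrasts the two behaviors: a point cloud is an unordered (N, 3) array on which rigid motions act exactly, whereas a voxel grid must re-quantize coordinates at a fixed resolution, with memory growing cubically in that resolution:

```python
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.random((1024, 3))            # an unordered (N, 3) point set

# Permuting the rows leaves the underlying shape unchanged, so any
# architecture or loss consuming the set must be order-invariant.
shuffled = cloud[rng.permutation(len(cloud))]

# A rigid rotation acts exactly on the coordinates, losing no detail.
theta = np.pi / 4
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
rotated = cloud @ rot_z.T

# A volumetric grid, by contrast, quantizes coordinates: storage grows
# as res**3 while detail below the voxel size is discarded.
def voxelize(points, res=32):
    idx = np.clip((points * res).astype(int), 0, res - 1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

print(voxelize(cloud).sum(), "occupied voxels in a 32^3 grid")
```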

Figure 1. A 3D point cloud of the complete object can be reconstructed from a single image. Each point is visualized as a small sphere. The reconstruction is viewed from two viewpoints (0° and 90° along azimuth). A segmentation mask is used to indicate the scope of the object in the image.
