Autonomous systems that learn and use a limited visual vocabulary have widespread applications. Enabling such systems to segment a set of cluttered scenes into objects is a challenging vision problem, owing to the non-homogeneous texture of objects and the random configurations of multiple objects in each scene. We present a solution to the following problem: given a collection of images in which each object appears in one or more images and multiple objects occur in each image, how best can we extract the boundaries of the different objects? The algorithm takes as input a set of stereo images, one stereo pair per scene. The novelty of our work is the use of both color/texture and structure to refine previously determined object boundaries, so that the resulting segmentation is consistent with every input scene. The algorithm populates an object library consisting of one 3D model per object. Since an object is characterized by both texture and structure, this representation is both complete and concise for most purposes.