Skip to Main Content
Our goal is that the robot learns specific objects (not object category) from images. The major problem here is how to separate the target object from the background. We create a scene model from an image sequence. The scene model contains both the target object and background. We separate the target object from the background by matching the scene model and training images having different backgrounds. A scene model consists of a 3D model and 2D models. We utilize edge points to represent detailed object shape. The 3D model is composed of the 3D points reconstructed from image edge points using structure-from-motion technique. A 2D model consists of an image in the input image sequence, edge points in the image, and the camera pose from which the image was taken. Each edge point has a SIFT descriptor for edge-point matching. The scale space analysis is done to obtain scale-invariant edge points.