I. Introduction
In this paper, we consider the task of grasping novel objects, given an image and an aligned depth map of the object. Our goal is to estimate the gripper configuration (i.e., the 3D location, 3D orientation, and gripper opening width) at the final pose, just before the robot closes the gripper. Recently, several learning algorithms [1]–[3] have shown promise in handling incomplete and noisy data, variations in the environment, and grasping of novel objects. It is not clear, however, what the output of such learning algorithms should be. In this paper we discuss this issue, propose a new representation for grasping, and present a fast, efficient algorithm for learning this representation.
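As a concrete illustration (the notation here is ours and not necessarily the one adopted later in the paper), such a gripper configuration can be written as $g = (p, R, w)$, where $p \in \mathbb{R}^3$ is the 3D location of the gripper, $R \in \mathrm{SO}(3)$ is its 3D orientation, and $w > 0$ is the opening width, so that a grasp-prediction algorithm maps the image and aligned depth map to an estimate of $g$.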