I. Introduction
By perceiving and understanding complex environments, robots can detect objects and infer the relationships between them. Pairs of objects and their relationships are usually represented as triples. In [1], the visual manipulation relationship is proposed for robot grasping tasks and is divided into three categories: parent, child, and no relationship. Most current robotic grippers grasp according to the object detection frame, which makes it difficult to keep the object's attitude, center of gravity, and other properties unchanged during grasping. As shown in Fig. 1(a), a water cup is on top of a book. If this positional relationship is ignored and the book is grasped directly, the water cup will fall or be damaged, and the robot cannot grasp safely. The visual manipulation relationship is therefore needed in scenes where multiple objects are stacked. As shown in Fig. 1(b), in a stacked scene with five objects, the robot needs to determine the relationship between each pair of objects, generate the correct manipulation relationship graph, and operate according to the resulting grasping order. In this graph, before grasping the notebook, the robot must first remove all other objects on it. With the generated manipulation relationship graph, the robot can grasp the objects in a safe order.
Fig. 1. (a) The importance of grasping order. (b) The generation process of the visual manipulation relationship graph, showing the correct manipulation relationship graph and the robot grasping order. Note that the order of the manipulation relationship graph is the reverse of the grasping order.
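The link between the manipulation relationship graph and the grasping order can be illustrated with a minimal Python sketch. It is not the method of [1]. It assumes that each relationship is given as a (parent, child) pair in which the child rests on the parent and must therefore be removed first. The function name `grasp_order` and the object names in the example are hypothetical. The sketch performs a post-order traversal of the graph, so every object is emitted only after all objects stacked on it, which yields an order that is the reverse of the graph direction.

```python
from collections import defaultdict


def grasp_order(objects, relations):
    """Return an order in which objects can be grasped safely.

    objects:   iterable of object names in the scene.
    relations: iterable of (parent, child) pairs. Assumption of this sketch:
               the child rests on the parent, so it must be removed first.
    """
    children = defaultdict(set)
    for parent, child in relations:
        children[parent].add(child)

    order = []        # final grasping order
    done = set()      # objects already placed in the order
    visiting = set()  # objects on the current traversal path (cycle check)

    def visit(obj):
        if obj in done:
            return
        if obj in visiting:
            raise ValueError(f"cyclic relationship involving {obj!r}")
        visiting.add(obj)
        # Remove everything stacked on this object before the object itself.
        for child in sorted(children[obj]):
            visit(child)
        visiting.discard(obj)
        done.add(obj)
        order.append(obj)

    for obj in objects:
        visit(obj)
    return order


# Hypothetical five-object scene: a cup and a box lie on the notebook,
# an apple lies on the box, and the pen has no relationship to the others.
scene = ["notebook", "cup", "pen", "box", "apple"]
pairs = [("notebook", "cup"), ("notebook", "box"), ("box", "apple")]
print(grasp_order(scene, pairs))
```

In this example the notebook appears in the order only after the cup, the box, and the apple, while the unrelated pen can be grasped at any point.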