Abstract:
Visual manipulation relationship detection helps robots perform safe, orderly, and efficient grasping tasks. However, most existing algorithms model only object-level or relation-level dependencies individually and thus lack sufficient global information, making it difficult to handle different types of reasoning errors, especially in complex environments with multi-object stacking and occlusion. To solve these problems, we propose Dual Graph Attention Networks (Dual-GAT) for visual manipulation relationship detection, with an object-level graph network that captures object-level dependencies and a relation-level graph network that captures interactions among relational triplets. The attention mechanism assigns different weights to different dependencies, obtains more accurate global context information for reasoning, and produces a manipulation relationship graph. In addition, we use multi-view feature fusion to improve the features of occluded objects and thereby enhance relationship detection performance in multi-object scenes. Finally, we deploy our method on a robot to build a multi-object grasping system that works well in stacked environments. Experimental results on the VMRD and REGRAD datasets show that our method significantly outperforms existing approaches.

Note to Practitioners: The motivation of this paper is to design efficient and accurate visual manipulation relationship reasoning methods for grasping tasks in complex stacked scenes with multiple objects. The problem requires judging the positional relationships between objects in a stacked scene and determining an appropriate grasping order. To grasp target objects accurately, this paper proposes a Dual Graph Attention Network for visual manipulation relationship detection, which exploits both object-level and relation-level dependencies to obtain accurate global information for better performance. Meanwhile, multi-view feature fusion effectively improves the features of occluded objects.
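For readers unfamiliar with graph attention, the sketch below illustrates the generic graph-attention mechanism that architectures like Dual-GAT build on: attention weights decide how strongly each neighboring node's features contribute to a node's updated representation. This is not the authors' implementation; the class name, feature dimensions, and the fully connected adjacency in the usage example are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of one graph-attention layer over
# object nodes: each node aggregates neighbor features weighted by learned
# attention scores, yielding context-enriched features for reasoning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)  # node projection
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)   # scores a node pair

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, in_dim) node features, e.g. per-object RoI features
        # adj: (N, N) binary adjacency; 1 where a dependency may exist
        h = self.proj(x)                                     # (N, out_dim)
        n = h.size(0)
        # Concatenate every ordered pair of node features and score it
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)       # (N, N, 2*out_dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))  # (N, N)
        scores = scores.masked_fill(adj == 0, float("-inf")) # keep graph edges only
        alpha = torch.softmax(scores, dim=-1)                # per-node attention
        return alpha @ h                                     # weighted neighbor mix

# Usage: 4 objects with 256-d features, fully connected object graph
layer = GraphAttentionLayer(256, 128)
feats = torch.randn(4, 256)
adj = torch.ones(4, 4)
out = layer(feats, adj)  # (4, 128) context-enriched object features
```

In Dual-GAT's setting, one such network operates over object nodes and another over relational-triplet nodes; the learned weights let reliable dependencies dominate the aggregated global context.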
Published in: IEEE Transactions on Automation Science and Engineering (Volume: 22)
Index Terms: Visual Detection, Dual Network, Graph Attention Network, Dual Attention, Visual Relationship, Dual Graph, Dual Attention Network, Robotic Grasping, Detection Performance, Attention Mechanism, Global Information, Object Features, Multiple Objects, Feature Fusion, Network Graph, Global Context Information, Occluded Objects, Occlusion Problem, Training Set, Single Image, Graph Neural Networks, Graph Convolutional Network, Objects In The Scene, Robotic Gripper, Multi-view Images, Main View, Object Pairs, Object Location, Nodes In The Graph, Number Of Objects