
RoG-SAM: A Language-Driven Framework for Instance-Level Robotic Grasping Detection


Abstract:

Robotic grasping is a crucial topic in robotics and computer vision, with broad applications in industrial production and intelligent manufacturing. Although some methods have begun addressing instance-level grasping, most remain limited to predefined instances and categories, lacking flexibility for open-vocabulary grasp prediction based on user-specified instructions. To address this, we propose RoG-SAM, a language-driven, instance-level grasp detection framework built on the Segment Anything Model (SAM). RoG-SAM utilizes open-vocabulary prompts for object localization and grasp pose prediction, adapting SAM through transfer learning with encoder adapters and multi-head decoders to extend its segmentation capabilities to grasp pose estimation. Experimental results show that RoG-SAM achieves competitive performance on single-object datasets (Cornell and Jacquard) and cluttered datasets (GraspNet-1Billion and OCID), with instance-level accuracies of 91.2% and 90.1%, respectively, while using only 28.3% of SAM's trainable parameters. The effectiveness of RoG-SAM was also validated in real-world environments. A demonstration video is available at https://www.youtube.com/playlist?list=PL7et4nGJAImLGytsJbglGbXl1hacA2dy_.
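
To make the adapter-plus-multi-head idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' released code: a bottleneck adapter of the kind that could be inserted after frozen SAM encoder blocks, and a multi-head decoder that predicts an instance mask together with per-pixel grasp quality, angle, and width maps. All class names, layer sizes, and the specific choice of output heads are illustrative assumptions.

# Hypothetical sketch (not the authors' implementation) of adapter-based
# transfer learning on a frozen SAM-style encoder with a multi-head decoder
# for mask and planar grasp prediction. Names and dimensions are assumptions.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter; in practice one would be interleaved with each
    frozen encoder block so that only adapter weights are trained."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual update keeps the frozen encoder's features intact.
        return x + self.up(self.act(self.down(x)))


class MultiHeadGraspDecoder(nn.Module):
    """Shared convolutional trunk with separate heads for the instance mask
    and the grasp-pose maps (quality, rotation, gripper width)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
        )
        self.mask_head = nn.Conv2d(dim, 1, 1)      # instance mask logits
        self.quality_head = nn.Conv2d(dim, 1, 1)   # per-pixel grasp quality
        self.angle_head = nn.Conv2d(dim, 2, 1)     # (cos 2*theta, sin 2*theta)
        self.width_head = nn.Conv2d(dim, 1, 1)     # gripper opening width

    def forward(self, feats: torch.Tensor) -> dict:
        h = self.trunk(feats)
        return {
            "mask": self.mask_head(h),
            "quality": self.quality_head(h),
            "angle": self.angle_head(h),
            "width": self.width_head(h),
        }

Under these assumptions, a grasp would be read out at the pixel of highest predicted quality inside the mask matched to the language prompt; the open-vocabulary localization step itself (matching a text prompt to an instance) is not shown here.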
Published in: IEEE Transactions on Multimedia (Early Access)
Page(s): 1 - 13
Date of Publication: 03 April 2025
