PLOT: Human-Like Push-Grasping Synergy Learning in Clutter With One-Shot Target Recognition

Impact Factor: 5.0 · CAS Tier 3 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Xiaoge Cao, Tao Lu, Liming Zheng, Yinghao Cai, Shuo Wang
{"title":"PLOT: Human-Like Push-Grasping Synergy Learning in Clutter With One-Shot Target Recognition","authors":"Xiaoge Cao;Tao Lu;Liming Zheng;Yinghao Cai;Shuo Wang","doi":"10.1109/TCDS.2024.3357084","DOIUrl":null,"url":null,"abstract":"In unstructured environments, robotic grasping tasks are frequently required to interactively search for and retrieve specific objects from a cluttered workspace under the condition that only partial information about the target is available, like images, text descriptions, 3-D models, etc. It is a great challenge to correctly recognize the targets with limited information and learn synergies between different action primitives to grasp the targets from densely occluding objects efficiently. In this article, we propose a novel human-like push-grasping method that could grasp unknown objects in clutter using only one target RGB with Depth (RGB-D) image, called push-grasping synergy learning in clutter with one-shot target recognition (PLOT). First, we propose a target recognition (TR) method which automatically segments the objects both from the query image and workspace image, and extract the robust features of each segmented object. Through the designed feature matching criterion, the targets could be quickly located in the workspace. Second, we introduce a self-supervised target-oriented grasping system based on synergies between push and grasp actions. In this system, we propose a salient Q (SQ)-learning framework that focuses the \n<italic>Q</i>\n value learning in the area including targets and a coordination mechanism (CM) that selects the proper actions to search and isolate the targets from the surrounding objects, even in the condition of targets invisible. Our method is inspired by the working memory mechanism of human brain and can grasp any target object shown through the image and has good generality in application. Experimental results in simulation and real-world show that our method achieved the best performance compared with the baselines in finding the unknown target objects from the cluttered environment with only one demonstrated target RGB-D image and had the high efficiency of grasping under the synergies of push and grasp actions.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1391-1404"},"PeriodicalIF":5.0000,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10411941/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In unstructured environments, robotic grasping tasks frequently require interactively searching for and retrieving specific objects from a cluttered workspace when only partial information about the target is available, such as images, text descriptions, or 3-D models. Correctly recognizing targets from such limited information, and learning synergies between different action primitives to grasp them efficiently from densely occluding objects, is a great challenge. In this article, we propose a novel human-like push-grasping method that can grasp unknown objects in clutter using only one target RGB with Depth (RGB-D) image, called push-grasping synergy learning in clutter with one-shot target recognition (PLOT). First, we propose a target recognition (TR) method that automatically segments objects from both the query image and the workspace image and extracts robust features for each segmented object. Through the designed feature-matching criterion, targets can be quickly located in the workspace. Second, we introduce a self-supervised target-oriented grasping system based on synergies between push and grasp actions. In this system, we propose a salient Q (SQ)-learning framework that focuses Q-value learning on the area containing the targets, and a coordination mechanism (CM) that selects the proper actions to search for and isolate the targets from surrounding objects, even when the targets are invisible. Our method is inspired by the working-memory mechanism of the human brain, can grasp any target object shown in an image, and generalizes well in application. Experimental results in simulation and the real world show that our method achieves the best performance compared with the baselines in finding unknown target objects in cluttered environments from only one demonstrated target RGB-D image, and grasps with high efficiency under the synergy of push and grasp actions.
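To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch of the two steps it describes: one-shot target localization by matching features of segmented objects, and a simple push-vs-grasp coordination rule. The segmentation backbone, feature extractor, cosine-similarity criterion, threshold, and Q-margin are illustrative assumptions, not the components or criteria actually used in the paper.

```python
# Hypothetical sketch (assumptions: cosine similarity as the matching criterion,
# fixed thresholds; the paper's actual TR criterion and CM rule may differ).
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def locate_targets(query_feats, workspace_feats, threshold=0.8):
    """Return (index, score) of workspace segments that match any query segment.

    query_feats / workspace_feats: lists of 1-D feature vectors, one per
    segmented object (e.g., from an off-the-shelf instance segmenter plus a
    CNN feature extractor -- both assumed here, not specified by the paper).
    """
    matches = []
    for i, wf in enumerate(workspace_feats):
        best = max(cosine_similarity(qf, wf) for qf in query_feats)
        if best >= threshold:
            matches.append((i, best))
    # Highest-similarity segments first; a downstream policy could focus its
    # Q-value learning on these regions.
    return sorted(matches, key=lambda m: m[1], reverse=True)


def choose_action(target_visible: bool, grasp_q: float, push_q: float,
                  q_margin: float = 0.1) -> str:
    """Toy coordination rule: push to search/isolate when the target is not
    visible or pushing clearly dominates; otherwise grasp. Illustrative only."""
    if not target_visible or push_q > grasp_q + q_margin:
        return "push"
    return "grasp"
```

As a usage example, one would pass per-segment feature vectors for the query and workspace images to `locate_targets`, then feed the predicted Q-values for the best push and grasp candidates into `choose_action` at each step until the target is grasped.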
Source journal metrics: CiteScore 7.20 · Self-citation rate 10.00% · Annual publications 170
About the journal: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.