{"title":"CR-Net:集成卷积块注意模块和残差模块的机器人抓取检测方法","authors":"Song Yan, Lei Zhang","doi":"10.1049/cvi2.12252","DOIUrl":null,"url":null,"abstract":"<p>Grasping detection, which involves identifying and assessing the grasp ability of objects by robotic systems, has garnered significant attention in recent years due to its pivotal role in the development of robotic systems and automated assembly processes. Despite notable advancements in this field, current methods often grapple with both practical and theoretical challenges that hinder their real-world applicability. These challenges encompass low detection accuracy, the burden of oversized model parameters, and the inherent complexity of real-world scenarios. In response to these multifaceted challenges, a novel lightweight grasping detection model that not only addresses the technical aspects but also delves into the underlying theoretical complexities is introduced. The proposed model incorporates attention mechanisms and residual modules to tackle the theoretical challenges posed by varying object shapes, sizes, materials, and environmental conditions. To enhance its performance in the face of these theoretical complexities, the proposed model employs a Convolutional Block Attention Module (CBAM) to extract features from RGB and depth channels, recognising the multifaceted nature of object properties. Subsequently, a feature fusion module effectively combines these diverse features, providing a solution to the theoretical challenge of information integration. The model then processes the fused features through five residual blocks, followed by another CBAM attention module, culminating in the generation of three distinct images representing capture quality, grasping angle, and grasping width. These images collectively yield the final grasp detection results, addressing the theoretical complexities inherent in this task. The proposed model's rigorous training and evaluation on the Cornell Grasp dataset demonstrate remarkable detection accuracy rates of 98.44% on the Image-wise split and 96.88% on the Object-wise split. The experimental results strongly corroborate the exceptional performance of the proposed model, underscoring its ability to overcome the theoretical challenges associated with grasping detection. The integration of the residual module ensures rapid training, while the attention module facilitates precise feature extraction, ultimately striking an effective balance between detection time and accuracy.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 3","pages":"420-433"},"PeriodicalIF":1.5000,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12252","citationCount":"0","resultStr":"{\"title\":\"CR-Net: Robot grasping detection method integrating convolutional block attention module and residual module\",\"authors\":\"Song Yan, Lei Zhang\",\"doi\":\"10.1049/cvi2.12252\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Grasping detection, which involves identifying and assessing the grasp ability of objects by robotic systems, has garnered significant attention in recent years due to its pivotal role in the development of robotic systems and automated assembly processes. Despite notable advancements in this field, current methods often grapple with both practical and theoretical challenges that hinder their real-world applicability. 
These challenges encompass low detection accuracy, the burden of oversized model parameters, and the inherent complexity of real-world scenarios. In response to these multifaceted challenges, a novel lightweight grasping detection model that not only addresses the technical aspects but also delves into the underlying theoretical complexities is introduced. The proposed model incorporates attention mechanisms and residual modules to tackle the theoretical challenges posed by varying object shapes, sizes, materials, and environmental conditions. To enhance its performance in the face of these theoretical complexities, the proposed model employs a Convolutional Block Attention Module (CBAM) to extract features from RGB and depth channels, recognising the multifaceted nature of object properties. Subsequently, a feature fusion module effectively combines these diverse features, providing a solution to the theoretical challenge of information integration. The model then processes the fused features through five residual blocks, followed by another CBAM attention module, culminating in the generation of three distinct images representing capture quality, grasping angle, and grasping width. These images collectively yield the final grasp detection results, addressing the theoretical complexities inherent in this task. The proposed model's rigorous training and evaluation on the Cornell Grasp dataset demonstrate remarkable detection accuracy rates of 98.44% on the Image-wise split and 96.88% on the Object-wise split. The experimental results strongly corroborate the exceptional performance of the proposed model, underscoring its ability to overcome the theoretical challenges associated with grasping detection. The integration of the residual module ensures rapid training, while the attention module facilitates precise feature extraction, ultimately striking an effective balance between detection time and accuracy.</p>\",\"PeriodicalId\":56304,\"journal\":{\"name\":\"IET Computer Vision\",\"volume\":\"18 3\",\"pages\":\"420-433\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12252\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IET Computer Vision\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12252\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12252","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract:
Grasping detection, the task of identifying graspable regions of objects and assessing candidate grasps for robotic systems, has attracted significant attention in recent years owing to its pivotal role in robotic manipulation and automated assembly. Despite notable advances in the field, current methods still face practical and theoretical challenges that hinder real-world applicability: low detection accuracy, oversized model parameter counts, and the inherent complexity of real-world scenarios. In response, a novel lightweight grasping detection model is introduced that addresses both the engineering difficulties and the underlying theoretical complexities. The proposed model incorporates attention mechanisms and residual modules to handle variation in object shape, size, material, and environmental conditions. It employs a Convolutional Block Attention Module (CBAM) to extract features from the RGB and depth channels separately, reflecting the multifaceted nature of object properties; a feature fusion module then combines these complementary features, addressing the challenge of integrating information across modalities. The fused features pass through five residual blocks, followed by a second CBAM attention module, and the network finally generates three output images representing grasp quality, grasp angle, and grasp width, which together yield the final grasp detection result. Rigorous training and evaluation on the Cornell Grasp dataset demonstrate detection accuracies of 98.44% on the image-wise split and 96.88% on the object-wise split. These results corroborate the strong performance of the proposed model: the residual modules enable rapid training, while the attention modules support precise feature extraction, striking an effective balance between detection time and accuracy.
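The abstract describes the pipeline but the paper's code is not reproduced here; the sketch below is a minimal PyTorch rendering of the stated architecture (a CBAM on each of the RGB and depth branches, feature fusion, five residual blocks, a second CBAM, and three pixel-wise output heads). The class names, layer widths, kernel sizes, and head parameterisation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: global avg/max pooling through a shared MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool across channels, convolve to a 2-D gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        pooled = torch.cat([torch.mean(x, dim=1, keepdim=True),
                            torch.amax(x, dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))

class ResidualBlock(nn.Module):
    """Plain residual block with two 3x3 convolutions and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class CRNetSketch(nn.Module):
    """Hypothetical CR-Net-style pipeline: per-modality CBAM feature
    extraction, fusion, five residual blocks, a second CBAM, and three
    pixel-wise heads for grasp quality, angle, and width."""
    def __init__(self, width=64):
        super().__init__()
        self.rgb_stem = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True), CBAM(width))
        self.depth_stem = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True), CBAM(width))
        self.fuse = nn.Conv2d(2 * width, width, 1)   # feature fusion module
        self.res = nn.Sequential(*[ResidualBlock(width) for _ in range(5)])
        self.post_attn = CBAM(width)
        self.quality_head = nn.Conv2d(width, 1, 1)   # grasp quality map
        self.angle_head = nn.Conv2d(width, 1, 1)     # grasp angle map
        self.width_head = nn.Conv2d(width, 1, 1)     # grasp width map

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        f = self.post_attn(self.res(self.fuse(f)))
        return self.quality_head(f), self.angle_head(f), self.width_head(f)

# Smoke test on a dummy RGB-D input.
if __name__ == "__main__":
    net = CRNetSketch()
    q, ang, w = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
    print(q.shape, ang.shape, w.shape)  # three 1-channel 224x224 maps
```

At inference, the pixel with the highest grasp-quality score would typically be selected, and the angle and width maps read out at that location to form the final grasp rectangle.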
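Accuracy on the Cornell Grasp dataset is conventionally scored with the rectangle metric: a predicted grasp counts as correct if its orientation is within 30 degrees of a ground-truth rectangle and their Jaccard index (intersection over union) exceeds 0.25. The abstract does not spell out its evaluation protocol, so the check below is a hedged sketch of that standard criterion; the shapely dependency and the (cx, cy, angle, w, h) rectangle encoding are assumptions for illustration.

```python
import math
from shapely.geometry import Polygon  # assumed dependency for polygon IoU

def rectangle_corners(cx, cy, angle, w, h):
    """Corners of a grasp rectangle centred at (cx, cy), rotated by `angle`
    radians, with opening width `w` and jaw height `h`."""
    dx, dy = w / 2.0, h / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]]

def grasp_correct(pred, truth, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """Standard Cornell rectangle metric: angle difference under 30 degrees
    (modulo pi, since a parallel gripper is symmetric) and IoU above 0.25."""
    d = abs(pred[2] - truth[2]) % math.pi
    if min(d, math.pi - d) > angle_thresh:
        return False
    p = Polygon(rectangle_corners(*pred))
    t = Polygon(rectangle_corners(*truth))
    union = p.union(t).area
    return union > 0 and p.intersection(t).area / union > iou_thresh

# Example: a prediction 5 px off-centre and rotated 10 degrees still counts.
pred = (100.0, 105.0, math.radians(10), 80.0, 40.0)
truth = (100.0, 100.0, 0.0, 80.0, 40.0)
print(grasp_correct(pred, truth))  # True
```

The image-wise split tests generalisation to unseen views of known objects, while the object-wise split holds out entire objects, which is why the latter figure (96.88%) is the stricter of the two reported accuracies.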
Journal Introduction:
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal's vision is to publish the highest-quality research that is relevant and topical to the field, without overlooking work that introduces new horizons and sets the agenda for future avenues of computer vision research.
IET Computer Vision welcomes submissions on the following topics:
Biologically and perceptually motivated approaches to low level vision (feature detection, etc.)
Perceptual grouping and organisation
Representation, analysis and matching of 2D and 3D shape
Shape-from-X
Object recognition
Image understanding
Learning with visual inputs
Motion analysis and object tracking
Multiview scene analysis
Cognitive approaches in low, mid and high level vision
Control in visual systems
Colour, reflectance and light
Statistical and probabilistic models
Face and gesture
Surveillance
Biometrics and security
Robotics
Vehicle guidance
Automatic model acquisition
Medical image analysis and understanding
Aerial scene analysis and remote sensing
Deep learning models in computer vision
Both methodological and applications-orientated papers are welcome.
Submitted manuscripts are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the proposed research and its methodology, a thorough experimental evaluation, and a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to authors without being sent to review.
Special Issues: Current Calls for Papers
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf