{"title":"CR-Net:集成卷积块注意模块和残差模块的机器人抓取检测方法","authors":"Song Yan, Lei Zhang","doi":"10.1049/cvi2.12252","DOIUrl":null,"url":null,"abstract":"<p>Grasping detection, which involves identifying and assessing the grasp ability of objects by robotic systems, has garnered significant attention in recent years due to its pivotal role in the development of robotic systems and automated assembly processes. Despite notable advancements in this field, current methods often grapple with both practical and theoretical challenges that hinder their real-world applicability. These challenges encompass low detection accuracy, the burden of oversized model parameters, and the inherent complexity of real-world scenarios. In response to these multifaceted challenges, a novel lightweight grasping detection model that not only addresses the technical aspects but also delves into the underlying theoretical complexities is introduced. The proposed model incorporates attention mechanisms and residual modules to tackle the theoretical challenges posed by varying object shapes, sizes, materials, and environmental conditions. To enhance its performance in the face of these theoretical complexities, the proposed model employs a Convolutional Block Attention Module (CBAM) to extract features from RGB and depth channels, recognising the multifaceted nature of object properties. Subsequently, a feature fusion module effectively combines these diverse features, providing a solution to the theoretical challenge of information integration. The model then processes the fused features through five residual blocks, followed by another CBAM attention module, culminating in the generation of three distinct images representing capture quality, grasping angle, and grasping width. These images collectively yield the final grasp detection results, addressing the theoretical complexities inherent in this task. The proposed model's rigorous training and evaluation on the Cornell Grasp dataset demonstrate remarkable detection accuracy rates of 98.44% on the Image-wise split and 96.88% on the Object-wise split. The experimental results strongly corroborate the exceptional performance of the proposed model, underscoring its ability to overcome the theoretical challenges associated with grasping detection. The integration of the residual module ensures rapid training, while the attention module facilitates precise feature extraction, ultimately striking an effective balance between detection time and accuracy.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 3","pages":"420-433"},"PeriodicalIF":1.5000,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12252","citationCount":"0","resultStr":"{\"title\":\"CR-Net: Robot grasping detection method integrating convolutional block attention module and residual module\",\"authors\":\"Song Yan, Lei Zhang\",\"doi\":\"10.1049/cvi2.12252\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Grasping detection, which involves identifying and assessing the grasp ability of objects by robotic systems, has garnered significant attention in recent years due to its pivotal role in the development of robotic systems and automated assembly processes. Despite notable advancements in this field, current methods often grapple with both practical and theoretical challenges that hinder their real-world applicability. 
These challenges encompass low detection accuracy, the burden of oversized model parameters, and the inherent complexity of real-world scenarios. In response to these multifaceted challenges, a novel lightweight grasping detection model that not only addresses the technical aspects but also delves into the underlying theoretical complexities is introduced. The proposed model incorporates attention mechanisms and residual modules to tackle the theoretical challenges posed by varying object shapes, sizes, materials, and environmental conditions. To enhance its performance in the face of these theoretical complexities, the proposed model employs a Convolutional Block Attention Module (CBAM) to extract features from RGB and depth channels, recognising the multifaceted nature of object properties. Subsequently, a feature fusion module effectively combines these diverse features, providing a solution to the theoretical challenge of information integration. The model then processes the fused features through five residual blocks, followed by another CBAM attention module, culminating in the generation of three distinct images representing capture quality, grasping angle, and grasping width. These images collectively yield the final grasp detection results, addressing the theoretical complexities inherent in this task. The proposed model's rigorous training and evaluation on the Cornell Grasp dataset demonstrate remarkable detection accuracy rates of 98.44% on the Image-wise split and 96.88% on the Object-wise split. The experimental results strongly corroborate the exceptional performance of the proposed model, underscoring its ability to overcome the theoretical challenges associated with grasping detection. The integration of the residual module ensures rapid training, while the attention module facilitates precise feature extraction, ultimately striking an effective balance between detection time and accuracy.</p>\",\"PeriodicalId\":56304,\"journal\":{\"name\":\"IET Computer Vision\",\"volume\":\"18 3\",\"pages\":\"420-433\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12252\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IET Computer Vision\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12252\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12252","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract:
Grasping detection, the task of identifying graspable regions of objects and assessing candidate grasps for robotic systems, has attracted significant attention in recent years owing to its pivotal role in robotic manipulation and automated assembly. Despite notable advances in the field, current methods still face practical and theoretical challenges that hinder real-world applicability: low detection accuracy, oversized model parameter counts, and the inherent complexity of real-world scenarios. In response, a novel lightweight grasping detection model is introduced that addresses both the engineering difficulties and the underlying theoretical complexities. The proposed model incorporates attention mechanisms and residual modules to handle variation in object shape, size, material, and environmental conditions. It employs a Convolutional Block Attention Module (CBAM) to extract features from the RGB and depth channels separately, reflecting the multifaceted nature of object properties; a feature fusion module then combines these complementary features, addressing the challenge of integrating information across modalities. The fused features pass through five residual blocks, followed by a second CBAM attention module, and the network finally generates three output images representing grasp quality, grasp angle, and grasp width, which together yield the final grasp detection result. Rigorous training and evaluation on the Cornell Grasp dataset demonstrate detection accuracies of 98.44% on the image-wise split and 96.88% on the object-wise split. These results corroborate the strong performance of the proposed model: the residual modules enable rapid training, while the attention modules support precise feature extraction, striking an effective balance between detection time and accuracy.
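The abstract describes the pipeline but the paper's code is not reproduced here; the sketch below is a minimal PyTorch rendering of the stated architecture (a CBAM on each of the RGB and depth branches, feature fusion, five residual blocks, a second CBAM, and three pixel-wise output heads). The class names, layer widths, kernel sizes, and head parameterisation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: global avg/max pooling through a shared MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool across channels, convolve to a 2-D gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        pooled = torch.cat([torch.mean(x, dim=1, keepdim=True),
                            torch.amax(x, dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))

class ResidualBlock(nn.Module):
    """Plain residual block with two 3x3 convolutions and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class CRNetSketch(nn.Module):
    """Hypothetical CR-Net-style pipeline: per-modality CBAM feature
    extraction, fusion, five residual blocks, a second CBAM, and three
    pixel-wise heads for grasp quality, angle, and width."""
    def __init__(self, width=64):
        super().__init__()
        self.rgb_stem = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True), CBAM(width))
        self.depth_stem = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True), CBAM(width))
        self.fuse = nn.Conv2d(2 * width, width, 1)   # feature fusion module
        self.res = nn.Sequential(*[ResidualBlock(width) for _ in range(5)])
        self.post_attn = CBAM(width)
        self.quality_head = nn.Conv2d(width, 1, 1)   # grasp quality map
        self.angle_head = nn.Conv2d(width, 1, 1)     # grasp angle map
        self.width_head = nn.Conv2d(width, 1, 1)     # grasp width map

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        f = self.post_attn(self.res(self.fuse(f)))
        return self.quality_head(f), self.angle_head(f), self.width_head(f)

# Smoke test on a dummy RGB-D input.
if __name__ == "__main__":
    net = CRNetSketch()
    q, ang, w = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
    print(q.shape, ang.shape, w.shape)  # three 1-channel 224x224 maps
```

At inference, the pixel with the highest grasp-quality score would typically be selected, and the angle and width maps read out at that location to form the final grasp rectangle.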
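Accuracy on the Cornell Grasp dataset is conventionally scored with the rectangle metric: a predicted grasp counts as correct if its orientation is within 30 degrees of a ground-truth rectangle and their Jaccard index (intersection over union) exceeds 0.25. The abstract does not spell out its evaluation protocol, so the check below is a hedged sketch of that standard criterion; the shapely dependency and the (cx, cy, angle, w, h) rectangle encoding are assumptions for illustration.

```python
import math
from shapely.geometry import Polygon  # assumed dependency for polygon IoU

def rectangle_corners(cx, cy, angle, w, h):
    """Corners of a grasp rectangle centred at (cx, cy), rotated by `angle`
    radians, with opening width `w` and jaw height `h`."""
    dx, dy = w / 2.0, h / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]]

def grasp_correct(pred, truth, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """Standard Cornell rectangle metric: angle difference under 30 degrees
    (modulo pi, since a parallel gripper is symmetric) and IoU above 0.25."""
    d = abs(pred[2] - truth[2]) % math.pi
    if min(d, math.pi - d) > angle_thresh:
        return False
    p = Polygon(rectangle_corners(*pred))
    t = Polygon(rectangle_corners(*truth))
    union = p.union(t).area
    return union > 0 and p.intersection(t).area / union > iou_thresh

# Example: a prediction 5 px off-centre and rotated 10 degrees still counts.
pred = (100.0, 105.0, math.radians(10), 80.0, 40.0)
truth = (100.0, 100.0, 0.0, 80.0, 40.0)
print(grasp_correct(pred, truth))  # True
```

The image-wise split tests generalisation to unseen views of known objects, while the object-wise split holds out entire objects, which is why the latter figure (96.88%) is the stricter of the two reported accuracies.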
Journal Introduction:
IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal's vision is to publish the highest-quality research that is relevant and topical to the field, without overlooking work that introduces new horizons and sets the agenda for future avenues of computer vision research.
IET Computer Vision welcomes submissions on the following topics:
Biologically and perceptually motivated approaches to low level vision (feature detection, etc.)
Perceptual grouping and organisation
Representation, analysis and matching of 2D and 3D shape
Shape-from-X
Object recognition
Image understanding
Learning with visual inputs
Motion analysis and object tracking
Multiview scene analysis
Cognitive approaches in low, mid and high level vision
Control in visual systems
Colour, reflectance and light
Statistical and probabilistic models
Face and gesture
Surveillance
Biometrics and security
Robotics
Vehicle guidance
Automatic model acquisition
Medical image analysis and understanding
Aerial scene analysis and remote sensing
Deep learning models in computer vision
Both methodological and applications-orientated papers are welcome.
Submitted manuscripts are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the proposed research and its methodology, a thorough experimental evaluation, and a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to authors without being sent to review.
Special Issues: Current Calls for Papers
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf