A robotic grasping method of box-shaped objects based on Dual-Stream You Only Look Once framework

Impact factor 8.0; Zone 2, Computer Science; JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence, Volume 159, Article 111559. Published: 2025-07-09. DOI: 10.1016/j.engappai.2025.111559

Haoran Deng, Jinlong Shi, Jiale Cui, Suqin Bai, Xin Zuo

Citations: 0

Abstract

In sorting and placement applications, efficient sorting and precise placement of box-shaped objects have long been key tasks that traditionally rely heavily on manual labor. With advances in industrial automation and artificial intelligence (AI), AI-based automation offers a more effective way to replace manual labor and thereby raise production efficiency. However, methods for classifying and grasping box-shaped objects in cluttered stacking scenes are still lacking. To address this gap, this paper presents an AI-based approach that extends the You Only Look Once (YOLO) version 8 framework to perform Red, Green, Blue, and Depth (RGBD) instance segmentation; we call the result Dual-Stream YOLO (DS-YOLO). Combined with a feature point matching algorithm, DS-YOLO enables precise recognition, grasping, and orderly placement of box-shaped objects in complex stacking environments. We also created a synthetic dataset named Snack Box. Compared with state-of-the-art methods, the model trained with DS-YOLO on Snack Box improves both mean average precision (mAP) at 50% intersection over union (IoU) and mAP over IoU thresholds from 50% to 90% by 0.5%, and achieves a 93.3% grasp success rate. The average center point error is 6.20 mm, and the plane normal vectors and object angles deviate by 4.27° and 6.26° on average, respectively. Additionally, our method outperforms state-of-the-art approaches on the Low-light Vision Visible-infrared Paired (LLVIP) dataset, which contains 15,488 aligned image pairs with pedestrian annotations. Our code and dataset are available at: https://github.com/DHR0703/YOLOv8_dual_Stream.
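The abstract reports pose accuracy as an average center point error (in millimetres) and average angular deviations of the plane normal and the object angle (in degrees), but it does not spell out the evaluation protocol. The following is a minimal Python sketch, under stated assumptions, of how such per-object errors could be computed and averaged. It assumes that predicted and ground-truth center points share one camera frame in millimetres, that normals are compared as directed vectors, and that the object angle is an in-plane rotation compared modulo a symmetry period. The function names and the `period` parameter are hypothetical and are not taken from the authors' repository.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def center_point_error_mm(pred: Vec3, gt: Vec3) -> float:
    """Euclidean distance between predicted and ground-truth 3D centers.

    Assumption: both points are in the same camera frame, expressed in mm.
    """
    return math.dist(pred, gt)


def normal_deviation_deg(pred: Vec3, gt: Vec3) -> float:
    """Angle in degrees between a predicted and a ground-truth plane normal.

    Assumption: normals are compared as directed vectors, so a flipped
    normal counts as 180 degrees off.
    """
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.hypot(*pred) * math.hypot(*gt)
    if norm == 0.0:
        raise ValueError("normal vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def in_plane_angle_error_deg(pred_deg: float, gt_deg: float,
                             period: float = 360.0) -> float:
    """Smallest absolute difference between two in-plane rotation angles.

    `period` is a hypothetical parameter: 360 treats the box as having no
    rotational symmetry, while 180 treats a rectangular face as identical
    after a half turn.
    """
    diff = (pred_deg - gt_deg) % period
    return min(diff, period - diff)


def mean(values: Sequence[float]) -> float:
    """Average of per-object errors, as reported in the abstract."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the paper.
    print(center_point_error_mm((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(normal_deviation_deg((0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
    print(in_plane_angle_error_deg(350.0, 10.0))
```

The angle is wrapped into the shortest arc, so 350° against 10° counts as a 20° error rather than 340°. Whether the paper treats box faces as symmetric under a half turn is not stated in the abstract, so `period` is exposed rather than fixed.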

Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.