A scalable multi-modal learning fruit detection algorithm for dynamic environments.

IF 2.6 | CAS Tier 4 (Computer Science) | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Neurorobotics | Pub Date: 2025-02-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1518878
Liang Mao, Zihao Guo, Mingzhe Liu, Yue Li, Linlin Wang, Jie Li
{"title":"A scalable multi-modal learning fruit detection algorithm for dynamic environments.","authors":"Liang Mao, Zihao Guo, Mingzhe Liu, Yue Li, Linlin Wang, Jie Li","doi":"10.3389/fnbot.2024.1518878","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>To enhance the detection of litchi fruits in natural scenes, address challenges such as dense occlusion and small target identification, this paper proposes a novel multimodal target detection method, denoted as YOLOv5-Litchi.</p><p><strong>Methods: </strong>Initially, the Neck layer network of YOLOv5s is simplified by changing its FPN+PAN structure to an FPN structure and increasing the number of detection heads from 3 to 5. Additionally, the detection heads with resolutions of 80 × 80 pixels and 160 × 160 pixels are replaced by TSCD detection heads to enhance the model's ability to detect small targets. Subsequently, the positioning loss function is replaced with the EIoU loss function, and the confidence loss is substituted by VFLoss to further improve the accuracy of the detection bounding box and reduce the missed detection rate in occluded targets. A sliding slice method is then employed to predict image targets, thereby reducing the miss rate of small targets.</p><p><strong>Results: </strong>Experimental results demonstrate that the proposed model improves accuracy, recall, and mean average precision (mAP) by 9.5, 0.9, and 12.3 percentage points, respectively, compared to the original YOLOv5s model. When benchmarked against other models such as YOLOx, YOLOv6, and YOLOv8, the proposed model's AP value increases by 4.0, 6.3, and 3.7 percentage points, respectively.</p><p><strong>Discussion: </strong>The improved network exhibits distinct improvements, primarily focusing on enhancing the recall rate and AP value, thereby reducing the missed detection rate which exhibiting a reduced number of missed targets and a more accurate prediction frame, indicating its suitability for litchi fruit detection. Therefore, this method significantly enhances the detection accuracy of mature litchi fruits and effectively addresses the challenges of dense occlusion and small target detection, providing crucial technical support for subsequent litchi yield estimation.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1518878"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11841473/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Neurorobotics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3389/fnbot.2024.1518878","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: To enhance the detection of litchi fruits in natural scenes and to address challenges such as dense occlusion and small-target identification, this paper proposes a novel multimodal target detection method, denoted YOLOv5-Litchi.

Methods: First, the neck of YOLOv5s is simplified by changing its FPN+PAN structure to an FPN structure, and the number of detection heads is increased from 3 to 5. The detection heads at the 80 × 80 and 160 × 160 resolutions are then replaced with TSCD detection heads to strengthen the model's ability to detect small targets. Next, the localization loss is replaced with the EIoU loss and the confidence loss with VFLoss, further improving bounding-box accuracy and reducing missed detections of occluded targets. Finally, a sliding-slice method is used during prediction to reduce the miss rate for small targets.
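The abstract names two published components, the EIoU localization loss and a sliding-slice prediction scheme, without implementation details. As a minimal sketch, here is the standard EIoU formulation in PyTorch; the (x1, y1, x2, y2) box format and the elementwise reduction are assumptions, not details from the paper:

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """Standard EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    EIoU = 1 - IoU
         + center_distance^2 / enclosing_diagonal^2
         + (w - w_gt)^2 / enclosing_width^2
         + (h - h_gt)^2 / enclosing_height^2
    """
    # Intersection area
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # IoU
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Width and height of the smallest enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # Squared distance between box centers
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2

    return (1 - iou
            + (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)
            + (w1 - w2) ** 2 / (cw ** 2 + eps)
            + (h1 - h2) ** 2 / (ch ** 2 + eps))
```

Unlike plain IoU, EIoU directly penalizes center offset and width/height mismatch, which tightens box regression for small, partially occluded targets. The sliding-slice step can likewise be sketched as tiled inference with overlap; `detect_fn`, the 640-pixel slice size, and the 20% overlap below are hypothetical placeholders, since the abstract does not specify them:

```python
def _offsets(length, size, step):
    """Top-left offsets so that overlapping windows cover the full extent."""
    offs = list(range(0, max(length - size, 0) + 1, step))
    if offs[-1] + size < length:  # make sure the border strip is covered
        offs.append(length - size)
    return offs

def sliced_predict(image, detect_fn, slice_size=640, overlap=0.2):
    """Run a detector over overlapping crops of an H x W x C array and map
    boxes back to full-image coordinates. detect_fn(crop) is assumed to
    yield (x1, y1, x2, y2, score, cls) tuples in crop coordinates."""
    h, w = image.shape[:2]
    step = int(slice_size * (1 - overlap))
    boxes = []
    for y0 in _offsets(h, slice_size, step):
        for x0 in _offsets(w, slice_size, step):
            crop = image[y0:y0 + slice_size, x0:x0 + slice_size]
            for x1, y1, x2, y2, score, cls in detect_fn(crop):
                boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score, cls))
    # A standard NMS pass (e.g. torchvision.ops.nms) should follow to merge
    # duplicate detections arising from the overlapping regions.
    return boxes
```

Slicing keeps small fruits at a detectable scale relative to the network input instead of shrinking them during full-image resizing, at the cost of extra forward passes.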

Results: Experimental results demonstrate that the proposed model improves accuracy, recall, and mean average precision (mAP) by 9.5, 0.9, and 12.3 percentage points, respectively, compared to the original YOLOv5s model. Benchmarked against other models such as YOLOX, YOLOv6, and YOLOv8, the proposed model's AP increases by 4.0, 6.3, and 3.7 percentage points, respectively.

Discussion: The improved network shows distinct gains, chiefly in recall and AP, which lowers the missed detection rate: fewer targets are missed and the predicted bounding boxes are more accurate, indicating the model's suitability for litchi fruit detection. The method therefore significantly enhances the detection accuracy of mature litchi fruits and effectively addresses the challenges of dense occlusion and small-target detection, providing crucial technical support for subsequent litchi yield estimation.

Source journal
Frontiers in Neurorobotics
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; ROBOTICS
CiteScore: 5.20
Self-citation rate: 6.50%
Annual publications: 250
Review time: 14 weeks
Journal introduction: Frontiers in Neurorobotics publishes rigorously peer-reviewed research in the science and technology of embodied autonomous neural systems. Specialty Chief Editors Alois C. Knoll and Florian Röhrbein at the Technische Universität München are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide. Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural nets, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). The focus of the journal is the embodiment of such neural systems in artificial software and hardware devices, machines, robots or any other form of physical actuation. This also includes prosthetic devices, brain machine interfaces, wearable systems, micro-machines, furniture, home appliances, as well as systems for managing micro and macro infrastructures. Frontiers in Neurorobotics also aims to publish radically new tools and methods to study plasticity and development of autonomous self-learning systems that are capable of acquiring knowledge in an open-ended manner. Models complemented with experimental studies revealing self-organizing principles of embodied neural systems are welcome. Our journal also publishes on the micro and macro engineering and mechatronics of robotic devices driven by neural systems, as well as studies on the impact that such systems will have on our daily life.