Multi-scale cross-modal feature fusion and cost-sensitive loss function for differential detection of occluded bagging pears in practical orchards
Shengli Yan, Wenhui Hou, Yuan Rao, Dan Jiang, Xiu Jin, Tan Wang, Yuwei Wang, Lu Liu, Tong Zhang, Arthur Genis
Artificial Intelligence in Agriculture, Volume 15, Issue 4 (2025), Pages 573-589
Publication date: 2025-05-18
DOI: 10.1016/j.aiia.2025.05.002
URL: https://www.sciencedirect.com/science/article/pii/S2589721725000558
Citations: 0
Abstract
In practical orchards, the challenges posed by fruit overlap and by branch and leaf occlusion significantly impede automated picking, particularly for bagging pears. To address this issue, this paper introduces MCCNet, a multi-scale cross-modal feature fusion and cost-sensitive classification loss function network designed to accurately detect bagging pears across different occlusion categories. The network employs a dual-stream convolutional neural network as its backbone, enabling parallel extraction of multi-modal features. In addition, we propose a novel lightweight cross-modal feature fusion method that enhances the features shared between modalities while extracting modality-specific features from the RGB and depth streams. This cross-modal fusion improves the perceptual capability of the model by combining complementary information from paired multimodal bagging pear images. Furthermore, we reformulate the classification loss as a cost-sensitive loss function, aiming to improve detection and classification efficiency and to reduce missed and false detections during picking. Experimental results on a bagging pear dataset show that MCCNet achieves mAP0.5 and mAP0.5:0.95 values of 97.3 % and 80.3 %, respectively, improvements of 3.6 % and 6.3 % over the classical YOLOv10m model. Benchmarked against several state-of-the-art detection models, MCCNet has only 19.5 million parameters while maintaining superior inference speed.
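
The abstract does not give implementation details of the lightweight cross-modal fusion, so the following is only a minimal illustrative sketch (not the authors' MCCNet module) of how a block could enhance shared features while preserving modality-specific RGB and depth features; the class name, channel sizes, and attention gate are assumptions for illustration.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Illustrative lightweight fusion of RGB and depth feature maps.

    Hypothetical sketch, not the paper's module: modality-specific features
    are kept via 1x1 projections, and shared features (the element-wise sum)
    are enhanced with a channel-attention gate.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.proj_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_depth = nn.Conv2d(channels, channels, kernel_size=1)
        # Channel attention computed on the shared (summed) representation.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        specific_rgb = self.proj_rgb(rgb)
        specific_depth = self.proj_depth(depth)
        shared = rgb + depth
        weights = self.gate(shared)            # per-channel weights in (0, 1)
        return weights * shared + specific_rgb + specific_depth


if __name__ == "__main__":
    rgb_feat = torch.randn(1, 256, 40, 40)     # dummy RGB feature map
    depth_feat = torch.randn(1, 256, 40, 40)   # dummy depth feature map
    fused = CrossModalFusion(256)(rgb_feat, depth_feat)
    print(fused.shape)                         # torch.Size([1, 256, 40, 40])
```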
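
Likewise, the cost-sensitive classification loss is described only at a high level. A common way to realize such a loss is to weight each sample's classification error by the misclassification cost of its true class; the sketch below assumes PyTorch, and the occlusion categories and cost values are made-up placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CostSensitiveCE(nn.Module):
    """Cross-entropy weighted by a per-class misclassification cost.

    Generic illustration of a cost-sensitive classification loss; the costs
    are hypothetical, e.g. penalising errors on heavily occluded pears more
    than on unoccluded ones.
    """

    def __init__(self, class_costs: torch.Tensor):
        super().__init__()
        self.register_buffer("class_costs", class_costs)

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        per_sample_ce = F.cross_entropy(logits, targets, reduction="none")
        costs = self.class_costs[targets]      # cost of each sample's true label
        return (costs * per_sample_ce).mean()


if __name__ == "__main__":
    # Hypothetical categories: unoccluded, leaf-occluded, fruit-overlapped.
    criterion = CostSensitiveCE(torch.tensor([1.0, 1.5, 2.0]))
    logits = torch.randn(8, 3)                 # dummy predictions for 8 samples
    labels = torch.randint(0, 3, (8,))         # dummy ground-truth classes
    print(criterion(logits, labels).item())
```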