A lightweight method for apple disease segmentation using multimodal transformer and sensor fusion

IF 8.9 · Zone 1 (Agricultural and Forestry Sciences) · JCR Q1 (Agriculture, Multidisciplinary)

Computers and Electronics in Agriculture · Pub Date: 2025-07-19 · DOI: 10.1016/j.compag.2025.110737

Yihong Song, Manzhou Li, Zizhe Zhou, Jiahe Zhang, Xiangge Du, Min Dong, Qinhong Jiang, Che Li, Yuantao Hu, Qiulin Yu, Dongmei Wang, Hegan Dong, Shuo Yan

Volume 237, Article 110737 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0168169925008439

Citations: 0

Abstract



A lightweight method for apple disease segmentation using multimodal transformer and sensor fusion

To address the challenges of multimodal data fusion, low deployment efficiency, and inadequate recognition robustness in complex environments for fruit tree disease segmentation and severity classification, a multimodal parallel transformer-based framework was proposed for apple disease recognition and grading. This method integrates image data with multi-dimensional environmental sensor information. An image segmentation preprocessing module was incorporated to enhance lesion region representation, while a cross-scale attention mechanism and a frame-wise diffusion module were introduced to improve robustness under challenging backgrounds. Additionally, pruning, quantization, and knowledge distillation techniques were employed to enable lightweight deployment. Experimental results demonstrated that the full model achieved outstanding performance on apple disease recognition tasks, reaching a precision of 0.98, recall of 0.93, F1-score of 0.95, and accuracy of 0.96, surpassing several state-of-the-art methods including Mask R-CNN, SegFormer, and Swin Transformer. After compression, the model size was reduced to 76.4 MB, and computational complexity decreased to 6.1 G, enabling real-time inference speeds of 25.2 FPS and 39.6 FPS on Jetson Xavier and Orin platforms, respectively. Ablation studies confirmed the performance contributions of the segmentation preprocessing, sensor fusion, and diffusion modules, demonstrating the potential of the proposed framework for deployment in resource-constrained agricultural scenarios.
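The reported figures can be sanity-checked with simple arithmetic. The sketch below is not taken from the paper; it assumes the F1-score is the harmonic mean of the aggregate precision and recall quoted above (the abstract does not say whether metrics are averaged per class), and it converts the reported frame rates into an approximate per-frame latency budget. The function names `f1_score` and `frame_latency_ms` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frame_latency_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, implied by a frame rate."""
    return 1000.0 / fps


# Values as quoted in the abstract (already rounded to two decimals).
reported_precision, reported_recall = 0.98, 0.93

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 implied by reported precision/recall: {f1:.2f}")

# Frame rates quoted in the abstract for the compressed model.
for platform, fps in {"Jetson Xavier": 25.2, "Jetson Orin": 39.6}.items():
    print(f"{platform}: {frame_latency_ms(fps):.1f} ms per frame")
```

Because the inputs are rounded, the recomputed F1 is only approximate, but it agrees with the reported 0.95 to two decimals. The frame rates correspond to roughly 40 ms and 25 ms per frame, respectively.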


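Source journal

Computers and Electronics in Agriculture (Engineering and Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 15.30
Self-citation rate: 14.50%
Articles published: 800
Review time: 62 days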

About the journal: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.