Yihong Song, Manzhou Li, Zizhe Zhou, Jiahe Zhang, Xiangge Du, Min Dong, Qinhong Jiang, Che Li, Yuantao Hu, Qiulin Yu, Dongmei Wang, Hegan Dong, Shuo Yan
Computers and Electronics in Agriculture, Vol. 237, Article 110737. Published 2025-07-19. DOI: 10.1016/j.compag.2025.110737. https://www.sciencedirect.com/science/article/pii/S0168169925008439
A lightweight method for apple disease segmentation using multimodal transformer and sensor fusion
Fruit tree disease segmentation and severity classification face three challenges: fusing multimodal data, deploying models efficiently, and recognizing diseases robustly in complex environments. To address them, a multimodal parallel transformer-based framework was proposed for apple disease recognition and grading. The method integrates image data with multi-dimensional environmental sensor information. An image segmentation preprocessing module was incorporated to enhance the representation of lesion regions, while a cross-scale attention mechanism and a frame-wise diffusion module were introduced to improve robustness against challenging backgrounds. Pruning, quantization, and knowledge distillation were then applied to enable lightweight deployment. In experiments on apple disease recognition, the full model reached a precision of 0.98, recall of 0.93, F1-score of 0.95, and accuracy of 0.96, outperforming several state-of-the-art methods including Mask R-CNN, SegFormer, and Swin Transformer. After compression, the model size was reduced to 76.4 MB and the computational complexity to 6.1 G, enabling real-time inference at 25.2 FPS on Jetson Xavier and 39.6 FPS on Jetson Orin. Ablation studies confirmed the contributions of the segmentation preprocessing, sensor fusion, and diffusion modules, indicating the framework's potential for deployment in resource-constrained agricultural scenarios.
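As a quick consistency check on the reported figures: the F1-score is the harmonic mean of precision and recall, and a throughput in frames per second corresponds to an average per-frame time of 1000 / FPS milliseconds. This minimal sketch assumes the reported F1 is computed directly from the reported aggregate precision and recall (macro-averaged per-class scores need not satisfy this exactly) and that FPS was measured on a single sequential stream. The helper names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fps_to_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


print(round(f1_score(0.98, 0.93), 2))      # 0.95
print(round(fps_to_latency_ms(25.2), 1))   # 39.7  (Jetson Xavier figure)
print(round(fps_to_latency_ms(39.6), 1))   # 25.3  (Jetson Orin figure)
```

With P = 0.98 and R = 0.93 the harmonic mean is about 0.954, consistent with the reported 0.95, and the two throughputs correspond to roughly 39.7 ms and 25.3 ms per frame.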
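The abstract does not say how the environmental sensor readings enter the transformer. One common pattern, assumed here purely for illustration, is cross-modal attention: each image token acts as a query over embedded sensor readings (for example, temperature and humidity), and the attended sensor context is added back to the image token. This sketch is not the paper's cross-scale attention, whose details are not given here. It uses plain Python lists and hypothetical names (`cross_attention`, `fuse`), and it assumes the sensor readings have already been projected to the same dimension as the image tokens.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all keys."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted sum of the value vectors, one output vector per query.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def fuse(image_tokens: List[Vector], sensor_tokens: List[Vector]) -> List[Vector]:
    """Residual fusion: image token + sensor context it attended to (hypothetical)."""
    context = cross_attention(image_tokens, sensor_tokens, sensor_tokens)
    return [[x + c for x, c in zip(tok, ctx)]
            for tok, ctx in zip(image_tokens, context)]


image_tokens = [[1.0, 0.0], [0.0, 1.0]]   # two toy image patch embeddings
sensor_tokens = [[0.5, 0.5]]              # one toy embedded sensor reading
print(fuse(image_tokens, sensor_tokens))  # [[1.5, 0.5], [0.5, 1.5]]
```

With a single sensor token every image token receives the same context; with several sensor tokens, the softmax weights let each image token emphasize the readings most similar to it.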
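Knowledge distillation trains a compact student to match a larger teacher's softened predictions. The abstract does not give the loss used; this sketch assumes the widely used Hinton-style formulation, a weighted sum of hard-label cross-entropy and the temperature-scaled KL divergence between teacher and student distributions. The temperature (4.0) and weight (0.5) are illustrative defaults, not the paper's values.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard_ce = -math.log(softmax(student_logits)[label])
    soft_kl = kl_divergence(softmax(teacher_logits, temperature),
                            softmax(student_logits, temperature))
    # T^2 keeps the soft-target gradient magnitude comparable across temperatures.
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * soft_kl
```

When the student's logits equal the teacher's, the KL term vanishes and only the hard-label term remains; the further the student drifts from the teacher, the larger the soft term.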
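The abstract names pruning and quantization but not the specific variants. This minimal sketch assumes the simplest common forms, unstructured magnitude pruning followed by symmetric per-tensor int8 quantization, applied to a flat list of weights. The function names are hypothetical, and the example does not attempt to reproduce the reported 76.4 MB or 6.1 G figures.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:k])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8: q = round(w / scale), with scale = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


pruned = magnitude_prune([0.5, -0.02, 0.1, -1.0], sparsity=0.5)
print(pruned)  # [0.5, 0.0, 0.0, -1.0]
q, scale = quantize_int8(pruned)
```

Pruned weights quantize to exactly 0, so sparsity survives quantization. Storing int8 codes instead of 32-bit floats cuts weight storage by about 4x, and each dequantized weight is within half a quantization step (scale / 2) of its original value.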
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.