TranSEF: Transformer Enhanced Self-Ensemble Framework for Damage Assessment in Canola Crops

IEEE Transactions on AgriFood Electronics. Pub Date: 2024-12-05. DOI: 10.1109/TAFE.2024.3504956

Muhib Ullah; Abdul Bais; Tyler Wist

Vol. 3, No. 1, pp. 179-189. Journal Article. https://ieeexplore.ieee.org/document/10778982/

Citation count: 0

Abstract

Crop health monitoring is crucial for implementing timely and effective interventions that ensure sustainability and maximize crop yield. Two flea beetle (FB) species, the crucifer flea beetle (Phyllotreta cruciferae) and the striped flea beetle (Phyllotreta striolata), pose a significant threat to canola crop health and cause substantial damage if not addressed promptly. When FB feeding overcomes insecticidal seed treatments and damage exceeds the action threshold, accurate and timely damage quantification is crucial for implementing targeted pest management strategies that minimize yield losses. Traditional manual field monitoring is time-consuming and error-prone because it relies on human visual estimates of FB damage. This article proposes TranSEF, a novel self-ensemble semantic segmentation algorithm built on a hybrid convolutional neural network (CNN)-vision transformer (ViT) encoder–decoder framework. The encoder employs MCSPDNet, a modified cross-stage partial DenseNet (CSPDenseNet), which enhances attention to tiny regions by aggregating spatially aware features from shallow layers with deeper, more abstract features. In the decoders, ViTs capture global context by modeling long-range dependencies and relationships across the image. Each decoder independently processes inputs from different stages of MCSPDNet, acting as a weak learner within an ensemble-like approach. Unlike traditional ensemble learning approaches that train weak learners separately, TranSEF is trained end-to-end, making it a self-ensembling framework. TranSEF uses hybrid supervision with a composite loss function, in which the decoders generate independent predictions and simultaneously supervise each other. TranSEF achieves intersection-over-union (IoU) scores of 0.831 for canola leaves and 0.807 for FB damage, and improves the overall mean IoU (mIoU) by 2.29% and 1.56% over DeepLabv3+ and SegFormer, respectively, while using only 35.42 M trainable parameters, significantly fewer than DeepLabv3+ (63 M) and SegFormer (61 M).
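The reported IoU and mIoU scores follow the standard segmentation metric: for each class, the number of pixels where prediction and ground truth agree, divided by the number of pixels where either one assigns that class. The following is a minimal sketch of that computation, not the authors' evaluation code. The class indices (0 = background, 1 = canola leaf, 2 = FB damage), the function names, and the toy label maps are assumptions made for illustration only.

```python
import math
from typing import List, Sequence

LabelMap = Sequence[Sequence[int]]


def per_class_iou(pred: LabelMap, target: LabelMap, num_classes: int) -> List[float]:
    """Return the IoU of each class for two label maps of equal shape."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it is in both the intersection and the union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # A class absent from both maps has no defined IoU, so report NaN.
    return [i / u if u else float("nan") for i, u in zip(intersection, union)]


def mean_iou(ious: Sequence[float]) -> float:
    """Average the per-class IoUs, skipping classes with undefined IoU."""
    valid = [v for v in ious if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical 3x3 label maps (assumed labels: 0 background, 1 leaf, 2 damage).
pred = [[0, 1, 1],
        [0, 1, 2],
        [0, 2, 2]]
target = [[0, 1, 1],
          [1, 1, 2],
          [0, 0, 2]]

ious = per_class_iou(pred, target, num_classes=3)
print([round(v, 3) for v in ious])  # [0.5, 0.75, 0.667]
print(round(mean_iou(ious), 3))     # 0.639
```

In this toy case the damage class scores 2/3 because one predicted damage pixel is actually background. Small regions such as feeding holes are sensitive to this kind of single-pixel error, which is why per-class IoU is reported alongside the mean.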


