{"title":"TranSEF: Transformer Enhanced Self-Ensemble Framework for Damage Assessment in Canola Crops","authors":"Muhib Ullah;Abdul Bais;Tyler Wist","doi":"10.1109/TAFE.2024.3504956","DOIUrl":null,"url":null,"abstract":"Crop health monitoring is crucial for implementing timely and effective interventions that ensure sustainability and maximize crop yield. Flea beetles (FB), Crucifer (Phyllotreta cruciferae) and Striped (Phyllotreta striolata), pose a significant threat to canola crop health and cause substantial damage if not addressed promptly. Accurate and timely damage quantification is crucial for implementing targeted pest management strategies if insecticidal seed treatments are overcome by FB feeding to minimize yield losses if the action threshold is exceeded. Traditional manual field monitoring for FB damage is time-consuming and error-prone due to reliance on human visual estimates of FB damage. This article proposes TranSEF, a novel self-ensemble semantic segmentation algorithm that utilizes a hybrid convolutional neural network-vision transformer (ViT) encoder–decoder framework. The encoder employs a modified cross-stage partial DenseNet (CSPDenseNet), MCSPDNet, which enhances attention to tiny regions by aggregating spatially aware features from shallow layers with deeper, more abstract features. ViTs effectively capture the global context in the decoder by modeling long-range dependencies and relationships across the image. Each decoder independently processes inputs from different stages of the MCSPDNet, acting as a weak learner within an ensemble-like approach. Unlike traditional ensemble learning approaches that train weak learners separately, TranSEF is trained end-to-end, making it a self-ensembling framework. TranSEF uses hybrid supervision with a composite loss function, where decoders generate independent predictions and simultaneously supervise each other. TranSEF achieves IoU scores of 0.831 for canola leaves and 0.807 for FB damage, and the overall mIoU improved by 2.29% and 1.56% over DeepLabv3+ and SegFormer, respectively, while utilizing only 35.42 M trainable parameters-significantly fewer than DeepLabv3+ (63 M) and SegFormer (61 M).","PeriodicalId":100637,"journal":{"name":"IEEE Transactions on AgriFood Electronics","volume":"3 1","pages":"179-189"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on AgriFood Electronics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10778982/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Crop health monitoring is crucial for implementing timely and effective interventions that ensure sustainability and maximize crop yield. Flea beetles (FB), the crucifer flea beetle (Phyllotreta cruciferae) and the striped flea beetle (Phyllotreta striolata), pose a significant threat to canola crop health and cause substantial damage if not addressed promptly. Accurate and timely damage quantification is therefore essential: when insecticidal seed treatments are overcome by FB feeding and the action threshold is exceeded, targeted pest management strategies must be applied to minimize yield losses. Traditional manual field monitoring for FB damage is time-consuming and error-prone because it relies on human visual estimates of damage. This article proposes TranSEF, a novel self-ensemble semantic segmentation algorithm built on a hybrid convolutional neural network-vision transformer (ViT) encoder–decoder framework. The encoder employs a modified cross-stage partial DenseNet (CSPDenseNet), MCSPDNet, which enhances attention to tiny regions by aggregating spatially aware features from shallow layers with deeper, more abstract features. In the decoder, ViTs capture global context by modeling long-range dependencies and relationships across the image. Each decoder independently processes inputs from a different stage of the MCSPDNet, acting as a weak learner within an ensemble-like approach. Unlike traditional ensemble learning, which trains weak learners separately, TranSEF is trained end-to-end, making it a self-ensembling framework. TranSEF uses hybrid supervision with a composite loss function, in which the decoders generate independent predictions and simultaneously supervise each other. TranSEF achieves intersection-over-union (IoU) scores of 0.831 for canola leaves and 0.807 for FB damage, and it improves the overall mean IoU (mIoU) by 2.29% and 1.56% over DeepLabv3+ and SegFormer, respectively, while using only 35.42 M trainable parameters, significantly fewer than DeepLabv3+ (63 M) and SegFormer (61 M).
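As the abstract describes it, TranSEF attaches one transformer decoder to each encoder stage and trains all decoders jointly, each supervised by the ground truth and by its peers. Below is a minimal PyTorch sketch of that self-ensembling pattern; the decoder internals, the KL-based consistency term, and the loss weights are assumptions for illustration, not the authors' published implementation.

```python
# A minimal sketch of TranSEF-style self-ensembling in PyTorch. The encoder
# interface, decoder internals, and loss weights are illustrative assumptions,
# not the authors' published implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 3  # assumed label set: background, canola leaf, FB damage


class DecoderHead(nn.Module):
    """Stand-in for one ViT decoder; a real head would run self-attention
    over patch tokens to capture long-range dependencies across the image."""

    def __init__(self, in_ch: int, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.head = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor, out_size) -> torch.Tensor:
        logits = self.head(feats)
        return F.interpolate(logits, size=out_size, mode="bilinear",
                             align_corners=False)


class SelfEnsemble(nn.Module):
    """One decoder per encoder stage; each acts as a weak learner and the
    whole stack is trained end-to-end."""

    def __init__(self, encoder: nn.Module, stage_channels):
        super().__init__()
        self.encoder = encoder  # e.g., an MCSPDNet-like multi-stage backbone
        self.decoders = nn.ModuleList(DecoderHead(c) for c in stage_channels)

    def forward(self, x: torch.Tensor):
        stages = self.encoder(x)  # assumed to return one feature map per stage
        return [dec(f, x.shape[-2:]) for dec, f in zip(self.decoders, stages)]


def composite_loss(logits_list, target, consistency_weight=0.1):
    """Hybrid supervision: every decoder is trained against the ground truth,
    and decoder pairs supervise each other through a consistency term on
    their softened predictions (the KL form and the weight are assumptions)."""
    supervised = sum(F.cross_entropy(l, target) for l in logits_list)
    consistency = torch.zeros((), device=target.device)
    for i in range(len(logits_list)):
        for j in range(len(logits_list)):
            if i != j:
                log_p_i = F.log_softmax(logits_list[i], dim=1)
                p_j = F.softmax(logits_list[j], dim=1).detach()
                consistency = consistency + F.kl_div(log_p_i, p_j,
                                                     reduction="batchmean")
    return supervised + consistency_weight * consistency
```

At inference, one plausible use of such a stack is to average the per-decoder logits to form the ensemble prediction, which is what makes each decoder a "weak learner" in the abstract's terminology.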
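The reported 0.831 and 0.807 figures are per-class IoU scores. For reference, here is a short sketch of how per-class IoU and mIoU are typically computed from integer label maps; the class ids are assumptions, not taken from the paper.

```python
# Per-class IoU / mIoU from integer label maps (class ids are illustrative).
import numpy as np


def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """IoU_c = |pred_c AND gt_c| / |pred_c OR gt_c| for each class c."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union else float("nan"))
    return ious


# mIoU is the mean over classes, ignoring classes absent from both maps:
# miou = float(np.nanmean(iou_per_class(pred, gt, 3)))
```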