Enhancing weed detection through knowledge distillation and attention mechanism.

IF 3.0 | Q2 ROBOTICS
Frontiers in Robotics and AI · Pub Date: 2025-09-11 · eCollection Date: 2025-01-01 · DOI: 10.3389/frobt.2025.1654074
Ali El Alaoui, Hajar Mousannif
{"title":"通过知识蒸馏和注意机制加强杂草检测。","authors":"Ali El Alaoui, Hajar Mousannif","doi":"10.3389/frobt.2025.1654074","DOIUrl":null,"url":null,"abstract":"<p><p>Weeds pose a significant challenge in agriculture by competing with crops for essential resources, leading to reduced yields. To address this issue, researchers have increasingly adopted advanced machine learning techniques. Recently, Vision Transformers (ViT) have demonstrated remarkable success in various computer vision tasks, making their application to weed classification, detection, and segmentation more advantageous compared to traditional Convolutional Neural Networks (CNNs) due to their self-attention mechanism. However, the deployment of these models in agricultural robotics is hindered by resource limitations. Key challenges include high training costs, the absence of inductive biases, the extensive volume of data required for training, model size, and runtime memory constraints. This study proposes a knowledge distillation-based method for optimizing the ViT model. The approach aims to enhance the ViT model architecture while maintaining its performance for weed detection. To facilitate the training of the compacted ViT student model and enable parameter sharing and local receptive fields, knowledge was distilled from ResNet-50, which serves as the teacher model. Experimental results demonstrate significant enhancements and improvements in the student model, achieving a mean Average Precision (mAP) of 83.47%. Additionally, the model exhibits minimal computational expense, with only 5.7 million parameters. The proposed knowledge distillation framework successfully addresses the computational constraints associated with ViT deployment in agricultural robotics while preserving detection accuracy for weed detection applications.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1654074"},"PeriodicalIF":3.0000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460097/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhancing weed detection through knowledge distillation and attention mechanism.\",\"authors\":\"Ali El Alaoui, Hajar Mousannif\",\"doi\":\"10.3389/frobt.2025.1654074\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Weeds pose a significant challenge in agriculture by competing with crops for essential resources, leading to reduced yields. To address this issue, researchers have increasingly adopted advanced machine learning techniques. Recently, Vision Transformers (ViT) have demonstrated remarkable success in various computer vision tasks, making their application to weed classification, detection, and segmentation more advantageous compared to traditional Convolutional Neural Networks (CNNs) due to their self-attention mechanism. However, the deployment of these models in agricultural robotics is hindered by resource limitations. Key challenges include high training costs, the absence of inductive biases, the extensive volume of data required for training, model size, and runtime memory constraints. This study proposes a knowledge distillation-based method for optimizing the ViT model. The approach aims to enhance the ViT model architecture while maintaining its performance for weed detection. 
To facilitate the training of the compacted ViT student model and enable parameter sharing and local receptive fields, knowledge was distilled from ResNet-50, which serves as the teacher model. Experimental results demonstrate significant enhancements and improvements in the student model, achieving a mean Average Precision (mAP) of 83.47%. Additionally, the model exhibits minimal computational expense, with only 5.7 million parameters. The proposed knowledge distillation framework successfully addresses the computational constraints associated with ViT deployment in agricultural robotics while preserving detection accuracy for weed detection applications.</p>\",\"PeriodicalId\":47597,\"journal\":{\"name\":\"Frontiers in Robotics and AI\",\"volume\":\"12 \",\"pages\":\"1654074\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460097/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Robotics and AI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frobt.2025.1654074\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Robotics and AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frobt.2025.1654074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract


Weeds pose a significant challenge in agriculture by competing with crops for essential resources, leading to reduced yields. To address this issue, researchers have increasingly adopted advanced machine learning techniques. Recently, Vision Transformers (ViT) have demonstrated remarkable success in various computer vision tasks, making their application to weed classification, detection, and segmentation more advantageous compared to traditional Convolutional Neural Networks (CNNs) due to their self-attention mechanism. However, the deployment of these models in agricultural robotics is hindered by resource limitations. Key challenges include high training costs, the absence of inductive biases, the extensive volume of data required for training, model size, and runtime memory constraints. This study proposes a knowledge distillation-based method for optimizing the ViT model. The approach aims to enhance the ViT model architecture while maintaining its performance for weed detection. To facilitate the training of the compacted ViT student model and enable parameter sharing and local receptive fields, knowledge was distilled from ResNet-50, which serves as the teacher model. Experimental results demonstrate significant enhancements and improvements in the student model, achieving a mean Average Precision (mAP) of 83.47%. Additionally, the model exhibits minimal computational expense, with only 5.7 million parameters. The proposed knowledge distillation framework successfully addresses the computational constraints associated with ViT deployment in agricultural robotics while preserving detection accuracy for weed detection applications.
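The abstract names the teacher (ResNet-50) and the compact ViT student but does not give the distillation loss or training details, so the following is only a minimal, hypothetical sketch of Hinton-style logit distillation in PyTorch. The class count, temperature, loss weighting, and the tiny stand-in student network are illustrative assumptions; the paper's actual student is a roughly 5.7-million-parameter ViT trained for weed detection, not the toy classifier shown here.

```python
# Minimal sketch of logit-based knowledge distillation (assumptions noted above);
# the paper's real setup distills ResNet-50 knowledge into a compact ViT detector.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

NUM_CLASSES = 2        # hypothetical: e.g. weed vs. crop
TEMPERATURE = 4.0      # softens logits so the student sees inter-class similarities
ALPHA = 0.5            # illustrative balance between soft (KD) and hard-label loss

# Teacher: frozen ResNet-50 with a replaced classification head.
teacher = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
teacher.fc = torch.nn.Linear(teacher.fc.in_features, NUM_CLASSES)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student: tiny stand-in for the compact ViT; any small module that
# produces NUM_CLASSES logits would slot in here.
student = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, kernel_size=7, stride=4), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(32, NUM_CLASSES),
)

def kd_loss(student_logits, teacher_logits, labels):
    """Hinton-style distillation: KL on temperature-softened logits + cross-entropy."""
    soft_t = F.log_softmax(teacher_logits / TEMPERATURE, dim=-1)
    soft_s = F.log_softmax(student_logits / TEMPERATURE, dim=-1)
    distill = F.kl_div(soft_s, soft_t, log_target=True, reduction="batchmean")
    distill = distill * TEMPERATURE ** 2
    hard = F.cross_entropy(student_logits, labels)
    return ALPHA * distill + (1 - ALPHA) * hard

# One illustrative training step on random data.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))
with torch.no_grad():
    teacher_logits = teacher(images)
loss = kd_loss(student(images), teacher_logits, labels)
loss.backward()
```

The T² factor on the KL term is the usual convention for keeping gradient magnitudes comparable across temperatures when mixing the soft distillation loss with the hard-label loss.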

Source journal
CiteScore: 6.50
Self-citation rate: 5.90%
Articles published: 355
Review time: 14 weeks
Journal description: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.