A Hybrid Wheat Head Detection model with Incorporated CNN and Transformer

2023 18th International Conference on Machine Vision and Applications (MVA)
Publication date: 2023-07-23
DOI: 10.23919/MVA57639.2023.10216087

Shou Harada, Xian-Hua Han

Citations: 0

Abstract

Wheat head detection is an important research topic for yield estimation and growth management. Motivated by the strong performance of deep convolutional neural networks (DCNNs) across many vision tasks, deep-learning-based methods have come to dominate wheat head detection, delivering remarkable improvements over traditional image processing methods. Existing methods, however, usually adapt models designed for generic object detection to wheat heads, and do not sufficiently account for the specific characteristics of wheat head images, such as large appearance variations across growth stages, high density, and heavy overlap. This work proposes a novel hybrid wheat head detection model that combines a CNN with a transformer to model long-range dependencies. Specifically, we first employ a ResNet backbone to extract multi-scale features, and use an inter-scale feature fusion module to aggregate coarse-to-fine features, capturing sufficient spatial detail to localize small wheat heads. We further propose a novel, efficient transformer block that combines self-attention along the channel dimension with a feed-forward subnet to explore the interactions among the aggregated multi-scale features. Finally, a prediction head outputs the centerness and size of each wheat head, yielding a simple anchor-free detector. Extensive experiments on the Global Wheat Head Detection (GWHD) dataset demonstrate that the proposed model outperforms both existing state-of-the-art methods and the baseline model.
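The abstract does not give the exact formulation of the channel-direction self-attention. The sketch below is a minimal illustration under stated assumptions, similar in spirit to the "transposed" attention used in Restormer: queries and keys are compared across channels rather than across pixels, so the attention map is C × C instead of (H·W) × (H·W). The function names, the C × C projection matrices (standing in for 1×1 convolutions), the 1/sqrt(N) scaling, and the residual connection are all hypothetical choices, and the feed-forward subnet is omitted. It uses plain Python lists so that it runs without any deep-learning library.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_self_attention(x: Matrix, w_q: Matrix, w_k: Matrix,
                           w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical channel-wise self-attention with a residual connection.

    x: C x N feature map (C channels, N = H*W flattened pixels).
    w_q, w_k, w_v: C x C projections (assumed stand-ins for 1x1 convs).
    Returns (x + A V, A), where A = softmax(Q K^T / sqrt(N)) is C x C.
    """
    n = len(x[0])
    q, k, v = matmul(w_q, x), matmul(w_k, x), matmul(w_v, x)
    # Channel-to-channel affinities: a C x C map, independent of image size.
    scores = matmul(q, transpose(k))
    scale = 1.0 / math.sqrt(n)
    attn = softmax_rows([[s * scale for s in row] for row in scores])
    # Each output channel is a weighted mix of all value channels (C x N).
    mixed = matmul(attn, v)
    out = [[xi + mi for xi, mi in zip(x_row, m_row)]
           for x_row, m_row in zip(x, mixed)]
    return out, attn


if __name__ == "__main__":
    random.seed(0)
    C, H, W = 4, 8, 8
    x = [[random.gauss(0.0, 1.0) for _ in range(H * W)] for _ in range(C)]
    eye = [[1.0 if i == j else 0.0 for j in range(C)] for i in range(C)]
    y, attn = channel_self_attention(x, eye, eye, eye)
    print(len(attn), len(attn[0]))  # 4 4: the attention map is C x C
    print(len(y), len(y[0]))        # 4 64: output keeps the C x (H*W) shape
```

Because the attention map is C × C, computing it costs on the order of C²·N operations, compared with N²·C for standard spatial self-attention over N = H·W pixels. This is one plausible reading of why the abstract describes the block as efficient for dense, high-resolution wheat images; the authors' actual design may differ.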
