{"title":"A Hybrid Wheat Head Detection model with Incorporated CNN and Transformer","authors":"Shou Harada, Xian-Hua Han","doi":"10.23919/MVA57639.2023.10216087","DOIUrl":null,"url":null,"abstract":"Wheat head detection is an important research topic for production estimation and growth management. Motivated by the great advantages of the deep convolution neural networks (DCNNs) in many vision tasks, the deep-learning based methods have dominated the wheat head detection field, and manifest remarkable performance improvement compared with the traditional image processing methods. The existing methods usually divert the proposed detection models for the generic object detection to wheat head detection, and are insufficient in taking account of the specific characteristics of the wheat head images such as large variations due to different growth stages, high density and overlaps. This work exploits a novel hybrid wheat detection model by incorporating the CNN and transformer for modeling long-range dependence. Specifically, we firstly employ a backbone ResNet to extract multi-scale features, and leverage an inter-scale feature fusion module to aggregate coarse-to-fine features together for capturing sufficient spatial detail to localize small-size wheat head. Moreover, we propose a novel and efficient transformer block by incorporating the self-attention module in channel direction and the feature feed-forward subnet to explore the interaction among the aggregated multi-scale features. Finally a prediction head produces the centerness and size of wheat heads to obtain a simple anchor-free detection model. Extensive experiments on the Global Wheat Head Detection (GWHD) dataset have demonstrated the superiority of our proposed model over the existing state-of-the-art methods as well as the baseline model.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications (MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10216087","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Wheat head detection is an important research topic for production estimation and growth management. Motivated by the great advantages of deep convolutional neural networks (DCNNs) in many vision tasks, deep-learning-based methods have come to dominate the wheat head detection field and achieve remarkable performance improvements over traditional image processing methods. However, existing methods usually adapt detection models designed for generic object detection to wheat head detection and insufficiently account for the specific characteristics of wheat head images, such as large appearance variations across growth stages, high density, and heavy overlap. This work proposes a novel hybrid wheat head detection model that incorporates a CNN and a transformer to model long-range dependencies. Specifically, we first employ a ResNet backbone to extract multi-scale features and leverage an inter-scale feature fusion module to aggregate coarse-to-fine features, capturing sufficient spatial detail to localize small wheat heads. Moreover, we propose a novel and efficient transformer block that combines a self-attention module operating along the channel direction with a feature feed-forward subnet to explore interactions among the aggregated multi-scale features. Finally, a prediction head outputs the centerness and size of wheat heads, yielding a simple anchor-free detection model. Extensive experiments on the Global Wheat Head Detection (GWHD) dataset demonstrate the superiority of our proposed model over existing state-of-the-art methods as well as the baseline model.
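To make the described architecture more concrete, below is a minimal PyTorch-style sketch of the two components named in the abstract: a transformer block whose self-attention runs along the channel direction followed by a feed-forward subnet, and an anchor-free prediction head that outputs a centerness map and a per-pixel box size. This is not the authors' implementation; class names such as ChannelSelfAttentionBlock and AnchorFreeHead, the channel count, and the scaling factor are illustrative assumptions.

```python
# Illustrative sketch only: channel-direction self-attention block with a
# feed-forward subnet, plus an anchor-free centerness/size prediction head.
import torch
import torch.nn as nn


class ChannelSelfAttentionBlock(nn.Module):
    """Self-attention computed over channels (a C x C affinity matrix)
    rather than over spatial positions, so the attention cost does not
    grow with image resolution."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.qkv = nn.Linear(channels, channels * 3, bias=False)
        self.proj = nn.Linear(channels, channels)
        self.norm2 = nn.LayerNorm(channels)
        # Feature feed-forward subnet.
        self.ffn = nn.Sequential(
            nn.Linear(channels, channels * expansion),
            nn.GELU(),
            nn.Linear(channels * expansion, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> tokens of shape (B, HW, C)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)

        # Channel-direction attention: queries/keys/values are (B, C, HW),
        # so the attention matrix is C x C.
        y = self.norm1(tokens)
        q, k, v = self.qkv(y).chunk(3, dim=-1)            # each (B, HW, C)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # each (B, C, HW)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (h * w) ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2)                    # back to (B, HW, C)
        tokens = tokens + self.proj(y)

        # Feed-forward subnet with residual connection.
        tokens = tokens + self.ffn(self.norm2(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class AnchorFreeHead(nn.Module):
    """Predicts a centerness heatmap (1 channel) and a wheat-head size
    (width, height) at every spatial location, avoiding anchor boxes."""

    def __init__(self, channels: int):
        super().__init__()
        self.centerness = nn.Conv2d(channels, 1, kernel_size=1)
        self.size = nn.Conv2d(channels, 2, kernel_size=1)

    def forward(self, x: torch.Tensor):
        return torch.sigmoid(self.centerness(x)), self.size(x)


if __name__ == "__main__":
    feats = torch.randn(2, 256, 64, 64)   # stand-in for fused multi-scale features
    block = ChannelSelfAttentionBlock(256)
    head = AnchorFreeHead(256)
    center, size = head(block(feats))
    print(center.shape, size.shape)       # (2, 1, 64, 64) (2, 2, 64, 64)
```

In this reading, computing attention across channels keeps the attention matrix at C x C regardless of feature-map resolution, which is why such a block can be applied to the high-resolution fused features needed to localize small, densely packed wheat heads.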