Wei Han, Shaohao Chen, Shuanglin Xiao, Yunliang Chen, Huihui Zhao, Jining Yan, Xiaohan Zhang, Sheng Wang
Published in: International Journal of Applied Earth Observation and Geoinformation, volume 139, Article 104549 (publication date 2025-05-01; JCR Q1, Remote Sensing; impact factor 7.6). DOI: 10.1016/j.jag.2025.104549. URL: https://www.sciencedirect.com/science/article/pii/S1569843225001967
Large-scale tobacco identification via a very-high-resolution unmanned aerial vehicle benchmark and a ConvFlow Transformer
Remote sensing and artificial intelligence have propelled the development of precision and smart agriculture. Tobacco, despite being a crucial economic crop, has rarely been studied, and its large-scale identification faces several challenges. First, tobacco is often intercropped with other crops, such as corn. These crops have similar colors and textures, differing only slightly in planting spacing and arrangement, and these slight differences become even less observable in remote sensing imagery. Second, tobacco growth is a continuous, evolving process, so the crop's characteristics differ drastically across growth stages and seasons, which further complicates identification. Moreover, to the best of our knowledge, no tobacco dataset is publicly accessible, which impedes the development of well-performing deep learning (DL) models. This paper therefore constructs the Large-scale UAV remote SEnsing Tobacco dataset (LUSET), the world's first tobacco dataset, with a total volume of 67 GB. LUSET contains 10 accurately annotated large-scale images with an average resolution of about 20,000 × 20,000 pixels, which can be divided into 7,252 samples of 512 × 512 pixels. A dual-branch ConvFlow Transformer is then proposed to address tobacco's rich diversity and its high inter-class similarity with other crops. Within it, a novel Convolutional Feature-enhanced Multi-Head Self-Attention (CF-MHSA) module with a location-free design replaces the value matrix of standard attention with convolutional multi-scale features, which enables feature interaction and fusion between the convolutional and transformer branches. The fused, refined features make it easier to distinguish the texture characteristics of different crops and to represent their morphological features across growth cycles, which addresses the two major challenges in tobacco recognition. Extensive experiments on the UAV tobacco data show that the ConvFlow Transformer strategy can be readily integrated into mainstream Transformers and significantly improves their tobacco identification performance at a small computational cost.
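The central idea of CF-MHSA, as the abstract states it, is that the value matrix of standard attention is replaced by features from the convolutional branch. The sketch below illustrates only that substitution for a single head in plain Python. It is not the authors' implementation: the function name `conv_value_attention`, the projection matrices `w_q` and `w_k`, and the assumption that the convolutional features have already been reduced to one vector per token are all hypothetical. The abstract does not specify the multi-scale fusion or the location-free design, so the sketch simply adds no positional encoding.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def conv_value_attention(tokens, w_q, w_k, conv_features):
    """Single-head attention whose values come from a convolutional branch.

    tokens:        n x d token embeddings from the transformer branch
    w_q, w_k:      d x d_k query and key projections (hypothetical names)
    conv_features: n x d_v features from the convolutional branch, assumed
                   to be aligned with the tokens (one row per token)
    Returns (output, attention_weights).
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Scaled dot-product scores between every query and every key.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # Standard attention would use tokens @ w_v here; the convolutional
    # features take the place of that value matrix instead.
    return matmul(weights, conv_features), weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
conv_features = [[0.0, 3.0], [3.0, 0.0], [3.0, 3.0]]
output, weights = conv_value_attention(tokens, identity, identity, conv_features)
print(len(output), len(output[0]))
```

Because the attention weights are still computed from the transformer tokens while the aggregated content comes from the convolutional branch, each output row mixes information from both branches, which is one plausible reading of the "feature interaction and fusion" described above.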
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.