Journal: Computers and Electronics in Agriculture (JCR Q1, Agriculture, Multidisciplinary; Impact Factor 7.7)
DOI: 10.1016/j.compag.2024.109370
Publication date: 2024-08-28
Publication type: Journal Article
Article URL: https://www.sciencedirect.com/science/article/pii/S0168169924007610
A lightweight CNN-Transformer network for pixel-based crop mapping using time-series Sentinel-2 imagery
Deep learning approaches have provided state-of-the-art performance in crop mapping. Recently, several studies have combined the strengths of two dominant deep learning architectures, Convolutional Neural Networks (CNNs) and Transformers, to classify crops using remote sensing images. Despite their success, many of these models rely on patch-based methods that require extensive data labeling, as each sample contains multiple pixels with corresponding labels. This leads to higher costs in data preparation and processing. Moreover, previous methods rarely considered the impact of missing values caused by cloud cover and missing observations in remote sensing data. Therefore, this study proposes a lightweight multi-stage CNN-Transformer network (MCTNet) for pixel-based crop mapping using time-series Sentinel-2 imagery. MCTNet consists of several successive modules, each containing a CNN sub-module and a Transformer sub-module to extract important features from the images. An attention-based learnable positional encoding (ALPE) module is designed in the Transformer sub-module to capture the complex temporal relations in time-series data with different missing rates. Arkansas and California in the U.S. are selected to evaluate the model. Experimental results show that MCTNet has a lightweight advantage, with the fewest parameters and lowest memory usage, while outperforming eight advanced models. Specifically, MCTNet obtained an overall accuracy (OA) of 0.968, a kappa coefficient (Kappa) of 0.951, and a macro-averaged F1 score (F1) of 0.933 in Arkansas, and an OA of 0.852, a Kappa of 0.806, and an F1 score of 0.829 in California. The results highlight the importance of each component of the model, particularly the ALPE module, which enhanced the Kappa of MCTNet by 4.2% in Arkansas and improved the model's robustness to missing values in remote sensing data.
Additionally, visualization results demonstrated that the features extracted from the CNN and Transformer sub-modules are complementary, which helps explain the effectiveness of MCTNet.
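The OA, Kappa, and macro-F1 values reported above are standard agreement metrics computed from a confusion matrix. The abstract does not give their formulas, so the following is a minimal sketch of the conventional definitions (the function name and the example confusion matrix are illustrative, not taken from the paper):

```python
import numpy as np

def crop_map_metrics(conf):
    """Overall accuracy, Cohen's kappa, and macro-averaged F1 from a
    confusion matrix (rows = reference labels, columns = predictions)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total  # overall accuracy: fraction on the diagonal
    # Cohen's kappa corrects OA for agreement expected by chance
    pe = (conf.sum(axis=1) * conf.sum(axis=0)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    # Per-class precision and recall, guarding against empty classes
    tp = np.diag(conf)
    col, row = conf.sum(axis=0), conf.sum(axis=1)
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    # Macro-averaging weights every crop class equally, regardless of area
    return oa, kappa, f1.mean()

# Illustrative two-class example
oa, kappa, macro_f1 = crop_map_metrics([[90, 10], [5, 95]])
```

Macro-averaging is a natural choice for crop mapping, where minority crop classes cover far less area than dominant ones and would otherwise be drowned out by pixel-weighted averages.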
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.