A lightweight CNN-Transformer network for pixel-based crop mapping using time-series Sentinel-2 imagery

Impact Factor: 7.7 · CAS Zone 1 (Agricultural Sciences) · JCR Q1 (Agriculture, Multidisciplinary)
DOI: 10.1016/j.compag.2024.109370 · Journal: Computers and Electronics in Agriculture · Published: 2024-08-28
Full text: https://www.sciencedirect.com/science/article/pii/S0168169924007610
Citations: 0

Abstract

Deep learning approaches have provided state-of-the-art performance in crop mapping. Recently, several studies have combined the strengths of two dominant deep learning architectures, Convolutional Neural Networks (CNNs) and Transformers, to classify crops using remote sensing images. Despite their success, many of these models rely on patch-based methods that require extensive data labeling, as each sample contains multiple pixels with corresponding labels. This raises the cost of data preparation and processing. Moreover, previous methods rarely considered the impact of missing values caused by clouds and missing observations in remote sensing data. Therefore, this study proposes a lightweight multi-stage CNN-Transformer network (MCTNet) for pixel-based crop mapping using time-series Sentinel-2 imagery. MCTNet consists of several successive modules, each pairing a CNN sub-module with a Transformer sub-module to extract important features from the imagery. An attention-based learnable positional encoding (ALPE) module is designed in the Transformer sub-module to capture the complex temporal relations in time-series data with different missing rates. Arkansas and California in the U.S. are selected to evaluate the model. Experimental results show that MCTNet is the most lightweight model, with the fewest parameters and the lowest memory usage, while outperforming eight advanced models. Specifically, MCTNet obtained an overall accuracy (OA) of 0.968, a kappa coefficient (Kappa) of 0.951, and a macro-averaged F1 score (F1) of 0.933 in Arkansas, and an OA of 0.852, a Kappa of 0.806, and an F1 score of 0.829 in California. The results highlight the importance of each component of the model, particularly the ALPE module, which improved the Kappa of MCTNet by 4.2% in Arkansas and improved the model's robustness to missing values in remote sensing data.
Additionally, visualization results demonstrated that the features extracted by the CNN and Transformer sub-modules are complementary, explaining the effectiveness of MCTNet.
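The abstract describes each MCTNet stage as a CNN sub-module paired with a Transformer sub-module, with an attention-based learnable positional encoding (ALPE) that handles time series containing missing observations. The paper's implementation is not reproduced here; the following is a minimal PyTorch sketch of one such stage, assuming a learnable positional table re-weighted by attention over an observation-validity mask as a stand-in for ALPE. All module and parameter names (`AttnLearnablePE`, `MCTStage`, `seq_len`, `d_model`) are illustrative, not the authors'.

```python
# Hypothetical sketch (not the authors' released code) of one MCTNet-style
# stage: a 1-D CNN sub-module plus a Transformer sub-module applied to a
# single pixel's Sentinel-2 time series, with a mask-aware learnable
# positional encoding standing in for the paper's ALPE idea.
import torch
import torch.nn as nn

class AttnLearnablePE(nn.Module):
    """Learnable positional table, re-weighted per time step by a score
    computed from the features and a validity mask, so that steps with
    missing observations contribute less (an assumption about ALPE)."""
    def __init__(self, seq_len: int, d_model: int):
        super().__init__()
        self.table = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        self.score = nn.Linear(d_model + 1, 1)  # features + mask -> weight

    def forward(self, x, mask):
        # x: (B, T, D); mask: (B, T), 1 = observed, 0 = missing
        w = torch.sigmoid(self.score(torch.cat([x, mask.unsqueeze(-1)], -1)))
        return x + w * self.table               # (B, T, D)

class MCTStage(nn.Module):
    def __init__(self, seq_len=24, d_model=64, n_heads=4):
        super().__init__()
        # CNN sub-module: local temporal features, length-preserving conv
        self.cnn = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.pe = AttnLearnablePE(seq_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           batch_first=True)
        # Transformer sub-module: global temporal relations
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x, mask):
        local = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (B, T, D)
        return self.transformer(self.pe(local, mask))

# Toy forward pass: 2 pixels, 24 time steps, 64-d embeddings,
# with roughly 30 % of the observations marked as missing.
x = torch.randn(2, 24, 64)
mask = (torch.rand(2, 24) > 0.3).float()
out = MCTStage()(x, mask)
print(out.shape)  # torch.Size([2, 24, 64])
```

Because both sub-modules preserve the sequence length, several such stages can be stacked and the final per-pixel representation pooled over time for classification, consistent with the multi-stage design the abstract describes.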

Source journal
Computers and Electronics in Agriculture
Engineering & Technology / Computer Science: Interdisciplinary Applications
CiteScore: 15.30
Self-citation rate: 14.50%
Articles per year: 800
Review time: 62 days
Aims and scope: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.