Rui Lu, Ronghua Liao, Ran Meng, Yingchu Hu, Yi Zhao, Yan Guo, Yingfan Zhang, Zhou Shi, Su Ye
Strategic sampling for training a semantic segmentation model in operational mapping: Case studies on cropland parcel extraction
Remote Sensing of Environment, vol. 331, Article 115034. Published 2025-09-20. DOI: 10.1016/j.rse.2025.115034 (https://www.sciencedirect.com/science/article/pii/S0034425725004389)
Abstract
Semantic segmentation of remotely sensed images has become increasingly popular across a wide range of natural resource and urban applications, with promising results. In an operational semantic segmentation mapping project, more training samples generally enable the model to extract target features better and achieve higher accuracy. However, annotating remote sensing image samples for model training is time-consuming and labor-intensive. Strategic sampling aims to minimize the effort of collecting new training samples for a mapping project, but it has not yet been well studied for semantic segmentation. To address this gap, we combined a meta-analysis with case studies to investigate best practices for strategic sampling, focusing on three factors: sample size, sample distribution, and transfer method. We first reviewed 334 recently published papers that applied semantic segmentation to operational mapping projects, summarizing the current state of training sample design across mapping scenarios. We then constructed a dataset of over 12,000 high-quality annotated image patches for cropland parcel mapping across five study sites and evaluated various sampling strategies with a baseline segmentation model. We also proposed a novel balanced sampling method that uses patch-based entropy and edge complexity to classify sample diversity.
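The balanced sampling idea can be illustrated with a minimal sketch. The abstract does not define how entropy, edge complexity, or the diversity classes are computed, so everything below is an assumption: entropy is taken as the Shannon entropy of a patch's grey-level histogram, edge complexity as the share of pixels whose central-difference gradient magnitude exceeds a threshold, and the diversity classes as the 3 × 3 grid of entropy and edge tertiles. The labeling budget is then spread round-robin across those classes. All function names (`patch_entropy`, `edge_density`, `tertile_labels`, `balanced_sample`) and parameters are hypothetical, and the code works on plain Python lists rather than real imagery.

```python
import math
import random
from collections import defaultdict


def patch_entropy(patch):
    """Shannon entropy (bits) of the grey-level histogram of a 2-D patch."""
    counts = defaultdict(int)
    for row in patch:
        for value in row:
            counts[value] += 1
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def edge_density(patch, threshold=30.0):
    """Share of interior pixels whose central-difference gradient magnitude exceeds threshold.

    Hypothetical stand-in for the paper's (undefined) edge complexity measure.
    """
    h, w = len(patch), len(patch[0])
    edges = total = 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = patch[i][j + 1] - patch[i][j - 1]
            gy = patch[i + 1][j] - patch[i - 1][j]
            total += 1
            if math.hypot(gx, gy) > threshold:
                edges += 1
    return edges / total if total else 0.0


def tertile_labels(values):
    """Label each value 0, 1 or 2 according to which third of the ranking it falls in."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 3 // len(values)
    return labels


def balanced_sample(patches, budget, seed=0):
    """Return up to `budget` patch indices spread round-robin over entropy x edge strata."""
    e_bin = tertile_labels([patch_entropy(p) for p in patches])
    g_bin = tertile_labels([edge_density(p) for p in patches])
    strata = defaultdict(list)  # hypothetical diversity classes: (entropy tertile, edge tertile)
    for idx in range(len(patches)):
        strata[(e_bin[idx], g_bin[idx])].append(idx)
    rng = random.Random(seed)
    pools = [strata[key] for key in sorted(strata)]
    for pool in pools:
        rng.shuffle(pool)  # random order within each diversity class
    target = min(budget, len(patches))
    chosen = []
    while len(chosen) < target:
        for pool in pools:  # take one patch per class per pass
            if pool and len(chosen) < target:
                chosen.append(pool.pop())
    return chosen


if __name__ == "__main__":
    rng = random.Random(42)
    patches = []  # synthetic 16x16 patches, not real cropland imagery
    for k in range(400):
        if k % 3 == 0:    # uniform field interior
            patches.append([[100] * 16 for _ in range(16)])
        elif k % 3 == 1:  # regular parcel boundaries (vertical stripes)
            patches.append([[255 if (j // 4) % 2 else 0 for j in range(16)] for _ in range(16)])
        else:             # heterogeneous, noisy texture
            patches.append([[rng.randrange(256) for _ in range(16)] for _ in range(16)])
    budget = max(1, round(0.025 * len(patches)))  # 2.5% of patches, the balanced-sampling share reported in the abstract
    picked = balanced_sample(patches, budget)
    print(f"labelling {len(picked)} of {len(patches)} patches:", sorted(picked))
```

Compared with drawing the same number of patches uniformly at random, the round-robin step guarantees that less common combinations (for example, low-entropy but edge-rich patches) are represented even under a small budget, which is one plausible reason a balanced design could reach saturation with fewer labels.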
Our findings revealed that (1) both the meta-analysis and the case studies suggested that, under random sampling, the optimal training sample size, i.e., the minimum size needed to reach accuracy saturation, was ∼4% of the total mapping patches; (2) the proposed balanced sampling outperformed random sampling, reducing the required sample size from ∼4% to 2.5% of the total patches in the mapped area; and (3) sample transfer and model transfer performed comparably in reducing the average local sample demand from 2.5% to 0.5% of total patches, with sample transfer slightly more accurate than model transfer (Global Total-Classification error: 0.298 vs. 0.308). This study offers a heuristic framework for applying strategic sampling in semantic segmentation and provides practical guidance for deploying deep learning in operational scenarios.
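Because the findings express sample demand as a share of all patches in the mapped area, they translate directly into labeling budgets. The sketch below assumes a hypothetical area of 50,000 patches and applies the three fractions quoted above (4% for random sampling, 2.5% for balanced sampling, 0.5% with transfer). It also shows one simple way to locate the saturation point on a learning curve; the tolerance-based definition used here (the smallest sample fraction whose accuracy is within `tol` of the best accuracy at that or any larger fraction) is our assumption, not the paper's procedure, and the toy curve is invented for illustration. The names `label_budget` and `saturation_point` are hypothetical.

```python
def label_budget(total_patches, fraction):
    """Number of patches to annotate for a given sampling fraction (rounded to the nearest patch)."""
    return max(1, round(total_patches * fraction))


def saturation_point(curve, tol=0.005):
    """Smallest sample fraction whose accuracy is within `tol` of the best accuracy
    reached at that fraction or any larger one (hypothetical definition).

    curve: list of (fraction, accuracy) pairs sorted by increasing fraction.
    """
    if not curve:
        raise ValueError("curve must contain at least one point")
    for i, (fraction, accuracy) in enumerate(curve):
        best_later = max(acc for _, acc in curve[i:])
        if best_later - accuracy < tol:
            return fraction
    return curve[-1][0]  # only reached when tol <= 0


if __name__ == "__main__":
    total = 50_000  # hypothetical number of patches covering the mapped area
    for name, frac in [("random", 0.04), ("balanced", 0.025), ("transfer", 0.005)]:
        print(f"{name:>8}: {label_budget(total, frac)} patches to label")

    # Toy learning curve (fraction, accuracy); invented values, not results from the paper.
    toy_curve = [(0.005, 0.70), (0.01, 0.78), (0.02, 0.84),
                 (0.04, 0.870), (0.08, 0.872), (0.16, 0.873)]
    print("saturation at", saturation_point(toy_curve))
```

For error metrics such as the total classification error reported above, where lower is better, negate the values before applying the same rule.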
About the journal:
Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing.
The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques.
RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.