Jingzhe Wang, Zipeng Zhang, Yankun Wang, Cheng-Zhi Qin, Xiangyue Chen, Yinghui Zhang, Zhongwen Hu
Journal: Geoderma, Volume 460, Article 117402
Published: 2025-06-21
DOI: 10.1016/j.geoderma.2025.117402
URL: https://www.sciencedirect.com/science/article/pii/S001670612500240X
Impact factor: 6.6; JCR: Q1 (Soil Science)
Citations: 0
How to solve small sample size problems in time-series soil organic carbon mapping: New insights from the Third Law of Geography
Accurate and up-to-date mapping of the spatial distribution and temporal dynamics of soil organic carbon density (SOCD) is essential for understanding terrestrial ecosystem carbon fluxes and monitoring global climate change. However, the available historical soil sample data remain insufficient to meet the high-precision spatiotemporal mapping requirements of SOCD across large regions. We therefore applied the Third Law of Geography (also known as the Law of Geographic Similarity) to address the small sample size problem in modelling. We proposed a weighted multivariate similarity index and a similarity threshold index, and identified optimal thresholds for measuring geographic similarity, to effectively increase the soil sample size. Based on the different input samples, we designed several modelling schemes for SOCD mapping. Our results suggest that the geographic similarity threshold-driven framework reconciles the trade-off between sample quantity and quality, increasing sample sizes by up to three times while enhancing spatial representativeness and reducing prediction uncertainty. Accuracy evaluation and uncertainty analysis consistently showed that models incorporating similarity-based input samples outperformed those relying solely on limited local samples. The S1-1980s model, which used only the limited data sample, achieved a coefficient of determination (R²) of 0.04 and a root mean square error (RMSE) of 2.47 kg C m⁻². In contrast, the S3-1980s model, based on similarity-expanded samples, showed a substantial improvement, achieving an R² of 0.64 and an RMSE of 1.36 kg C m⁻². Consequently, predictions from the improved model accurately captured the regional spatiotemporal patterns of SOCD. This study provides a reference for addressing small sample size issues in time-series soil organic carbon mapping.
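The abstract names a weighted multivariate similarity index, a similarity threshold, and two accuracy metrics, but does not give their formulas. The Python sketch below illustrates the general idea only. It assumes a Gower-style similarity (a weighted mean of range-normalised covariate agreement) as a stand-in for the authors' index, and computes R² as 1 − SS_res/SS_tot, which may differ from the definition used in the paper. The covariates (elevation, precipitation, NDVI), weights, ranges, threshold, and sample values are all made-up toy inputs, not data from the study.

```python
from math import sqrt


def weighted_similarity(a, b, weights, ranges):
    """Gower-style similarity in [0, 1] between two covariate vectors.

    Assumption: each covariate contributes 1 - |a_i - b_i| / range_i and the
    contributions are combined as a weighted mean. This is an illustrative
    stand-in, not the index defined in the paper.
    """
    score = 0.0
    for x, y, w, r in zip(a, b, weights, ranges):
        per_var = 1.0 - min(abs(x - y) / r, 1.0)  # clip so each term stays in [0, 1]
        score += w * per_var
    return score / sum(weights)


def expand_samples(local, candidates, weights, ranges, threshold):
    """Return local samples plus every candidate whose best similarity
    to any local sample meets or exceeds the threshold."""
    kept = []
    for cand in candidates:
        best = max(
            weighted_similarity(cand["x"], loc["x"], weights, ranges)
            for loc in local
        )
        if best >= threshold:
            kept.append(cand)
    return local + kept


def r2_rmse(observed, predicted):
    """Coefficient of determination (1 - SS_res / SS_tot) and RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)


# Hypothetical toy data: x = (elevation in m, annual precipitation in mm, NDVI).
local = [{"id": "L1", "x": (1200.0, 450.0, 0.35), "socd": 3.1}]
candidates = [
    {"id": "C1", "x": (1250.0, 470.0, 0.33), "socd": 3.4},  # similar setting
    {"id": "C2", "x": (300.0, 1600.0, 0.80), "socd": 7.9},  # dissimilar setting
]
weights = (0.4, 0.4, 0.2)       # assumed covariate weights
ranges = (3000.0, 2000.0, 1.0)  # assumed value range of each covariate

expanded = expand_samples(local, candidates, weights, ranges, threshold=0.9)
print([s["id"] for s in expanded])
print(r2_rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Raising the threshold admits fewer but more similar samples, while lowering it admits more samples of lower similarity. That is the quantity versus quality trade-off the abstract describes, and it is why the choice of threshold matters.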
About the journal:
Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.