Soil sampling design matters - Enhancing the efficiency of digital soil mapping at the field scale

IF 3.1; Zone 2 (Agricultural and Forestry Sciences); JCR Q2 (Soil Science)

Geoderma Regional Pub Date : 2024-10-05 DOI:10.1016/j.geodrs.2024.e00874

Daniel Žížala, Tomáš Princ, Jan Skála, Anna Juřicová, Vojtěch Lukas, Roman Bohovic, Tereza Zádorová, Robert Minařík

Geoderma Regional, Volume 39, Article e00874. Full text: https://www.sciencedirect.com/science/article/pii/S2352009424001214

Citations: 0

Abstract

Optimisation of sampling design (methods chosen to select the samples) and sample size (number of samples) remains a key challenge in digital soil mapping, especially in the area of precision farming with the expected economic benefits from the introduction of new technologies. As the existing information is available in the form of relevant environmental covariates, its combination with non-parametric machine learning techniques requires careful planning from the initial field sampling to the final production of digital soil maps. The aim of this study is to compare widely used covariate-wise sampling designs combined with variable sample sizes for supervised prediction of common soil drivers of agricultural productivity (pH, soil organic carbon, soil macronutrients) in a real case study of a field (35 ha) with heterogeneous soil properties. From a total of 200 samples, we evaluated different sample sets where 10, 30 and 60 field samples were selected by conditioned Latin Hypercube Sampling (cLHS) and Feature Space Coverage Sampling (FSCS) to calibrate random forest (RF) models. The evaluation was performed on independently in-situ sampled test points. In addition to these datasets, we also compared the investigated methods with Simple Random Sampling (SRS) in a numerical benchmark experiment with increasing sample size, comparing the global accuracies of the predicted maps on the test points, but using interpolated maps as the artificial true population for each soil characteristic.

The results of the study in both the field experiment and the numerical experiment showed slightly better results for the FSCS method, especially when the number of samples was small. At smaller training sample sizes, the risk of insufficiently accurate prediction models was slightly lower for FSCS and the difference decreased as the sample size increased. Nevertheless, sample size proved to be the most important factor in the accuracy of RF models, regardless of the sampling technique. The results suggest that a sample size between 18 and 30 training samples (0.6 to 1 sample ha⁻¹) seems plausible for covariate-wise predictions using RF at field scale in our case study. The relative importance of each auxiliary variable for each RF calibration was also assessed for the field experiment. The results showed that the additional introduction of spatial proxies overshadowed the importance of other covariates, but only significantly improved the model calibration at larger sample sizes. The calibrated models without spatial proxies showed the strongest effect of remotely sensed surface characteristics.
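To make the difference between the sampling designs more concrete, the following is a minimal sketch of Feature Space Coverage Sampling next to Simple Random Sampling. It is not the authors' implementation. It assumes FSCS is realised in the usual way, by running k-means on standardized covariates and sampling the unit nearest to each cluster centre. The covariates are synthetic random numbers, and all function names (`standardize`, `fscs`, `mssd`) are hypothetical. The candidate set size (200) and sample size (10) only mirror the numbers in the abstract. The coverage measure is the mean squared shortest distance (MSSD) from every unit to its nearest sampled unit in covariate space, where lower values mean better coverage.

```python
import random


def standardize(rows):
    """Scale each covariate column to zero mean and unit variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = []
    for j in range(p):
        sd = (sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5
        sds.append(sd if sd > 0 else 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fscs(rows, k, rng, iters=50):
    """FSCS sketch: k-means in covariate space, then pick the nearest unit per centre."""
    centers = [rows[i] for i in rng.sample(range(len(rows)), k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: sqdist(r, centers[c]))
            clusters[nearest].append(r)
        # Recompute centres; an empty cluster keeps its previous centre.
        centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            for c, members in enumerate(clusters)
        ]
    chosen = []
    for c in centers:
        # Nearest not-yet-chosen unit, so the sample holds k distinct units.
        candidates = (i for i in range(len(rows)) if i not in chosen)
        chosen.append(min(candidates, key=lambda i: sqdist(rows[i], c)))
    return chosen


def mssd(rows, sample_idx):
    """Mean squared shortest distance from every unit to the sample."""
    return sum(min(sqdist(r, rows[i]) for i in sample_idx) for r in rows) / len(rows)


rng = random.Random(0)
# Synthetic stand-in for 200 candidate locations with 3 covariates each.
covariates = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
z = standardize(covariates)

fscs_idx = fscs(z, k=10, rng=rng)
srs_idx = rng.sample(range(len(z)), 10)  # simple random sampling baseline

mssd_fscs = mssd(z, fscs_idx)
mssd_srs = mssd(z, srs_idx)
print(f"MSSD FSCS: {mssd_fscs:.3f}  MSSD SRS: {mssd_srs:.3f}")
```

Under these assumptions one would typically expect the FSCS sample to give a lower MSSD than a single random draw, which is in line with the abstract's observation that FSCS carries less risk at small sample sizes. The sketch uses only the standard library to stay dependency-free; a practical workflow would more likely rely on an established k-means implementation and real covariate rasters.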




Source journal: Geoderma Regional (Agricultural and Biological Sciences: Soil Science)

CiteScore: 6.10

Self-citation rate: 7.30%

Articles published: 122

Review time: 76 days

Journal description: Global issues require studies and solutions on national and regional levels. Geoderma Regional focuses on studies that increase understanding and advance our scientific knowledge of soils in all regions of the world. The journal embraces every aspect of soil science and welcomes reviews of regional progress.