Soil sampling design matters - Enhancing the efficiency of digital soil mapping at the field scale
Daniel Žížala, Tomáš Princ, Jan Skála, Anna Juřicová, Vojtěch Lukas, Roman Bohovic, Tereza Zádorová, Robert Minařík
Geoderma Regional, Volume 39, Article e00874 (published 5 October 2024)
DOI: 10.1016/j.geodrs.2024.e00874
https://www.sciencedirect.com/science/article/pii/S2352009424001214
Abstract
Optimisation of sampling design (the method used to select the samples) and sample size (the number of samples) remains a key challenge in digital soil mapping, especially in precision farming, where new technologies are expected to bring economic benefits. Because existing information is available in the form of relevant environmental covariates, combining it with non-parametric machine learning techniques requires careful planning, from the initial field sampling to the final production of digital soil maps. The aim of this study is to compare widely used covariate-wise sampling designs, combined with variable sample sizes, for supervised prediction of common soil drivers of agricultural productivity (pH, soil organic carbon and soil macronutrients) in a real case study of a 35 ha field with heterogeneous soil properties. From a total of 200 samples, we evaluated sample sets of 10, 30 and 60 field samples selected by conditioned Latin Hypercube Sampling (cLHS) and Feature Space Coverage Sampling (FSCS) to calibrate random forest (RF) models. The models were evaluated on independent test points sampled in situ. In addition to these datasets, we compared the investigated methods with Simple Random Sampling (SRS) in a numerical benchmark experiment with increasing sample size; here, the global accuracy of the predicted maps was again assessed at the test points, but with interpolated maps serving as the artificial true population for each soil characteristic. In both the field experiment and the numerical experiment, FSCS performed slightly better, especially when the number of samples was small. At smaller training sample sizes, the risk of obtaining an insufficiently accurate prediction model was slightly lower for FSCS, and this difference decreased as the sample size increased.
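The covariate-wise designs compared above differ in how they spread samples over the covariate (feature) space. As FSCS is commonly described, it runs k-means clustering on the standardized covariates with as many clusters as samples and then takes the candidate location nearest each cluster centroid; SRS simply draws locations at random. The following is a minimal NumPy sketch of that idea under stated assumptions, not the implementation used in the study: the function names `fscs` and `srs`, the single random centroid initialisation and the z-score standardisation are illustrative choices. cLHS, which instead optimises the marginal stratification of each covariate (typically by simulated annealing), is not shown.

```python
import numpy as np


def standardize(X):
    """Scale each covariate to zero mean and unit variance so that no single
    covariate dominates the Euclidean distances in feature space."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant covariates
    return (X - X.mean(axis=0)) / sd


def fscs(X, n, n_iter=100, seed=0):
    """Hypothetical FSCS sketch: k-means with k = n on the standardized
    covariates, then the candidate unit nearest each centroid is selected.
    X has one row per candidate location; returns n distinct row indices."""
    rng = np.random.default_rng(seed)
    Z = standardize(np.asarray(X, dtype=float))
    # Assumption: a single random start (real tools often use several)
    centers = Z[rng.choice(len(Z), size=n, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every unit to every centroid, shape (units, n)
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members; keep empty ones in place
        new_centers = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # For each centroid, pick the nearest candidate not already selected
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    chosen = []
    for k in range(n):
        order = np.argsort(d[:, k])
        idx = next(i for i in order if i not in chosen)
        chosen.append(int(idx))
    return np.array(chosen)


def srs(n_units, n, seed=0):
    """Simple Random Sampling without replacement: the baseline design."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_units, size=n, replace=False)
```

For example, with a hypothetical array `covariates` holding one row per candidate grid cell and one column per covariate, `fscs(covariates, n=10)` returns ten row indices that serve as training locations, and `srs(len(covariates), 10)` gives the random baseline. Because the centroids are spread over the whole feature space, FSCS avoids the clumped draws that SRS can produce when n is small, which is consistent with the design mattering most at small sample sizes.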
Nevertheless, sample size proved to be the most important factor in the accuracy of RF models, regardless of the sampling technique. In our case study, a sample size of 18 to 30 training samples (0.6 to 1 sample ha⁻¹) appears plausible for covariate-wise predictions using RF at the field scale. For the field experiment, we also assessed the relative importance of each auxiliary variable in each RF calibration. Adding spatial proxies overshadowed the importance of the other covariates, but significantly improved model calibration only at larger sample sizes. In models calibrated without spatial proxies, remotely sensed surface characteristics had the strongest effect.
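The abstract does not state which importance measure was used. One common, model-agnostic option for RF is permutation importance: shuffle one covariate at a time and record how much the prediction error grows. The sketch below assumes that measure and mean squared error as the error metric; `permutation_importance` here is a hypothetical helper, and `predict` stands in for a fitted model's prediction function (for instance the `predict` method of scikit-learn's `RandomForestRegressor`, which also ships its own `sklearn.inspection.permutation_importance`).

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Hypothetical sketch: increase in mean squared error when one covariate
    column is shuffled, breaking its link with y. `predict` is any callable
    mapping an (n, p) array to n predictions."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base_mse = np.mean((predict(X) - y) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # shuffle one covariate only
            increases.append(np.mean((predict(Xp) - y) ** 2) - base_mse)
        importance[j] = np.mean(increases)
    # Relative importance: each covariate's share of the total increase,
    # with negative values (shuffling helped by chance) clipped to zero
    clipped = np.clip(importance, 0, None)
    total = clipped.sum()
    relative = clipped / total if total > 0 else clipped
    return importance, relative
```

In a toy case where the response depends only on the first column, e.g. `predict = lambda A: 3 * A[:, 0]`, shuffling the other columns leaves the error unchanged, so all of the relative importance falls on column 0. A strong spatial proxy (such as coordinates) can absorb much of the explained variation in the same way, which is one way the overshadowing described above can arise. In practice the error is usually computed on out-of-bag or held-out data rather than on the training set.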
About the journal:
Global issues require studies and solutions on national and regional levels. Geoderma Regional focuses on studies that increase understanding and advance our scientific knowledge of soils in all regions of the world. The journal embraces every aspect of soil science and welcomes reviews of regional progress.