Title: Spatial prediction of organic carbon in German agricultural topsoil using machine learning algorithms
Authors: Ali Sakhaee, Anika Gebauer, Mareike Ließ, A. Don
DOI: 10.5194/soil-8-587-2022 (https://doi.org/10.5194/soil-8-587-2022)
Journal: Soil Science
Published: 2022-09-22
Citations: 7
Abstract
As the largest terrestrial carbon pool, soil organic carbon (SOC) has the
potential to influence and mitigate climate change; thus, SOC monitoring is of high importance
in the frameworks of various international treaties. Therefore, high-resolution SOC maps are required. Machine learning (ML) offers new
opportunities to develop these maps due to its ability to mine large
datasets. The aim of this study was to apply three algorithms commonly used
in digital soil mapping – random forest (RF), boosted regression trees
(BRT), and support vector machine for regression (SVR) – on the first German
agricultural soil inventory to model the agricultural topsoil (0–30 cm) SOC
content and develop a two-model approach to address the high variability in
SOC in German agricultural soils. Model performance is often limited by the
size and quality of the soil dataset available for calibration and
validation. Therefore, the impact of enlarging the training dataset was tested
by including data from the European Land Use/Cover Area frame Survey
for agricultural sites in Germany. Nested cross-validation was implemented
for model evaluation and parameter tuning. Grid search and the differential
evolution algorithm were also applied to ensure that each algorithm was
appropriately tuned. The SOC content of the German agricultural soil
inventory was highly variable, ranging from 4 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The
results showed that SVR produced the best performance, with a root-mean-square error (RMSE) of 32 g kg−1 when the algorithms were trained on the full dataset. However, the
average RMSE of all algorithms decreased by 34 % when mineral and organic
soils were modelled separately, with the best result from SVR presenting an RMSE of
21 g kg−1. The model performance was enhanced by up to 1 % for
mineral soils and by up to 2 % for organic soils. Despite the ability of machine
learning algorithms, in general, and SVR, in particular, to model SOC on a
national scale, the study showed that the most important aspect for
improving the model performance was to separate the modelling of mineral and
organic soils.
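
The nested cross-validation with grid search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a toy one-dimensional k-nearest-neighbour regressor stands in for RF, BRT, and SVR, the data are synthetic, and every function name (`rmse`, `knn_predict`, `k_folds`, `score`, `nested_cv`) is hypothetical. The point is only the structure: hyperparameters are tuned on inner folds, and the RMSE is reported on outer folds that took no part in tuning.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def k_folds(data, n_folds):
    """Yield (train, test) splits; fold i holds out every n_folds-th sample."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [d for j, d in enumerate(data) if j % n_folds != i]
        yield train, test


def score(train, test, k):
    """RMSE of a k-NN model built on train and evaluated on test."""
    preds = [knn_predict(train, x, k) for x, _ in test]
    return rmse([y for _, y in test], preds)


def nested_cv(data, k_grid, n_outer=5, n_inner=3):
    """Mean outer-fold RMSE, with k chosen by grid search on inner folds only."""
    outer_scores = []
    for outer_train, outer_test in k_folds(data, n_outer):
        # Inner loop: grid search over k using only the outer-training data.
        def inner_error(k):
            folds = list(k_folds(outer_train, n_inner))
            return sum(score(tr, te, k) for tr, te in folds) / len(folds)

        best_k = min(k_grid, key=inner_error)
        # Outer loop: evaluate the tuned model on samples unseen during tuning.
        outer_scores.append(score(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)


# Synthetic data (an assumption for illustration): one covariate, noisy target.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

cv_rmse = nested_cv(data, k_grid=[1, 3, 5, 9])
print(f"nested-CV RMSE: {cv_rmse:.2f}")
```

Under the same assumptions, a two-model variant would call `nested_cv` separately on the samples below and above an SOC threshold (87 g kg−1 in the abstract) and compare the resulting errors with those of the single model.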
About the journal:
Soil Science satisfies the professional needs of all scientists and laboratory personnel involved in soil and plant research by publishing primary research reports and critical reviews of basic and applied soil science, especially as it relates to soil and plant studies and general environmental soil science.
Each month, Soil Science presents authoritative research articles from an impressive array of disciplines: soil chemistry and biochemistry, physics, fertility and nutrition, soil genesis and morphology, soil microbiology and mineralogy. Of immediate relevance to soil scientists, both industrial and academic, this unique publication also has long-range value for agronomists and environmental scientists.