A Note on Spurious Correlations and Explainable Machine Learning in Digital Soil Mapping

IF 3.8, Region 2 (Agricultural and Forestry Sciences), JCR Q2 (Soil Science)

European Journal of Soil Science, Pub Date: 2025-08-08, DOI: 10.1111/ejss.70172

Tobias Rentschler, Thomas Scholten

Volume 76, Issue 4. Article page: https://bsssjournals.onlinelibrary.wiley.com/doi/10.1111/ejss.70172

Citations: 0

A Note on Spurious Correlations and Explainable Machine Learning in Digital Soil Mapping

The use of machine learning as a method for knowledge discovery is often critically discussed in soil science and related environmental disciplines. Reviews of the use of machine learning in digital soil mapping identified few studies that incorporated existing soil knowledge of transformation and translocation processes in soils and mechanistic relationships between covariates in the modelling process. Even models trained with predictors that are meaningless from a soil science perspective can have high accuracies. To test and widen this perspective, we expanded the setup of a previous study by Wadoux, Samuel-Rosa, et al. (2020) from one hypothetical case study to a larger set of 668 hypothetical case studies in 334 study areas. We found that the high accuracy of one single model for a specific area was part of a wide range of possible accuracy metrics (concordance correlation coefficient: 0.16–0.91) when applying the same set of meaningless predictors to all study areas. We discuss these spurious correlations in the context of explainable machine learning and highlight how the important elements of model explainability, model input and model output largely depend on discipline-specific domain knowledge. As soil science knowledge is often incorporated implicitly, we argue that the motivation behind covariate selection should be discussed more explicitly to achieve soil science knowledge beyond spatial prediction.
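The accuracy range above is reported as a concordance correlation coefficient (CCC). As a minimal sketch, assuming the standard definition by Lin (the abstract does not state how the authors computed it), the metric can be written in a few lines of Python. The function name `concordance_correlation` and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient, using population (1/n) moments.

    Assumption: this is the textbook definition, not necessarily the exact
    implementation used in the paper.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_o = fmean(observed)
    mean_p = fmean(predicted)

    # Population variances and covariance (divide by n, not n - 1).
    var_o = fmean([(o - mean_o) ** 2 for o in observed])
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    cov = fmean([(o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)])

    # The squared mean difference penalises systematic bias, which a plain
    # Pearson correlation would ignore.
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]          # hypothetical observed values
    shifted = [o + 1.0 for o in obs]    # hypothetical predictions with a constant bias
    print(concordance_correlation(obs, obs))
    print(concordance_correlation(obs, shifted))
```

In this toy case a constant offset of one unit lowers the CCC from 1 to 5/7 (about 0.71), although the Pearson correlation between the two series stays at 1. That sensitivity to bias is why CCC is a stricter agreement measure than correlation alone.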

Source journal

European Journal of Soil Science (Agricultural and Forestry Sciences, Soil Science)

CiteScore: 8.20

Self-citation rate: 4.80%

Articles published: 117

Review time: 5 months

About the journal: The EJSS is an international journal that publishes outstanding papers in soil science that advance the theoretical and mechanistic understanding of physical, chemical and biological processes and their interactions in soils acting from molecular to continental scales in natural and managed environments.