A Note on Spurious Correlations and Explainable Machine Learning in Digital Soil Mapping
Authors: Tobias Rentschler, Thomas Scholten
Journal: European Journal of Soil Science, 76(4)
Published: 2025-08-08
DOI: 10.1111/ejss.70172
URL: https://bsssjournals.onlinelibrary.wiley.com/doi/10.1111/ejss.70172
Citations: 0
Abstract
The use of machine learning as a method for knowledge discovery is often critically discussed in soil science and related environmental disciplines. Reviews of the use of machine learning in digital soil mapping identified few studies that incorporated existing soil knowledge of transformation and translocation processes in soils and mechanistic relationships between covariates in the modelling process. Even models trained with predictors that are meaningless from a soil science perspective can have high accuracies. To test and widen this perspective, we expanded the setup of a previous study by Wadoux, Samuel-Rosa, et al. (2020) from one hypothetical case study to a larger set of 668 hypothetical case studies in 334 study areas. We found that the high accuracy of one single model for a specific area was part of a wide range of possible accuracy metrics (concordance correlation coefficient: 0.16–0.91) when applying the same set of meaningless predictors to all study areas. We discuss these spurious correlations in the context of explainable machine learning and highlight how the important elements of model explainability, model input and model output largely depend on discipline-specific domain knowledge. As soil science knowledge is often incorporated implicitly, we argue that the motivation behind covariate selection should be discussed more explicitly to achieve soil science knowledge beyond spatial prediction.
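The accuracy range quoted above is given as a concordance correlation coefficient (CCC). The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard definition by Lin, $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$, using population (1/n) moments. The function name `concordance_correlation` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient with population (1/n) moments.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(observed)
    mean_o = fmean(observed)
    mean_p = fmean(predicted)
    # Population variances and covariance (divide by n, not n - 1).
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # The squared mean difference penalises systematic bias,
    # which a plain Pearson correlation would ignore.
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]           # made-up observations
    shifted = [2.0, 3.0, 4.0, 5.0]       # perfectly correlated but biased by +1
    print(concordance_correlation(obs, obs))
    print(concordance_correlation(obs, shifted))
```

Unlike the Pearson correlation, the CCC drops below 1 for the shifted predictions even though they are perfectly correlated with the observations, which is why it is a common choice for judging map accuracy.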
About the journal:
The EJSS is an international journal that publishes outstanding papers in soil science that advance the theoretical and mechanistic understanding of physical, chemical and biological processes and their interactions in soils acting from molecular to continental scales in natural and managed environments.