{"title":"Towards Explainable AI: Interpreting Soil Organic Carbon Prediction Models Using a Learning-Based Explanation Method","authors":"Nafiseh Kakhani, Ruhollah Taghizadeh-Mehrjardi, Davoud Omarzadeh, Masahiro Ryo, Uta Heiden, Thomas Scholten","doi":"10.1111/ejss.70071","DOIUrl":null,"url":null,"abstract":"<p>An understanding of the key factors and processes influencing the variability of soil organic carbon (SOC) is essential for the development of effective policies aimed at enhancing carbon storage in soils to mitigate climate change. In recent years, complex computational approaches from the field of machine learning (ML) have been developed for modelling and mapping SOC in various ecosystems and over large areas. However, in order to understand the processes that account for SOC variability from ML models and to serve as a basis for new scientific discoveries, the predictions made by these data-driven models must be accurately explained and interpreted. In this research, we introduce a novel explanation approach applicable to any ML model and investigate the significance of environmental features to explain SOC variability across Germany. The methodology employed in this study involves training multiple ML models using SOC content measurements from the LUCAS dataset and incorporating environmental features derived from Google Earth Engine (GEE) as explanatory variables. Thereafter, an explanation model is applied to elucidate what the ML models have learned about the relationship between environmental features and SOC content in a supervised manner. In our approach, a post hoc model is trained to estimate the contribution of specific inputs to the outputs of the trained ML models. The results of this study indicate that different classes of ML models rely on interpretable but distinct environmental features to explain SOC variability. Decision tree-based models, such as random forest (RF) and gradient boosting, highlight the importance of topographic features. Conversely, soil chemical information, particularly pH, is crucial for the performance of neural networks and linear regression models. Therefore, interpreting data-driven studies requires a carefully structured approach, guided by expert knowledge and a deep understanding of the models being analysed.</p>","PeriodicalId":12043,"journal":{"name":"European Journal of Soil Science","volume":"76 2","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ejss.70071","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Soil Science","FirstCategoryId":"97","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/ejss.70071","RegionNum":2,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOIL SCIENCE","Score":null,"Total":0}
Abstract
An understanding of the key factors and processes influencing the variability of soil organic carbon (SOC) is essential for the development of effective policies aimed at enhancing carbon storage in soils to mitigate climate change. In recent years, complex computational approaches from the field of machine learning (ML) have been developed for modelling and mapping SOC in various ecosystems and over large areas. However, if the processes that account for SOC variability are to be understood from ML models and serve as a basis for new scientific discoveries, the predictions made by these data-driven models must be accurately explained and interpreted. In this research, we introduce a novel explanation approach applicable to any ML model and investigate the significance of environmental features in explaining SOC variability across Germany. The methodology involves training multiple ML models using SOC content measurements from the LUCAS dataset and incorporating environmental features derived from Google Earth Engine (GEE) as explanatory variables. Thereafter, an explanation model is applied, in a supervised manner, to elucidate what the ML models have learned about the relationship between environmental features and SOC content. In our approach, a post hoc model is trained to estimate the contribution of specific inputs to the outputs of the trained ML models. The results indicate that different classes of ML models rely on interpretable but distinct environmental features to explain SOC variability. Decision-tree-based models, such as random forest (RF) and gradient boosting, highlight the importance of topographic features. Conversely, soil chemical information, particularly pH, is crucial for the performance of neural networks and linear regression models. Interpreting data-driven studies therefore requires a carefully structured approach, guided by expert knowledge and a deep understanding of the models being analysed.
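The abstract describes a two-step workflow: several classes of ML regressors are trained to predict SOC from environmental covariates, and a post hoc model is then trained to estimate how much each input contributes to each fitted model's output. The sketch below is a minimal, generic illustration of that idea in Python with scikit-learn, not the authors' pipeline: the data are synthetic stand-ins for the LUCAS samples and GEE covariates, the feature names and model settings are invented for illustration, and a ridge surrogate fitted to each model's predictions stands in for the paper's learning-based explanation method.

```python
# Generic "train black boxes, then fit a post hoc explainer" sketch.
# All data, feature names and the ridge surrogate are illustrative assumptions,
# not the LUCAS/GEE data or the paper's actual explanation model.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for SOC samples with environmental covariates.
n = 2000
X = pd.DataFrame({
    "elevation": rng.normal(300, 150, n),      # topographic
    "slope": rng.gamma(2.0, 2.0, n),           # topographic
    "ndvi": rng.uniform(0.1, 0.9, n),          # vegetation
    "precipitation": rng.normal(700, 120, n),  # climate
    "soil_ph": rng.normal(6.2, 0.8, n),        # soil chemistry
})
# Toy SOC response (g/kg); the drivers chosen here are arbitrary.
y = (0.02 * X["elevation"] + 8 * X["ndvi"]
     - 2.5 * (X["soil_ph"] - 6.5) ** 2 + rng.normal(0, 3, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The classes of predictors discussed in the abstract.
models = {
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
    ),
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
}

for name, model in models.items():
    model.fit(X_train, y_train)

    # Post hoc explanation step (illustrative): fit an interpretable surrogate
    # to the black box's own predictions on held-out data and read feature
    # contributions from its standardised coefficients. The paper instead
    # trains a learning-based explainer in a supervised manner; this ridge
    # surrogate is only a simple stand-in for that idea.
    surrogate = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    surrogate.fit(X_test, model.predict(X_test))
    contributions = pd.Series(surrogate[-1].coef_, index=X.columns)

    print(f"{name}: surrogate-estimated feature contributions")
    print(contributions.sort_values(key=np.abs, ascending=False).round(2), "\n")
```

Running the sketch prints, for each model class, a ranking of how strongly each covariate drives the surrogate's reconstruction of that model's predictions; with the real data this is where model-specific differences (e.g., topography for tree ensembles versus pH for neural networks and linear regression) would become visible.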
Journal Description:
The EJSS is an international journal that publishes outstanding papers in soil science advancing the theoretical and mechanistic understanding of physical, chemical and biological processes and their interactions in soils, operating at scales from the molecular to the continental, in natural and managed environments.