{"title":"Towards Explainable AI: Interpreting Soil Organic Carbon Prediction Models Using a Learning-Based Explanation Method","authors":"Nafiseh Kakhani, Ruhollah Taghizadeh-Mehrjardi, Davoud Omarzadeh, Masahiro Ryo, Uta Heiden, Thomas Scholten","doi":"10.1111/ejss.70071","DOIUrl":null,"url":null,"abstract":"<p>An understanding of the key factors and processes influencing the variability of soil organic carbon (SOC) is essential for the development of effective policies aimed at enhancing carbon storage in soils to mitigate climate change. In recent years, complex computational approaches from the field of machine learning (ML) have been developed for modelling and mapping SOC in various ecosystems and over large areas. However, in order to understand the processes that account for SOC variability from ML models and to serve as a basis for new scientific discoveries, the predictions made by these data-driven models must be accurately explained and interpreted. In this research, we introduce a novel explanation approach applicable to any ML model and investigate the significance of environmental features to explain SOC variability across Germany. The methodology employed in this study involves training multiple ML models using SOC content measurements from the LUCAS dataset and incorporating environmental features derived from Google Earth Engine (GEE) as explanatory variables. Thereafter, an explanation model is applied to elucidate what the ML models have learned about the relationship between environmental features and SOC content in a supervised manner. In our approach, a post hoc model is trained to estimate the contribution of specific inputs to the outputs of the trained ML models. The results of this study indicate that different classes of ML models rely on interpretable but distinct environmental features to explain SOC variability. Decision tree-based models, such as random forest (RF) and gradient boosting, highlight the importance of topographic features. Conversely, soil chemical information, particularly pH, is crucial for the performance of neural networks and linear regression models. Therefore, interpreting data-driven studies requires a carefully structured approach, guided by expert knowledge and a deep understanding of the models being analysed.</p>","PeriodicalId":12043,"journal":{"name":"European Journal of Soil Science","volume":"76 2","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/ejss.70071","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Soil Science","FirstCategoryId":"97","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/ejss.70071","RegionNum":2,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOIL SCIENCE","Score":null,"Total":0}
Abstract
An understanding of the key factors and processes influencing the variability of soil organic carbon (SOC) is essential for the development of effective policies aimed at enhancing carbon storage in soils to mitigate climate change. In recent years, complex computational approaches from the field of machine learning (ML) have been developed for modelling and mapping SOC in various ecosystems and over large areas. However, if the processes that account for SOC variability are to be understood from ML models and serve as a basis for new scientific discoveries, the predictions made by these data-driven models must be accurately explained and interpreted. In this research, we introduce a novel explanation approach applicable to any ML model and investigate the significance of environmental features in explaining SOC variability across Germany. The methodology involves training multiple ML models using SOC content measurements from the LUCAS dataset and incorporating environmental features derived from Google Earth Engine (GEE) as explanatory variables. Thereafter, an explanation model is applied, in a supervised manner, to elucidate what the ML models have learned about the relationship between environmental features and SOC content. In our approach, a post hoc model is trained to estimate the contribution of specific inputs to the outputs of the trained ML models. The results indicate that different classes of ML models rely on interpretable but distinct environmental features to explain SOC variability. Decision-tree-based models, such as random forest (RF) and gradient boosting, highlight the importance of topographic features. Conversely, soil chemical information, particularly pH, is crucial for the performance of neural networks and linear regression models. Interpreting data-driven studies therefore requires a carefully structured approach, guided by expert knowledge and a deep understanding of the models being analysed.
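The abstract describes a two-step workflow: several classes of ML regressors are trained to predict SOC from environmental covariates, and a post hoc model is then trained to estimate how much each input contributes to each fitted model's output. The sketch below is a minimal, generic illustration of that idea in Python with scikit-learn, not the authors' pipeline: the data are synthetic stand-ins for the LUCAS samples and GEE covariates, the feature names and model settings are invented for illustration, and a ridge surrogate fitted to each model's predictions stands in for the paper's learning-based explanation method.

```python
# Generic "train black boxes, then fit a post hoc explainer" sketch.
# All data, feature names and the ridge surrogate are illustrative assumptions,
# not the LUCAS/GEE data or the paper's actual explanation model.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for SOC samples with environmental covariates.
n = 2000
X = pd.DataFrame({
    "elevation": rng.normal(300, 150, n),      # topographic
    "slope": rng.gamma(2.0, 2.0, n),           # topographic
    "ndvi": rng.uniform(0.1, 0.9, n),          # vegetation
    "precipitation": rng.normal(700, 120, n),  # climate
    "soil_ph": rng.normal(6.2, 0.8, n),        # soil chemistry
})
# Toy SOC response (g/kg); the drivers chosen here are arbitrary.
y = (0.02 * X["elevation"] + 8 * X["ndvi"]
     - 2.5 * (X["soil_ph"] - 6.5) ** 2 + rng.normal(0, 3, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The classes of predictors discussed in the abstract.
models = {
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
    ),
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
}

for name, model in models.items():
    model.fit(X_train, y_train)

    # Post hoc explanation step (illustrative): fit an interpretable surrogate
    # to the black box's own predictions on held-out data and read feature
    # contributions from its standardised coefficients. The paper instead
    # trains a learning-based explainer in a supervised manner; this ridge
    # surrogate is only a simple stand-in for that idea.
    surrogate = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    surrogate.fit(X_test, model.predict(X_test))
    contributions = pd.Series(surrogate[-1].coef_, index=X.columns)

    print(f"{name}: surrogate-estimated feature contributions")
    print(contributions.sort_values(key=np.abs, ascending=False).round(2), "\n")
```

Running the sketch prints, for each model class, a ranking of how strongly each covariate drives the surrogate's reconstruction of that model's predictions; with the real data this is where model-specific differences (e.g., topography for tree ensembles versus pH for neural networks and linear regression) would become visible.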
Journal Description:
The EJSS is an international journal that publishes outstanding papers in soil science advancing the theoretical and mechanistic understanding of physical, chemical and biological processes and their interactions in soils, operating at scales from the molecular to the continental, in natural and managed environments.