Prediction of gully erosion susceptibility through the lens of the SHapley Additive exPlanations (SHAP) method using a stacking ensemble model

IF 8.0; Zone 2, Environmental Science and Ecology; JCR Q1, Environmental Sciences
Jeongho Han, Jorge A. Guzman, Maria L. Chu
Journal of Environmental Management, volume 383, Article 125478. Published 2025-04-25.
DOI: 10.1016/j.jenvman.2025.125478
https://www.sciencedirect.com/science/article/pii/S0301479725014549

Abstract

This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach uses Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base models and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly selected gully-site points based on LiDAR difference observations; all but three of the stacking models exceeded the highest area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The gully erosion susceptibility map produced by the LR stacking model classified 33% of the agricultural land (89,208 ha) as the “very high” class, compared to 27%, 87%, 27%, and 55% predicted by the individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. SHAP identified the annual leaf area index (LAI) as the most influential feature in both the RF and GBM base models. Additionally, it highlighted the significance of the GBM base model relative to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
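To illustrate how a Shapley-style attribution can show which base model drives a stacking model's final decision, the sketch below computes exact Shapley values for a toy logistic-regression meta-learner that takes RF and GBM probabilities as inputs. It is a minimal illustration, not the authors' implementation: the weights, intercept, site probabilities, and background values are all hypothetical, and the single-baseline formulation used here is a simplification of what the SHAP library estimates over a background dataset.

```python
from itertools import permutations
from math import exp, factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x relative to a single baseline.

    Features outside a coalition keep their baseline value. The cost grows
    factorially with the number of features, so this suits toy cases only.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = value_fn(current)
        for i in order:
            current[i] = x[i]  # feature i joins the coalition
            updated = value_fn(current)
            phi[i] += updated - previous  # marginal contribution of i
            previous = updated
    return [total / factorial(n) for total in phi]


# Hypothetical meta-learner: logistic regression over base-model outputs.
# These coefficients are made up for illustration, not taken from the paper.
BASE_MODELS = ["RF", "GBM"]
WEIGHTS = [1.5, 3.0]
INTERCEPT = -2.2


def meta_log_odds(probs):
    """Log-odds of gully susceptibility given base-model probabilities."""
    return INTERCEPT + sum(w * p for w, p in zip(WEIGHTS, probs))


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


# Hypothetical inputs: base-model probabilities at one site, and a
# background (reference) point standing in for a "typical" site.
site = [0.80, 0.90]
background = [0.40, 0.40]

phi = shapley_values(meta_log_odds, site, background)

for name, contribution in zip(BASE_MODELS, phi):
    print(f"{name}: contribution to log-odds = {contribution:+.3f}")
print(f"stacked probability at site = {sigmoid(meta_log_odds(site)):.3f}")
```

Because this toy meta-learner is linear in log-odds space, each contribution reduces to weight × (input − baseline), and the contributions sum to the difference between the site's log-odds and the baseline's. Attributing in log-odds rather than probability space keeps that additivity exact for a logistic meta-learner.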

Source journal
Journal of Environmental Management (Environmental Sciences)
CiteScore: 13.70
Self-citation rate: 5.70%
Articles published: 2477
Review time: 84 days
About the journal: The Journal of Environmental Management is a journal for the publication of peer-reviewed, original research on all aspects of management and the managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.