Pitfalls of XAI interpretation in environmental modeling: A warning on model bias in air quality data analysis

IF 4.6 | Zone 2: Environmental Science and Ecology | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Souichi Oka, Takuma Yamazaki, Yoshiyasu Takefuji
Environmental Modelling & Software, Volume 194, Article 106700
Publication date: 2025-09-19
Publication type: Journal Article
DOI: 10.1016/j.envsoft.2025.106700
https://www.sciencedirect.com/science/article/pii/S1364815225003846
Citations: 0

Abstract

Jung et al. (2025) achieved high predictive accuracy in interpolating missing ozone data using graph machine learning (ML) and conducted feature importance analysis with explainable AI (XAI). This correspondence acknowledges their significant contribution but discusses the limitations and biases inherent in ML models and XAI methods (e.g., Random Forest/Bootstrap Test, SHapley Additive exPlanations (SHAP)) and their impact on the reliability of derived feature importance. High predictive accuracy does not necessarily guarantee trustworthy interpretation of feature relevance, as evidenced by inconsistent importance rankings across models and XAI techniques. To enhance interpretability and scientific reliability, we advocate a validation strategy integrating ML with rigorous statistical analysis. It combines model-driven insights with statistical measures such as Spearman's rho and Kendall's tau, and information-theoretic metrics like Mutual Information and Total Correlation to capture complex, non-linear dependencies. Such integration improves the robustness of feature importance assessments and supports more reliable interpretations in environmental modeling.
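The validation idea in the abstract, checking feature relevance against rank correlations and an information-theoretic measure, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the correspondence or from Jung et al.: the synthetic data, the feature names `x_lin`, `x_sq` and `x_noise`, the helper functions, and the choice of an equal-frequency binned estimator for Mutual Information. Total Correlation and the model-based importances (Random Forest, SHAP) are left out to keep the example short and dependency-free. In practice, `scipy.stats.spearmanr`, `scipy.stats.kendalltau` and `sklearn.feature_selection.mutual_info_regression` would likely be preferable to these hand-written versions.

```python
import math
import random
from collections import Counter


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a in O(n^2); adequate for small, tie-free samples."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def binned_mutual_information(x, y, bins=8):
    """Plug-in MI estimate (nats) after equal-frequency binning."""
    def discretize(v):
        return [min(int((r - 1) * bins / len(v)), bins - 1) for r in ranks(v)]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Hypothetical data: one monotonic driver, one symmetric (non-monotonic)
# driver, and one feature unrelated to the target.
random.seed(0)
n = 500
x_lin = [random.uniform(-1, 1) for _ in range(n)]
x_sq = [random.uniform(-1, 1) for _ in range(n)]
x_noise = [random.uniform(-1, 1) for _ in range(n)]
y = [0.5 * a + 2 * b * b + random.gauss(0, 0.05) for a, b in zip(x_lin, x_sq)]

features = {"x_lin": x_lin, "x_sq": x_sq, "x_noise": x_noise}
scores = {
    name: {
        "rho": spearman_rho(col, y),
        "tau": kendall_tau(col, y),
        "mi": binned_mutual_information(col, y),
    }
    for name, col in features.items()
}

# The top-ranked feature can differ depending on the measure used.
top_by_rho = max(scores, key=lambda k: abs(scores[k]["rho"]))
top_by_mi = max(scores, key=lambda k: scores[k]["mi"])

for name, s in scores.items():
    print(f"{name:8s} rho={s['rho']:+.3f} tau={s['tau']:+.3f} mi={s['mi']:.3f}")
print("top by |rho|:", top_by_rho, "| top by MI:", top_by_mi)
```

In this toy setting the rank correlations favour the monotonic feature while the Mutual Information estimate favours the symmetric one. This is the kind of disagreement between measures that the abstract argues should be examined before any single importance ranking is trusted.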
Source journal
Environmental Modelling & Software (Engineering and Technology / Engineering: Environmental)
CiteScore: 9.30
Self-citation rate: 8.20%
Articles published: 241
Review time: 60 days
About the journal: Environmental Modelling & Software publishes contributions, in the form of research articles, reviews and short communications, on recent advances in environmental modelling and/or software. The aim is to improve our capacity to represent, understand, predict or manage the behaviour of environmental systems at all practical scales, and to communicate those improvements to a wide scientific and professional audience.