A Machine Learning Explainability Tutorial for Atmospheric Sciences

Montgomery L. Flora, Corey K. Potvin, Amy McGovern, Shawn Handler
{"title":"A Machine Learning Explainability Tutorial for Atmospheric Sciences","authors":"Montgomery L. Flora, Corey K. Potvin, Amy McGovern, Shawn Handler","doi":"10.1175/aies-d-23-0018.1","DOIUrl":null,"url":null,"abstract":"Abstract With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance versus feature relevance. We demonstrate and visualize different explanation methods, how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for sub-freezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating the feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.","PeriodicalId":94369,"journal":{"name":"Artificial intelligence for the earth systems","volume":" 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence for the earth systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1175/aies-d-23-0018.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance from feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) that allows future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for sub-freezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
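The abstract describes a workflow built around local Shapley attributions, global permutation importance, and grouped-feature evaluation. As a rough, non-authoritative sketch of that kind of workflow (it does not use the paper's scikit-explain API, whose exact interface is not shown here), the snippet below uses the general-purpose shap and scikit-learn libraries on a synthetic dataset; the feature-group indices are hypothetical.

```python
# Minimal sketch (not the paper's scikit-explain package): local SHAP attributions,
# single-feature permutation importance, and a hand-rolled grouped permutation
# importance on synthetic data. The indices in `group` are hypothetical.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Local explainability: per-example, per-feature Shapley attributions.
shap_values = shap.TreeExplainer(model).shap_values(X_test)

# Global explainability: single-feature permutation importance.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print("per-feature importance:", perm.importances_mean)

# Grouped permutation importance: permute correlated features together so their
# shared information cannot leak through the unpermuted partner features.
rng = np.random.default_rng(0)
group = [0, 1, 2]                      # hypothetical block of correlated features
baseline = model.score(X_test, y_test)
X_perm = X_test.copy()
X_perm[:, group] = X_test[rng.permutation(len(X_test))][:, group]
print("group importance:", baseline - model.score(X_perm, y_test))
```

Per the abstract, the scikit-explain package released with the paper bundles these kinds of computations (SHAP, SAGE, ALE, permutation importance) behind a single interface; the sketch above only illustrates the underlying ideas.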