A Machine Learning Explainability Tutorial for Atmospheric Sciences

Artificial Intelligence for the Earth Systems. Pub Date: 2023-11-09. DOI: 10.1175/aies-d-23-0018.1

Montgomery L. Flora, Corey K. Potvin, Amy McGovern, Shawn Handler

Abstract

With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance from feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) that allows future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for sub-freezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate evaluating the model impact of feature groups rather than of individual features. Evaluating feature groups mitigates the impact of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
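The Shapley idea behind the local attributions and the feature-group evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the scikit-explain or SHAP API, and it is not taken from the paper: the toy `model`, the feature names (`temp`, `dewpoint`, `wind`), the `baseline` values, and the `thermo` grouping are all hypothetical. It also replaces "absent" features with a single baseline value, a simplification of the background-data expectation that SHAP-style methods typically use.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, players):
    """Exact Shapley values by enumerating every coalition (small games only)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


def model(x):
    """Hypothetical toy model with a temp-wind interaction (assumption)."""
    return 2.0 * x["temp"] + 1.0 * x["dewpoint"] + 0.5 * x["temp"] * x["wind"]


# Hypothetical example to explain and reference point (assumptions).
x = {"temp": 3.0, "dewpoint": 2.0, "wind": 4.0}
baseline = {"temp": 1.0, "dewpoint": 1.0, "wind": 0.0}


def make_value_fn(groups):
    """Build a coalition value function; `groups` maps player -> feature names."""
    def value_fn(coalition):
        present = {f for g in coalition for f in groups[g]}
        # Features outside the coalition are set to their baseline value.
        mixed = {f: (x[f] if f in present else baseline[f]) for f in x}
        return model(mixed)
    return value_fn


# Each feature is its own player.
single = {f: (f,) for f in x}
phi = shapley_values(make_value_fn(single), list(single))

# Features are pooled into groups, and each group is one player.
grouped = {"thermo": ("temp", "dewpoint"), "wind": ("wind",)}
phi_grouped = shapley_values(make_value_fn(grouped), list(grouped))

print(phi)
print(phi_grouped)
```

In both cases the attributions sum to `model(x) - model(baseline)`, which is the additivity property that makes Shapley values attractive for attribution. Treating a group as a single player assigns one value to the whole group, so the attribution is no longer split among the features inside it; this is one simple way to read the abstract's point about feature groups. Exact enumeration grows exponentially with the number of players, so practical tools rely on approximations.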
