Interpretable Machine Learning Based on Functional ANOVA Framework: Algorithms and Comparisons

IF 1.3, Zone 4 (Mathematics), Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Applied Stochastic Models in Business and Industry, Pub Date: 2025-01-29, DOI: 10.1002/asmb.2916

Linwei Hu, Vijayan N. Nair, Agus Sudjianto, Aijun Zhang, Jie Chen, Zebin Yang


Citations: 0

Abstract

In the early days of machine learning (ML), the emphasis was on developing complex algorithms to achieve the best possible predictive performance. To understand and explain the model results, one had to rely on post hoc explainability techniques, which are known to have limitations. Recently, with the recognition in regulated industries that interpretability is also important, researchers have been studying algorithms that give up small gains in predictive performance in favor of greater interpretability. In doing so, the ML community has rediscovered low-order functional ANOVA (fANOVA) models, which have been known in the statistical literature for some time. This paper begins by describing the challenges of post hoc explainability. It then briefly reviews the fANOVA framework, focusing on models with only main effects and second-order interactions (called generalized additive models with interactions, or GAMI = GAM + Interactions). Next, it gives an overview of two recently developed GAMI techniques: Explainable Boosting Machines (EBM) and GAMI-Net. The paper proposes a new algorithm that, like EBM, uses trees, but fits linear models instead of piecewise constants within the partitions. We refer to this as GAMI-linear-tree (GAMI-Lin-T). There are many other differences, including a new interaction-filtering algorithm. The paper uses simulated and real datasets to compare the three fANOVA ML algorithms. The results show that GAMI-Lin-T and GAMI-Net have comparable performance, and both generally outperform EBM.
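The key algorithmic contrast in the abstract, piecewise-constant versus linear fits within tree partitions, can be illustrated on a toy main effect. The following is a minimal Python sketch under stated assumptions: a single feature, a depth-1 tree (one split), squared-error loss, and NumPy least squares for the linear leaves. It is not the authors' GAMI-Lin-T algorithm (there is no boosting, interaction filtering, or pruning here), and the helper names `fit_leaf`, `sse`, and `best_split` are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the GAMI-Lin-T implementation.


def fit_leaf(x, y, linear):
    """Fit one partition: a constant (piecewise-constant leaf) or a
    least-squares line (piecewise-linear leaf). Returns (intercept, slope)."""
    if linear:
        X = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef
    return np.array([y.mean(), 0.0])


def sse(x, y, coef):
    """Sum of squared residuals of a leaf model on its own partition."""
    resid = y - (coef[0] + coef[1] * x)
    return float(resid @ resid)


def best_split(x, y, linear, min_leaf=5):
    """Exhaustive search for the single threshold (depth-1 tree) that
    minimizes total training SSE. Returns (threshold or None, SSE)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_thr, best_sse = None, sse(xs, ys, fit_leaf(xs, ys, linear))  # no split
    for i in range(min_leaf, len(xs) - min_leaf):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a threshold between tied values
        left = fit_leaf(xs[:i], ys[:i], linear)
        right = fit_leaf(xs[i:], ys[i:], linear)
        total = sse(xs[:i], ys[:i], left) + sse(xs[i:], ys[i:], right)
        if total < best_sse:
            best_thr, best_sse = (xs[i - 1] + xs[i]) / 2, total
    return best_thr, best_sse


# Toy main effect: a jump at 0 plus a linear trend on each side.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = np.where(x < 0, x, 1.0 + x) + rng.normal(0.0, 0.05, 200)

const = best_split(x, y, linear=False)
lin = best_split(x, y, linear=True)
print("constant leaves: threshold=%.3f, SSE=%.2f" % const)
print("linear leaves:   threshold=%.3f, SSE=%.2f" % lin)
```

Because a constant is a special case of a line, the linear-leaf training SSE can never exceed the constant-leaf SSE for the same split; in this toy case one linear-leaf split captures both the jump and the within-partition trend, whereas constant leaves would need many more splits. Whether that extra flexibility pays off on held-out data is an empirical question.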


Source journal

Applied Stochastic Models in Business and Industry (Mathematics: Interdisciplinary Applications)

CiteScore: 2.70

Self-citation rate: 0.00%

Review time: >12 weeks

Journal description: ASMBI - Applied Stochastic Models in Business and Industry (formerly Applied Stochastic Models and Data Analysis) was first published in 1985 and publishes contributions at the interface between stochastic modelling, data analysis and their applications in business, finance, insurance, management and production. In 2007 ASMBI became the official journal of the International Society for Business and Industrial Statistics (www.isbis.org). The main objective is to publish papers, both technical and practical, presenting new results that solve real-life problems or have great potential to do so. Mathematical rigour, innovative stochastic modelling and sound applications are the key ingredients of papers to be published, after a very selective review process. The journal is very open to new ideas, such as Data Science and Big Data stemming from problems in business and industry or uncertainty quantification in engineering, as well as more traditional ones, such as reliability, quality control, design of experiments, managerial processes, supply chains and inventories, insurance, econometrics, and financial modelling (provided the papers are related to real problems). The journal is also interested in papers addressing the effects of business and industrial decisions on the environment, healthcare and social life. State-of-the-art computational methods are also very welcome when combined with sound applications and innovative models.