Interpretable Machine Learning Based on Functional ANOVA Framework: Algorithms and Comparisons
Linwei Hu, Vijayan N. Nair, Agus Sudjianto, Aijun Zhang, Jie Chen, Zebin Yang
Applied Stochastic Models in Business and Industry, Volume 41, Issue 1. Journal article, published 2025-01-29.
DOI: 10.1002/asmb.2916
URL: https://onlinelibrary.wiley.com/doi/10.1002/asmb.2916
Citations: 0
Abstract
In the early days of machine learning (ML), the emphasis was on developing complex algorithms to achieve the best possible predictive performance. To understand and explain the model results, one had to rely on post hoc explainability techniques, which are known to have limitations. Recently, with the recognition in regulated industries that interpretability is also important, researchers have been studying algorithms that give up small gains in predictive performance in exchange for greater interpretability. In doing so, the ML community has rediscovered low-order functional ANOVA (fANOVA) models, which have been known in the statistical literature for some time. This paper starts with a description of the challenges with post hoc explainability. This is followed by a brief review of the fANOVA framework, with a focus on models with just main effects and second-order interactions (called generalized additive models with interactions, or GAMI = GAM + Interactions). It then provides an overview of two recently developed GAMI techniques: Explainable Boosting Machines (EBM) and GAMI-Net. The paper proposes a new algorithm that also uses trees, as in EBM, but fits linear models instead of piecewise constants within the partitions. We refer to this as GAMI-linear-tree (GAMI-Lin-T). There are many other differences, including a new interaction filtering algorithm. The paper uses simulated and real datasets to compare the three fANOVA ML algorithms. The results show that GAMI-Lin-T and GAMI-Net have comparable performance, and both are generally better than EBM.
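The difference between piecewise-constant fits and linear fits within partitions can be illustrated with a minimal sketch. This is not the authors' GAMI-Lin-T algorithm: it assumes a single feature, hand-picked split points, no boosting, and no interaction filtering. The function name `fit_piecewise`, the `edges` array, and the simulated data are all hypothetical and exist only for illustration.

```python
import numpy as np


def fit_piecewise(x, y, edges, linear):
    """Fit a constant (linear=False) or a straight line (linear=True) in each bin.

    Returns in-sample predictions. `edges` are interior split points,
    so len(edges) + 1 partitions are formed.
    """
    pred = np.empty_like(y, dtype=float)
    bins = np.digitize(x, edges)  # bin index 0..len(edges) for each x
    for b in np.unique(bins):
        mask = bins == b
        if linear and mask.sum() >= 2:
            # Least-squares line within this partition.
            slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
            pred[mask] = slope * x[mask] + intercept
        else:
            # Piecewise-constant fit: the bin mean.
            pred[mask] = y[mask].mean()
    return pred


# Simulated data (assumption): a smooth main effect plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

edges = np.array([-0.5, 0.0, 0.5])  # hypothetical fixed split points
mse_const = np.mean((y - fit_piecewise(x, y, edges, linear=False)) ** 2)
mse_lin = np.mean((y - fit_piecewise(x, y, edges, linear=True)) ** 2)

print(f"piecewise-constant in-sample MSE: {mse_const:.4f}")
print(f"piecewise-linear   in-sample MSE: {mse_lin:.4f}")
```

With the same partitions, the in-sample error of the linear fit cannot exceed that of the constant fit, because the constant model is a special case of the linear one. Whether this carries over to held-out data depends on the data and on how the partitions are chosen, which the sketch does not address.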
About the journal:
ASMBI - Applied Stochastic Models in Business and Industry (formerly Applied Stochastic Models and Data Analysis) was first published in 1985, publishing contributions at the interface between stochastic modelling, data analysis and their applications in business, finance, insurance, management and production. In 2007 ASMBI became the official journal of the International Society for Business and Industrial Statistics (www.isbis.org). The main objective is to publish papers, both technical and practical, that present new results which solve real-life problems or have great potential to do so. Mathematical rigour, innovative stochastic modelling and sound applications are the key ingredients of papers to be published, after a very selective review process.
The journal is very open to new ideas, such as Data Science and Big Data stemming from problems in business and industry, or uncertainty quantification in engineering, as well as more traditional ones, such as reliability, quality control, design of experiments, managerial processes, supply chains and inventories, insurance, econometrics, and financial modelling (provided the papers are related to real problems). The journal is also interested in papers addressing the effects of business and industrial decisions on the environment, healthcare, and social life. State-of-the-art computational methods are very welcome as well, when combined with sound applications and innovative models.