A Functionally-Grounded Benchmark Framework for XAI Methods: Insights and Foundations from a Systematic Literature Review
Dulce Canha, Sylvain Kubler, Kary Främling, Guy Fagherazzi
ACM Computing Surveys · DOI: 10.1145/3737445 · Published 2025-05-24
Abstract
Artificial Intelligence (AI) is transforming industries, offering new opportunities to manage and enhance innovation. However, these advancements bring significant challenges for scientists and businesses, one of the most critical being the 'trustworthiness' of AI systems. A key requirement of trustworthiness is transparency, closely linked to explicability. Consequently, the exponential growth of eXplainable AI (XAI) has led to the development of numerous methods and metrics for explainability. Nevertheless, this has resulted in a lack of standardized and formal definitions for fundamental XAI properties (e.g., what do soundness, completeness, and faithfulness of an explanation entail? How is the stability of an XAI method defined?). This lack of consensus makes it difficult for XAI practitioners to establish a shared foundation, thereby impeding the effective benchmarking of XAI methods. This survey paper addresses these challenges with two primary objectives. First, it systematically reviews and categorizes XAI properties, distinguishing between human-centered properties (relying on empirical studies involving explainees) and functionally-grounded properties (quantitative metrics independent of explainees). Second, it expands this analysis by introducing a hierarchically structured, functionally-grounded benchmark framework for XAI methods, providing formal definitions of XAI properties. The framework's practicality is demonstrated by applying it to two widely used methods: LIME and SHAP.
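To make the idea of a functionally-grounded metric more concrete, the sketch below (not taken from the paper) estimates one such property, the stability of SHAP attributions, by perturbing an input slightly and measuring how much the attributions move. The perturbation scale, the L2 distance measure, and the model are illustrative assumptions, not the paper's formal definitions.

```python
# Minimal sketch of a functionally-grounded stability metric for SHAP.
# Assumptions: Gaussian perturbations of scale eps, mean L2 distance as the
# instability score, and a toy random-forest regressor as the model.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Train a small model on synthetic data.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Build a SHAP explainer over a background sample.
explainer = shap.Explainer(model.predict, X[:100])

def stability(x, n_perturbations=20, eps=0.01):
    """Mean L2 distance between attributions of x and of nearby points.

    Lower values indicate a more stable explanation around x.
    """
    base = explainer(x.reshape(1, -1)).values[0]
    distances = []
    for _ in range(n_perturbations):
        x_pert = x + np.random.normal(scale=eps, size=x.shape)
        pert = explainer(x_pert.reshape(1, -1)).values[0]
        distances.append(np.linalg.norm(base - pert))
    return float(np.mean(distances))

print(f"Stability score for one test point: {stability(X[150]):.4f}")
```

Because such a score requires no human study, it can be computed automatically for any explainer and input, which is what allows functionally-grounded properties to serve as benchmark criteria.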
Journal description:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.