Evaluating Explainable Machine Learning Models for Clinicians

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation Pub Date : 2024-05-31 DOI:10.1007/s12559-024-10297-x

Noemi Scarpato, Aria Nourbakhsh, Patrizia Ferroni, Silvia Riondino, Mario Roselli, Francesca Fallucchi, Piero Barbanti, Fiorella Guadagni, Fabio Massimo Zanzotto

{"title":"Evaluating Explainable Machine Learning Models for Clinicians","authors":"Noemi Scarpato, Aria Nourbakhsh, Patrizia Ferroni, Silvia Riondino, Mario Roselli, Francesca Fallucchi, Piero Barbanti, Fiorella Guadagni, Fabio Massimo Zanzotto","doi":"10.1007/s12559-024-10297-x","DOIUrl":null,"url":null,"abstract":"<p>Gaining clinicians’ trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present our evaluation methodology for XAI methods using forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. Then, we applied FSS to our XAI approach defined over an ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To Compare FSS values before and after the explanation phase, we test our evaluation methodology for XAI methods on three clinical datasets, namely breast cancer, VTE, and migraine. The ML-RO system is a good model on which to test our XAI evaluation strategy based on the FSS. Indeed, ML-RO outperforms two other base models—a decision tree (DT) and a plain SVM—in the three datasets and gives the possibility of defining different XAI models: TOPK, MIGF, and F4G. The FSS evaluation score suggests that the explanation method F4G for the ML-RO is the most effective in two datasets out of the three tested, and it shows the limits of the learned model for one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods to enhance the adoption of AI systems in clinical practice.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"34 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12559-024-10297-x","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Gaining clinicians’ trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present our evaluation methodology for XAI methods using forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. Then, we applied FSS to our XAI approach defined over an ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To Compare FSS values before and after the explanation phase, we test our evaluation methodology for XAI methods on three clinical datasets, namely breast cancer, VTE, and migraine. The ML-RO system is a good model on which to test our XAI evaluation strategy based on the FSS. Indeed, ML-RO outperforms two other base models—a decision tree (DT) and a plain SVM—in the three datasets and gives the possibility of defining different XAI models: TOPK, MIGF, and F4G. The FSS evaluation score suggests that the explanation method F4G for the ML-RO is the most effective in two datasets out of the three tested, and it shows the limits of the learned model for one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods to enhance the adoption of AI systems in clinical practice.

Abstract Image

查看原文本刊更多论文

为临床医生评估可解释的机器学习模型

赢得临床医生的信任将充分释放人工智能（AI）在医疗领域的潜力，而解释人工智能的决策被视为建立可信系统的途径。然而，医学中的可解释人工智能（XAI）方法往往缺乏适当的评估。在本文中，我们介绍了利用前向可模拟性对 XAI 方法进行评估的方法。我们定义了前向可模拟性评分（FSS），并分析了其在临床预测方面的局限性。然后，我们将 FSS 应用于在 ML-RO 上定义的 XAI 方法，ML-RO 是一种基于多核支持向量机 (SVM) 算法随机优化的机器学习临床预测器。为了比较解释阶段前后的 FSS 值，我们在三个临床数据集（即乳腺癌、VTE 和偏头痛）上测试了 XAI 方法的评估方法。ML-RO 系统是测试我们基于 FSS 的 XAI 评估策略的良好模型。事实上，ML-RO 在三个数据集上的表现优于其他两个基础模型--决策树（DT）和普通 SVM，并为定义不同的 XAI 模型提供了可能性：TOPK、MIGF 和 F4G。FSS 评估得分表明，ML-RO 的解释方法 F4G 在三个测试数据集中的两个数据集中最为有效，同时也显示了所学模型在一个数据集中的局限性。我们的研究旨在为医学领域的 XAI 方法评估引入标准实践。通过建立一个严格的评估框架，我们试图为医疗保健专业人员提供可靠的工具来评估 XAI 方法的性能，从而促进人工智能系统在临床实践中的应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Cognitive Computation COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-NEUROSCIENCES

CiteScore

9.30

自引率

3.70%

发文量

116

审稿时长

>12 weeks

期刊介绍： Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.