Machine learning models' assessment: trust and performance.

IF 2.6 | Zone 4 (Medicine) | JCR Q2 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Medical & Biological Engineering & Computing Pub Date : 2024-11-01 Epub Date: 2024-06-08 DOI:10.1007/s11517-024-03145-5

S Sousa, S Paredes, T Rocha, J Henriques, J Sousa, L Gonçalves

Pages: 3397-3410. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11485107/pdf/

Citations: 0

Abstract

The black-box nature common to machine learning models is an obstacle to their application in health care, where their widespread adoption is limited by a significant "lack of trust." The main goal of this work is therefore to develop an evaluation approach that assesses trust and performance simultaneously. Trust assessment is based on (i) model robustness (stability assessment), (ii) confidence (95% CI of the geometric mean), and (iii) interpretability (comparison of each model's feature ranking with clinical evidence). Performance is assessed through the geometric mean. For validation, the approach was applied to a Portuguese dataset (N=1544) for patient stratification in cardiovascular risk assessment. Five models were compared: (i) the GRACE score, the most common risk assessment tool in Portugal for patients with acute coronary syndrome; (ii) logistic regression; (iii) Naïve Bayes; (iv) decision trees; and (v) a rule-based approach previously developed by this team. The results confirm that the simultaneous assessment of trust and performance can be successfully implemented. The rule-based approach seems to have potential for clinical application: it provides a high level of trust in its operation while outperforming the GRACE model, which enhances the acceptance by physicians that clinical use requires. This may increase the possibility of effectively aiding clinical decisions.
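The performance and confidence measures named in the abstract can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not state: that the geometric mean is the usual square root of sensitivity times specificity, and that the 95% CI is obtained with a percentile bootstrap. The function names `geometric_mean` and `bootstrap_ci` and the toy labels are hypothetical and are not taken from the paper.

```python
import math
import random


def geometric_mean(y_true, y_pred):
    """G-mean = sqrt(sensitivity * specificity) for binary labels (1 = event).

    Assumed definition; the abstract only says "geometric mean".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against a class being absent (possible in a bootstrap resample).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the G-mean (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping label/prediction pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            geometric_mean([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up toy labels, for illustration only (not the Portuguese dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

gmean = geometric_mean(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"G-mean = {gmean:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions, a narrower interval would indicate higher confidence in a model's reported performance, which is how the CI could feed into a trust assessment alongside the point estimate.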


Source journal: Medical & Biological Engineering & Computing (Medicine; Engineering: Biomedical)

CiteScore: 6.00

Self-citation rate: 3.10%

Articles published: 249

Review time: 3.5 months

About the journal: Founded in 1963, Medical & Biological Engineering & Computing (MBEC) continues to serve the biomedical engineering community, covering the entire spectrum of biomedical and clinical engineering. The journal presents exciting and vital experimental and theoretical developments in biomedical science and technology, and reports on advances in computer-based methodologies in these multidisciplinary subjects. The journal also incorporates new and evolving technologies including cellular engineering and molecular imaging. MBEC publishes original research articles as well as reviews and technical notes. Its Rapid Communications category focuses on material of immediate value to the readership, while the Controversies section provides a forum to exchange views on selected issues, stimulating a vigorous and informed debate in this exciting and high-profile field. MBEC is an official journal of the International Federation of Medical and Biological Engineering (IFMBE).