S Sousa, S Paredes, T Rocha, J Henriques, J Sousa, L Gonçalves
{"title":"机器学习模型评估:信任与性能。","authors":"S Sousa, S Paredes, T Rocha, J Henriques, J Sousa, L Gonçalves","doi":"10.1007/s11517-024-03145-5","DOIUrl":null,"url":null,"abstract":"<p><p>The common black box nature of machine learning models is an obstacle to their application in health care context. Their widespread application is limited by a significant \"lack of trust.\" So, the main goal of this work is the development of an evaluation approach that can assess, simultaneously, trust and performance. Trust assessment is based on (i) model robustness (stability assessment), (ii) confidence (95% CI of geometric mean), and (iii) interpretability (comparison of respective features ranking with clinical evidence). Performance is assessed through geometric mean. For validation, in patients' stratification in cardiovascular risk assessment, a Portuguese dataset (N=1544) was applied. Five different models were compared: (i) GRACE score, the most common risk assessment tool in Portugal for patients with acute coronary syndrome; (ii) logistic regression; (iii) Naïve Bayes; (iv) decision trees; and (v) rule-based approach, previously developed by this team. The obtained results confirm that the simultaneous assessment of trust and performance can be successfully implemented. The rule-based approach seems to have potential for clinical application. It provides a high level of trust in the respective operation while outperformed the GRACE model's performance, enhancing the required physicians' acceptance. This may increase the possibility to effectively aid the clinical decision.</p>","PeriodicalId":49840,"journal":{"name":"Medical & Biological Engineering & Computing","volume":" ","pages":"3397-3410"},"PeriodicalIF":2.6000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11485107/pdf/","citationCount":"0","resultStr":"{\"title\":\"Machine learning models' assessment: trust and performance.\",\"authors\":\"S Sousa, S Paredes, T Rocha, J Henriques, J Sousa, L Gonçalves\",\"doi\":\"10.1007/s11517-024-03145-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The common black box nature of machine learning models is an obstacle to their application in health care context. Their widespread application is limited by a significant \\\"lack of trust.\\\" So, the main goal of this work is the development of an evaluation approach that can assess, simultaneously, trust and performance. Trust assessment is based on (i) model robustness (stability assessment), (ii) confidence (95% CI of geometric mean), and (iii) interpretability (comparison of respective features ranking with clinical evidence). Performance is assessed through geometric mean. For validation, in patients' stratification in cardiovascular risk assessment, a Portuguese dataset (N=1544) was applied. Five different models were compared: (i) GRACE score, the most common risk assessment tool in Portugal for patients with acute coronary syndrome; (ii) logistic regression; (iii) Naïve Bayes; (iv) decision trees; and (v) rule-based approach, previously developed by this team. The obtained results confirm that the simultaneous assessment of trust and performance can be successfully implemented. The rule-based approach seems to have potential for clinical application. It provides a high level of trust in the respective operation while outperformed the GRACE model's performance, enhancing the required physicians' acceptance. This may increase the possibility to effectively aid the clinical decision.</p>\",\"PeriodicalId\":49840,\"journal\":{\"name\":\"Medical & Biological Engineering & Computing\",\"volume\":\" \",\"pages\":\"3397-3410\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11485107/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical & Biological Engineering & Computing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1007/s11517-024-03145-5\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/6/8 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical & Biological Engineering & Computing","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11517-024-03145-5","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/6/8 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Machine learning models' assessment: trust and performance.
The common black box nature of machine learning models is an obstacle to their application in health care context. Their widespread application is limited by a significant "lack of trust." So, the main goal of this work is the development of an evaluation approach that can assess, simultaneously, trust and performance. Trust assessment is based on (i) model robustness (stability assessment), (ii) confidence (95% CI of geometric mean), and (iii) interpretability (comparison of respective features ranking with clinical evidence). Performance is assessed through geometric mean. For validation, in patients' stratification in cardiovascular risk assessment, a Portuguese dataset (N=1544) was applied. Five different models were compared: (i) GRACE score, the most common risk assessment tool in Portugal for patients with acute coronary syndrome; (ii) logistic regression; (iii) Naïve Bayes; (iv) decision trees; and (v) rule-based approach, previously developed by this team. The obtained results confirm that the simultaneous assessment of trust and performance can be successfully implemented. The rule-based approach seems to have potential for clinical application. It provides a high level of trust in the respective operation while outperformed the GRACE model's performance, enhancing the required physicians' acceptance. This may increase the possibility to effectively aid the clinical decision.
期刊介绍:
Founded in 1963, Medical & Biological Engineering & Computing (MBEC) continues to serve the biomedical engineering community, covering the entire spectrum of biomedical and clinical engineering. The journal presents exciting and vital experimental and theoretical developments in biomedical science and technology, and reports on advances in computer-based methodologies in these multidisciplinary subjects. The journal also incorporates new and evolving technologies including cellular engineering and molecular imaging.
MBEC publishes original research articles as well as reviews and technical notes. Its Rapid Communications category focuses on material of immediate value to the readership, while the Controversies section provides a forum to exchange views on selected issues, stimulating a vigorous and informed debate in this exciting and high profile field.
MBEC is an official journal of the International Federation of Medical and Biological Engineering (IFMBE).