Explainable Machine Learning Approach with Augmentation for Mortality Prediction

2023 IEEE International Conference on Advanced Systems and Emergent Technologies (IC_ASET). Pub Date: 2023-04-29. DOI: 10.1109/IC_ASET58101.2023.10150509

Firas Ketata, Z. A. Masry, N. Zerhouni, S. Yacoub


Abstract

Cardiovascular diseases kill approximately 17.7 million people worldwide each year, mainly through myocardial infarction and heart failure. Electronic medical records containing patients' physical characteristics and clinical laboratory test values are available for these patients. Biostatistical methods and machine learning (ML) techniques have already been used to find associations among patient characteristics and to predict mortality in heart failure patients. However, ML models are still not applied in clinical practice or in critical medical conditions, possibly because physicians find ML prediction tools insufficiently explainable and transparent. The objective of this study is therefore to propose an explainable approach that supports physicians in their decision-making. The approach combines several ML techniques with Shapley values. The goal is to augment the risk coefficients computed with Shapley values by means of the k-fold technique, so that the explanations remain as reliable as possible even for small datasets. The proposed approach is validated on the public heart failure prediction dataset. The explanations showed that ejection fraction and serum creatinine are the most important and decisive variables for predicting mortality in patients with heart disease. Finally, applying the k-fold technique together with Shapley values improved the ranking of feature importance for mortality prediction and produced meaningful visualization graphs.
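The abstract does not spell out how Shapley values and the k-fold technique are combined. One plausible reading, shown here as a minimal sketch under stated assumptions, is to train one model per fold, compute Shapley values only for that fold's held-out patients, pool the out-of-fold values, and rank features by mean absolute Shapley value. The model (a plain logistic regression fitted by gradient descent), the reference point used for "absent" features (the training-fold mean), and all function names (`fit_logistic`, `exact_shapley`, `kfold_shap_ranking`) are hypothetical choices, not the authors' method. Shapley values are computed here by brute-force enumeration of feature coalitions, which is exact but only feasible for a handful of features; practical work usually relies on an approximation such as the SHAP library. The data are synthetic, not the heart failure dataset.

```python
import itertools
import math

import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Hypothetical model choice: logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    # Return a function mapping a 2-D feature matrix to predicted mortality probabilities
    return lambda Z: 1.0 / (1.0 + np.exp(-(Z @ w[1:] + w[0])))


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample; features outside a coalition take the baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every coalition S of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                z = baseline.copy()
                z[list(S)] = x[list(S)]
                without_i = predict(z[None, :])[0]
                z[i] = x[i]
                with_i = predict(z[None, :])[0]
                phi[i] += weight * (with_i - without_i)
    return phi


def kfold_shap_ranking(X, y, k=5, seed=0):
    """Pool out-of-fold Shapley values over k folds and rank features by mean |phi|."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    all_phi = []
    for f in range(k):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(k) if g != f])
        predict = fit_logistic(X[train_idx], y[train_idx])
        baseline = X[train_idx].mean(axis=0)  # assumed reference point: training-fold mean
        # Each patient is explained by a model that never saw it during training
        all_phi.extend(exact_shapley(predict, X[i], baseline) for i in test_idx)
    importance = np.abs(np.array(all_phi)).mean(axis=0)
    return importance, np.argsort(-importance)


# Synthetic example (not clinical data): only features 0 and 1 drive the outcome
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
logits = 3.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

importance, order = kfold_shap_ranking(X, y)
print("mean |Shapley| per feature:", importance)
print("ranking (most important first):", order)
```

Because every sample contributes exactly one out-of-fold explanation, the pooled ranking uses the whole dataset without explaining a model on its own training points, which is the property that makes fold-wise aggregation attractive for small datasets. With the synthetic labels above, features 0 and 1 are expected to head the ranking.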
