Title: Explainable Machine Learning Approach with Augmentation for Mortality Prediction
Authors: Firas Ketata, Z. A. Masry, N. Zerhouni, S. Yacoub
Published in: 2023 IEEE International Conference on Advanced Systems and Emergent Technologies (IC_ASET), 2023-04-29
DOI: https://doi.org/10.1109/IC_ASET58101.2023.10150509
Citations: 0
Abstract
Cardiovascular diseases kill approximately 17.7 million people worldwide each year, mainly through myocardial infarction and heart failure. Electronic medical records that capture patients' physical characteristics and clinical laboratory test values are available for study. Biostatistical methods and machine learning (ML) techniques have already been used to find associations between patient characteristics and to predict mortality in heart failure patients. However, ML models are still not applied in clinics or for critical medical conditions. This may be due to the lack of explainability and clarity of ML prediction tools for physicians. The objective of this study is therefore to propose an explainable approach that supports physicians in their decision-making. The approach is based on several ML techniques combined with Shapley values. The goal is to augment the risk coefficients produced by Shapley values using the k-fold technique, in order to maximize the reliability of the explanations even for small datasets. The proposed approach is validated on the public heart failure prediction dataset. The explanations showed that ejection fraction and serum creatinine are the most important and decisive variables for predicting mortality in patients with heart disease. Finally, applying the k-fold technique with Shapley values improved the ranking of feature importance for mortality prediction and provided meaningful visualization graphs.
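The idea of combining k-fold splitting with Shapley values can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the models or the exact aggregation, so the sketch assumes a small gradient-descent logistic regression, exact Shapley values computed by enumerating feature subsets against a training-mean baseline, and a mean absolute Shapley value pooled over every held-out sample. The function names (`fit_logistic`, `shapley_values`, `kfold_shapley_importance`, `make_synthetic`) and the synthetic three-feature data are hypothetical stand-ins for the heart failure dataset.

```python
import itertools
import math
import random


def predict(w, b, x):
    """Logistic model output (probability of the positive class)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent; an assumed stand-in for the paper's models."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                with_j = [x[k] if (k in subset or k == j) else baseline[k]
                          for k in range(n)]
                without_j = [x[k] if k in subset else baseline[k]
                             for k in range(n)]
                phi[j] += weight * (f(with_j) - f(without_j))
    return phi


def kfold_shapley_importance(X, y, k=5, seed=0):
    """Mean |Shapley value| per feature, pooled over all k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    totals, count = [0.0] * len(X[0]), 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        w, b = fit_logistic(X_train, y_train)
        baseline = [sum(col) / len(col) for col in zip(*X_train)]
        f = lambda row: predict(w, b, row)
        for i in test_idx:  # every sample is explained exactly once
            phi = shapley_values(f, X[i], baseline)
            totals = [t + abs(p) for t, p in zip(totals, phi)]
            count += 1
    return [t / count for t in totals]


def make_synthetic(n=200, seed=1):
    """Hypothetical data: feature 0 dominates, feature 2 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        logit = 2.0 * row[0] + 0.5 * row[1]
        y.append(1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0)
        X.append(row)
    return X, y


X, y = make_synthetic()
importance = kfold_shapley_importance(X, y)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
print("feature ranking:", ranking)
print("mean |Shapley| per feature:", importance)
```

Because each sample is explained by a model that never saw it during training, the pooled ranking draws on all samples rather than a single test split, which is one plausible reading of why the k-fold step helps on small datasets. Exact enumeration scales exponentially with the number of features, so a real study would likely use an approximation from a dedicated library.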