An Explainable Machine Learning-based Prediction Model for In-hospital Mortality in Acute Myocardial Infarction Patients with Typical Chest Pain
Huilin Zheng, Malik Muhammad Waqar, Saba Arif, Syed Waseem Abbas Sherazi, Sang Hyeok Son, Jong Yun Lee
Proceedings of the 2023 6th International Conference on Software Engineering and Information Management, 2023-01-31
DOI: 10.1145/3584871.3584877 (https://doi.org/10.1145/3584871.3584877)
Abstract
Acute myocardial infarction (AMI) is a leading cause of hospital admission and death worldwide, and chest pain is its most common presenting complaint. This paper therefore proposes a machine learning (ML)-based model for predicting in-hospital mortality in AMI patients with typical chest pain. To make the black-box prediction model interpretable, the Shapley additive explanations (SHAP) method is applied to it. The experimental framework has three steps. First, we extract the experimental data from the Korea Acute Myocardial Infarction Registry National Institutes of Health (KAMIR-NIH) and preprocess the selected data with missing-value imputation, normalization, and splitting; two oversampling methods, the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN), are then applied to handle class imbalance in the experimental data. Second, several ML models (decision tree, random forest, extreme gradient boosting (XGBoost), support vector machine, and logistic regression) are trained and evaluated on the preprocessed AMI patient data. Finally, SHAP is used to explain the best-performing prediction model. In the experiments, logistic regression combined with ADASYN achieved the highest performance. Moreover, SHAP improved the transparency of the ML model and can serve as a reference that supports doctors' decisions in practice.
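The oversampling and explanation steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The function names `smote_like` and `linear_shap`, the toy data, and the weights are all hypothetical. The oversampler is a simplified SMOTE-style interpolation; ADASYN, the variant the paper reports as best, additionally generates more samples for minority points that are harder to learn, and that weighting is omitted here. The attribution uses the closed form for a linear score under an assumed feature independence, which applies to the log-odds of a logistic regression.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> mate
        synthetic.append([b + t * (m - b) for b, m in zip(base, mate)])
    return synthetic


def linear_shap(weights, x, background):
    """SHAP values of a linear score w.x + b, assuming independent features:
    phi_j = w_j * (x_j - mean_j), with the mean taken over `background`."""
    means = [sum(col) / len(col) for col in zip(*background)]
    return [w * (xj - mj) for w, xj, mj in zip(weights, x, means)]


# Toy minority-class rows (hypothetical, two features per patient).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_like(minority, n_new=6)

# Hypothetical logistic-regression coefficients (log-odds scale).
weights, bias = [0.8, -0.5], 0.1
background = minority + new_rows
x = [2.0, 1.0]
phi = linear_shap(weights, x, background)
print(new_rows)
print(phi)
```

The attributions in `phi` sum to the difference between the log-odds for `x` and the mean log-odds over `background`, which is the additivity property that makes SHAP values readable as per-feature contributions. In practice one would more likely rely on established libraries for both steps; the sketch only shows the underlying arithmetic.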