An Ensemble Hard Voting Model for Cardiovascular Disease Prediction
Al-Zadid Sultan Bin Habib, Tanpia Tasnim
2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), 2020-12-19
DOI: 10.1109/STI50764.2020.9350514 (https://doi.org/10.1109/STI50764.2020.9350514)
With the evolution of new technologies, health informatics has come to play a vital role in making day-to-day life more comfortable. The availability of sufficient medical data and computational tools has allowed medical informatics to take a long step towards Healthcare Industry 4.0. Information engineering and other emerging technologies can be applied to identify chronic conditions such as heart failure and thereby lessen the mortality rate. Machine Learning (ML) based approaches are gaining popularity for predicting these diseases in the fourth-generation healthcare industry. In this paper, several risk factors, e.g., age, sex, total cholesterol level, number of cigarettes smoked per day, glucose level, and systolic blood pressure, are used as input features for predicting the risk of heart disease over the next ten years. A Hard Voting (HV) classifier is formed from Logistic Regression (LogReg), Random Forest (RF), Multilayer Perceptron (MLP), and Gaussian Naïve Bayes (GNB) classifiers. RobustScaler was applied to scale the input attributes' values, and the dataset was balanced using Random Undersampling. The HV classifier performs satisfactorily, with 88.42% test accuracy and precision, recall, F1, and Area Under Curve (AUC) scores of 1, 0.043, 0.082, and 0.73, respectively. The results are also compared using further evaluation measures, e.g., Receiver Operating Characteristic (ROC) curves, learning curves, the precision-recall curve, the confusion matrix, Logarithmic Loss (Log Loss), Brier Score Loss (BSL), Matthews Correlation Coefficient (MCC), Mean Absolute Error (MAE), and Mean Squared Error (MSE), to support this claim.
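The steps named in the abstract (robust scaling, random undersampling, hard voting, and the F1 score) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy predictions, and the tie-breaking rule (ties go to the smaller label) are assumptions made for illustration, and the abstract does not state how ties among the four base classifiers were resolved.

```python
import random
import statistics
from collections import Counter


def robust_scale(values):
    """Center a feature on its median and divide by its interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 for a constant column
    return [(v - median) / iqr for v in values]


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps only the minority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = [(row, label)
                for label, group in by_class.items()
                for row in rng.sample(group, target)]
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


def hard_vote(predictions_per_model):
    """Majority label per sample; ties go to the smaller label (assumption)."""
    voted = []
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        voted.append(max(sorted(counts), key=counts.get))
    return voted


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions from four base models on four samples.
votes = [
    [1, 0, 0, 1],  # stands in for LogReg
    [1, 0, 1, 0],  # stands in for RF
    [0, 0, 1, 1],  # stands in for MLP
    [1, 0, 0, 0],  # stands in for GNB
]
print(hard_vote(votes))                  # [1, 0, 0, 0]
print(round(f1_from(1.0, 0.043), 3))     # 0.082
```

The last line checks that the reported F1 of 0.082 is consistent with the reported precision of 1 and recall of 0.043. With an even number of base classifiers, 2-2 ties can occur, so the tie rule affects the output. In practice, scikit-learn's `VotingClassifier(voting="hard")` and `RobustScaler`, together with imbalanced-learn's `RandomUnderSampler`, provide these steps; whether the authors used those exact classes beyond `RobustScaler` is not stated in the abstract.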