Early Prediction of Coronary Heart Disease using Boosting-based Voting Ensemble Learning
Subhash Mondal, Ranjan Maity, Yash Raj Singh, Soumadip Ghosh, A. Nag
2022 IEEE Bombay Section Signature Conference (IBSSC), published 2022-12-08
DOI: https://doi.org/10.1109/IBSSC56953.2022.10037445
Citation count: 1
Abstract
The risk of coronary heart disease (CHD) keeps rising because of the uncontrolled lifestyles of today's adults. Early detection of the disease can prevent deaths from heart-related complications. Machine Learning (ML) techniques are valuable both for the early diagnosis of CHD and for identifying its many contributing factors. To build the prediction model, we used a dataset of 4240 instances and 15 related features to predict the risk of developing CHD within the next ten years. Initially, thirteen ML models were evaluated with 10-fold cross-validation; the Random Forest (RF) classifier achieved the highest test accuracy, 91.28%. The models were then tuned further, after which the boosting algorithms reached accuracies of 91% and above, with the Gradient Boost (GB) classifier performing best at 92.11%. For the final prediction, we considered voting ensemble approaches built from the best-performing boosting models, namely GB, HGB, XGB, CB, and LGBM. The ensemble achieved an accuracy of 92.26%, an F1 score of 91.25%, and a ROC-AUC score of 0.917, and False Negatives (FN) amounted to about 6.25% of the test dataset.
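The voting step and the reported metrics can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say whether hard or soft voting was used, so soft voting (averaging predicted probabilities) is an assumption here, and the function names `soft_vote` and `binary_metrics`, the 0.5 threshold, and all numbers in the toy data are hypothetical. The five probability rows stand in for the outputs of five already-trained boosting models.

```python
from typing import Dict, List, Sequence


def soft_vote(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each sample's positive-class probability across models, then threshold.

    probas[m][i] is model m's predicted probability that sample i is positive.
    Soft voting is an assumption; the abstract does not specify the voting rule.
    """
    n_models = len(probas)
    n_samples = len(probas[0])
    averaged = [sum(model[i] for model in probas) / n_models for i in range(n_samples)]
    return [1 if avg >= threshold else 0 for avg in averaged]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, F1, and the share of false negatives in the test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "f1": f1,
        "fn_share": fn / total,  # false negatives as a fraction of all test samples
    }


# Hypothetical toy data: five models (rows) scoring four test samples (columns).
probas = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.5, 0.3],
    [0.7, 0.3, 0.7, 0.6],
    [0.9, 0.2, 0.6, 0.2],
    [0.6, 0.4, 0.4, 0.4],
]
y_true = [1, 0, 1, 1]

y_pred = soft_vote(probas)
metrics = binary_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

The `fn_share` value corresponds to the way the abstract reports False Negatives, as a fraction of the whole test set rather than of the positive cases only. With real models, the probability rows would come from each classifier's predicted probabilities on the held-out test split.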