Predicting the prevalence of cardiovascular diseases using machine learning algorithms
Bernada E. Sianga, Maurice C. Mbago, Amina S. Msengwa
Intelligence-based medicine, volume 11, Article 100199. Published 2025-01-01. DOI: 10.1016/j.ibmed.2025.100199
https://www.sciencedirect.com/science/article/pii/S266652122500002X
Abstract
Cardiovascular diseases (CVDs) are the leading cause of morbidity, disability, and mortality worldwide and are among the most life-threatening diseases. Early detection and appropriate action can significantly reduce the effects and complications of CVD, so predicting the likelihood that an individual will develop adverse CVD outcomes is essential. This study uses machine learning (ML) methods to predict the risk of CVD incidence. Model hyperparameters were tuned with grid search and randomized search, and the tuning method that yielded the highest accuracy was used to set the optimal parameters for the six algorithms used in this study. Two experiments were conducted: the first trained and tested the tuned ML algorithms on the CVD dataset without geographical features, and the second included them. The geographical features are the air humidity, temperature, and education status of a location. The two experiments were compared using classification metrics, and the second experiment outperformed the first. With geographical features included, XGBoost achieved the highest accuracy (95.24%), followed by the decision tree (93.87%) and the support vector machine (92.87%). Including geographical risk factors in CVD prediction is crucial because they contribute to the probability of developing CVD.
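The tuning and comparison procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, a plain k-nearest-neighbour classifier stands in for the six algorithms, and every name here (`GRID`, `RANDOM_SPACE`, `make_data`, `run_experiment`, the column layout, the split sizes) is a hypothetical choice made for illustration. It shows only the general pattern: tune with grid search and randomized search, keep whichever gives the higher validation accuracy, and repeat the experiment without and with the geographical columns.

```python
import itertools
import random

# Hypothetical search spaces; the paper does not list its parameter ranges.
GRID = {"k": [1, 3, 5, 7, 9], "p": [1, 2]}
RANDOM_SPACE = {"k": list(range(1, 22, 2)), "p": [1, 2, 3]}


def make_data(n=240, seed=0):
    """Synthetic rows: ([clinical1, clinical2, humidity, temperature], label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(4)]
        # Assumption: the label depends on one clinical and one geographical column.
        rows.append((x, int(x[0] + x[2] + rng.gauss(0, 0.3) > 0)))
    return rows


def knn_accuracy(train, test, cols, k, p):
    """Accuracy of a k-nearest-neighbour vote using only the columns in `cols`."""
    def dist(a, b):
        return sum(abs(a[c] - b[c]) ** p for c in cols)

    hits = 0
    for x, y in test:
        nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
        votes = sum(label for _, label in nearest)
        hits += int((2 * votes > k) == bool(y))  # k is odd, so no ties
    return hits / len(test)


def best_of(train, val, cols, combos):
    """Return (validation accuracy, params) for the best-scoring combination."""
    scored = ((knn_accuracy(train, val, cols, **c), c) for c in combos)
    return max(scored, key=lambda pair: pair[0])


def grid_search(train, val, cols):
    """Score every combination in GRID."""
    combos = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]
    return best_of(train, val, cols, combos)


def randomized_search(train, val, cols, n_iter=6, seed=1):
    """Score `n_iter` combinations drawn at random from RANDOM_SPACE."""
    rng = random.Random(seed)
    combos = [{name: rng.choice(values) for name, values in RANDOM_SPACE.items()}
              for _ in range(n_iter)]
    return best_of(train, val, cols, combos)


def run_experiment(rows, cols):
    """Tune with both methods, keep the more accurate one, report test accuracy."""
    train, val, test = rows[:144], rows[144:192], rows[192:]
    candidates = {
        "grid": grid_search(train, val, cols),
        "randomized": randomized_search(train, val, cols),
    }
    method = max(candidates, key=lambda name: candidates[name][0])
    params = candidates[method][1]
    return {"method": method, "params": params,
            "test_accuracy": knn_accuracy(train, test, cols, **params)}


rows = make_data()
results = {
    "without_geo": run_experiment(rows, cols=[0, 1]),        # first experiment
    "with_geo": run_experiment(rows, cols=[0, 1, 2, 3]),     # second experiment
}
for name, outcome in results.items():
    print(name, outcome)
```

Hyperparameters are chosen on a validation split and accuracy is reported on a separate test split, so the reported figure is not inflated by the tuning itself. Any accuracies this sketch prints reflect the synthetic data only and say nothing about the results reported in the paper.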