Predicting the prevalence of cardiovascular diseases using machine learning algorithms

Intelligence-based medicine. Pub Date: 2025-01-01. DOI: 10.1016/j.ibmed.2025.100199

Bernada E. Sianga, Maurice C. Mbago, Amina S. Msengwa

Intelligence-based medicine, volume 11, Article 100199. Journal Article. https://www.sciencedirect.com/science/article/pii/S266652122500002X

Citations: 0

Abstract




Cardiovascular diseases (CVDs) are the major cause of morbidity, disability, and mortality worldwide and are among the most life-threatening diseases. Early detection and appropriate action can significantly reduce the effects and complications of CVD, so predicting the likelihood that an individual will develop adverse CVD outcomes is essential. This study uses machine learning (ML) methods to predict the risk of CVD. Optimal model parameters were obtained using grid search and randomized search; whichever tuning method gave the highest accuracy was used to set the optimal parameters for the six algorithms used in the study. Two experiments were conducted: the first trained and tested the hyperparameter-tuned ML algorithms on the CVD dataset without geographical features, and the second included them. The geographical features are the air humidity, temperature, and education status of a location. The two experiments were compared using classification metrics, and the second experiment outperformed the first. With geographical features included (second experiment), XGBoost achieved the highest accuracy, 95.24%, followed by the decision tree (93.87%) and the support vector machine (92.87%). Including geographical risk factors when predicting CVD is crucial because they contribute to the probability of developing CVD.
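The tuning-and-comparison workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the data are synthetic, a small k-nearest-neighbour classifier stands in for the study's algorithms, and the names `make_dataset`, `grid_search`, `randomized_search`, and `run_experiment` are hypothetical. It is not the authors' implementation and it does not reproduce their results. In practice a library such as scikit-learn (`GridSearchCV`, `RandomizedSearchCV`) would likely be used instead of hand-written search loops.

```python
import itertools
import random


def make_dataset(n=200, seed=0):
    """Synthetic rows: two 'clinical' features, one 'geographical' feature, binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        clinical = [rng.gauss(label * 1.0, 1.0), rng.gauss(label * 0.5, 1.0)]
        geo = [rng.gauss(label * 1.5, 1.0)]  # hypothetical stand-in, e.g. temperature
        rows.append((clinical, geo, label))
    return rows


def knn_predict(train, x, k, p):
    """Majority vote among the k nearest neighbours under the Minkowski-p distance."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(xi, x)) ** (1.0 / p), yi)
        for xi, yi in train
    )
    votes = sum(yi for _, yi in dists[:k])
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k, p):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k, p) == y for x, y in test) / len(test)


def grid_search(train, val, grid):
    """Evaluate every parameter combination; return (best score, best params)."""
    candidates = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
    return max(((accuracy(train, val, **c), c) for c in candidates), key=lambda t: t[0])


def randomized_search(train, val, grid, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations; return (best score, best params)."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return max(((accuracy(train, val, **c), c) for c in candidates), key=lambda t: t[0])


def run_experiment(rows, use_geo):
    """Tune on a validation split, then score on a held-out test split."""
    data = [((c + g) if use_geo else c, y) for c, g, y in rows]
    train, val, test = data[:120], data[120:160], data[160:]
    grid = {"k": [1, 3, 5, 7, 9], "p": [1, 2]}
    best_grid = grid_search(train, val, grid)
    best_rand = randomized_search(train, val, grid, n_iter=4)
    # Keep the parameters from whichever tuning method scored higher on validation.
    _, params = max(best_grid, best_rand, key=lambda t: t[0])
    return params, accuracy(train + val, test, **params)


rows = make_dataset()
# Experiment 1 excludes the geographical feature; experiment 2 includes it.
results = {use_geo: run_experiment(rows, use_geo) for use_geo in (False, True)}
for use_geo, (params, acc) in results.items():
    print(f"geographical features={use_geo}: params={params}, test accuracy={acc:.3f}")
```

Because the grid search is exhaustive, its best validation score can never be lower than that of a randomized search drawn from the same grid. The randomized search matters mainly when the grid is too large to enumerate. The accuracies printed by this sketch reflect only the synthetic data and say nothing about the study's reported figures.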


Source journal

Intelligence-based medicine (Health Informatics)

CiteScore: 5.00

Self-citation rate: 0.00%

Publication count: (not shown)

Review time: 187 days