Comparative analysis of machine learning models for coronary artery disease prediction with optimized feature selection

IF 3.2 | Zone 2 (Medicine) | Q2 CARDIAC & CARDIOVASCULAR SYSTEMS

International journal of cardiology | Pub Date: 2025-05-31 | DOI: 10.1016/j.ijcard.2025.133443

David B. Olawade, Afeez A. Soladoye, Bolaji A. Omodunbi, Nicholas Aderinto, Ibrahim A. Adeyanju

International journal of cardiology, volume 436, Article 133443 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0167527325004863

Citations: 0

Abstract

Background

Coronary artery disease (CAD) is a major global cause of death, necessitating early, accurate prediction for better management. Traditional diagnostics are often invasive, costly, and less accessible. Machine learning (ML) offers a non-invasive alternative, but high-dimensional data and redundancy can hinder performance. This study integrates Bald Eagle Search Optimization (BESO) for feature selection to improve CAD classification using multiple ML models.
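As an illustration of how a metaheuristic such as BESO can frame feature selection, the sketch below scores a binary feature mask by validation accuracy minus a small penalty on subset size. It is not the authors' implementation: the optimizer is a plain random search standing in for BESO, and the nearest-centroid classifier, the `penalty` weight, the toy data, and all function names are assumptions made for illustration.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Validation accuracy of a nearest-centroid classifier restricted to
    the features where mask[j] == 1 (classifier choice is an assumption)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    # Per-class mean of each selected feature.
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, y in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda lab: sum(
                (x[j] - c) ** 2 for j, c in zip(cols, centroids[lab])
            ),
        )
        correct += pred == y
    return correct / len(y_val)


def fitness(mask, data, penalty=0.01):
    """Reward accuracy and lightly penalize the fraction of features kept.
    The penalty weight is a hypothetical value, not taken from the paper."""
    acc = nearest_centroid_accuracy(*data, mask)
    return acc - penalty * sum(mask) / len(mask)


def random_mask_search(data, n_features, n_iter=200, seed=0):
    """Stand-in optimizer (NOT Bald Eagle Search): sample random masks
    and keep the one with the best fitness."""
    rng = random.Random(seed)
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_iter):
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        score = fitness(mask, data)
        if score > best_fit:
            best_mask, best_fit = mask, score
    return best_mask, best_fit


def make_toy_data(n, seed):
    """Synthetic data: feature 0 separates the classes, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = label * 4.0 + rng.gauss(0, 1)
        noise = [rng.gauss(0, 5) for _ in range(3)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


X_train, y_train = make_toy_data(100, seed=1)
X_val, y_val = make_toy_data(60, seed=2)
data = (X_train, y_train, X_val, y_val)
best_mask, best_fit = random_mask_search(data, n_features=4)
print(best_mask, round(best_fit, 3))
```

A population-based method like BESO would replace `random_mask_search` while keeping the same `fitness` interface, which is why the two are kept separate here.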

Methods

Two publicly available datasets, Framingham (4200 instances, 15 features) and Z-Alizadeh Sani (304 instances, 55 features), were used. The former predicts 10-year CAD risk, while the latter classifies current CAD status. Data preprocessing included missing-value imputation, normalization, categorical encoding, and class balancing using the synthetic minority over-sampling technique (SMOTE). We employed a 70–30 holdout validation strategy with empirical hyperparameter optimization, providing more reliable final model development than cross-validation. BESO was applied to optimize feature selection, significantly outperforming traditional methods such as recursive feature elimination (RFE) and LASSO. Six ML models (k-nearest neighbors (KNN), logistic regression, support vector machines (SVM) with linear, polynomial, and RBF kernels, and random forest) were trained and evaluated.
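The holdout and class-balancing steps can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. The abstract does not say whether the split was stratified or whether SMOTE was applied before or after it. The sketch assumes a stratified 70–30 split with oversampling on the training portion only, so that no synthetic points reach the test set. The function names, `k=3`, and the 80/20 toy class ratio are all hypothetical.

```python
import random


def stratified_holdout(X, y, test_fraction=0.3, seed=0):
    """Split per class so both parts keep roughly the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote_like(X, y, k=3, seed=0):
    """SMOTE-style oversampling for a binary problem: create synthetic
    minority points by interpolating between a minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    counts = {lab: y.count(lab) for lab in set(y)}
    minority = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority]
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(pool)
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Toy imbalanced data (assumed 80 negatives, 20 positives).
rng = random.Random(42)
y = [0] * 80 + [1] * 20
X = [[rng.gauss(2.0 * lab, 1.0), rng.gauss(0.0, 1.0)] for lab in y]

(X_train, y_train), (X_test, y_test) = stratified_holdout(X, y)
X_bal, y_bal = smote_like(X_train, y_train)  # balance the training part only
print(len(X_train), len(X_test), y_bal.count(0), y_bal.count(1))
```

Balancing only the training portion keeps the test set at its original class ratio, so reported accuracy reflects the prevalence a deployed model would face.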

Results

Random Forest (RF) achieved the highest performance on both datasets. On the Framingham dataset, RF reached 90% accuracy, significantly outperforming traditional clinical risk scores (71–73% accuracy). Linear models performed better on the Z-Alizadeh Sani dataset (90% accuracy) than on Framingham (66%), indicating that dataset characteristics strongly influence model efficacy.

Conclusion

BESO significantly enhances feature selection, with RF emerging as the optimal classifier (92% accuracy) and substantially outperforming established clinical risk scores. This study highlights the potential of AI-driven CAD diagnosis, supporting early detection and improved patient outcomes. Future work should focus on prospective validation and clinical implementation.

Source journal

International journal of cardiology (Medicine: Cardiovascular System)

CiteScore

6.80

Self-citation rate

5.70%

Articles published

758

Review time

44 days

Journal description: The International Journal of Cardiology is devoted to cardiology in the broadest sense. Both basic research and clinical papers can be submitted. The journal serves the interest of both practicing clinicians and researchers. In addition to original papers, we are launching a range of new manuscript types, including Consensus and Position Papers, Systematic Reviews, Meta-analyses, and Short communications. Case reports are no longer acceptable. Controversial techniques, issues on health policy and social medicine are discussed and serve as useful tools for encouraging debate.