Comparative analysis of machine learning models for coronary artery disease prediction with optimized feature selection
David B. Olawade, Afeez A. Soladoye, Bolaji A. Omodunbi, Nicholas Aderinto, Ibrahim A. Adeyanju
International Journal of Cardiology, volume 436, Article 133443. Published 2025-05-31. DOI: 10.1016/j.ijcard.2025.133443
https://www.sciencedirect.com/science/article/pii/S0167527325004863
Citations: 0
Abstract
Background
Coronary artery disease (CAD) is a major global cause of death, necessitating early, accurate prediction for better management. Traditional diagnostics are often invasive, costly, and less accessible. Machine learning (ML) offers a non-invasive alternative, but high-dimensional data and redundancy can hinder performance. This study integrates Bald Eagle Search Optimization (BESO) for feature selection to improve CAD classification using multiple ML models.
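The abstract does not say how BESO was configured, so the following is only a rough illustration of wrapper-style feature selection driven by a population search. It uses just the "select" update commonly given for Bald Eagle Search (each agent moves to the best position plus a random multiple of the offset between the population mean and itself) and omits the search and swoop stages. The leave-one-out 1-NN fitness, the per-feature penalty, the 0.5 threshold, the toy data, and every function name are assumptions made for this sketch, not the authors' implementation.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-nearest-neighbour accuracy using only masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in cols),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Wrapper objective (assumed): accuracy minus a small cost per kept feature."""
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)


def to_mask(position):
    """Map a continuous position in [0, 1]^d to a binary feature mask."""
    return [1 if p > 0.5 else 0 for p in position]


def select_stage_search(X, y, n_agents=8, n_iter=15, alpha=2.0, seed=0):
    """Simplified search: select-stage update only, with greedy acceptance."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    scores = [fitness(X, y, to_mask(p)) for p in pop]
    history = []
    for _ in range(n_iter):
        best = pop[max(range(n_agents), key=scores.__getitem__)]
        mean = [sum(p[j] for p in pop) / n_agents for j in range(dim)]
        for i in range(n_agents):
            r = rng.random()
            # P_new = P_best + alpha * r * (P_mean - P_i), clipped to [0, 1]
            cand = [
                min(1.0, max(0.0, best[j] + alpha * r * (mean[j] - pop[i][j])))
                for j in range(dim)
            ]
            s = fitness(X, y, to_mask(cand))
            if s > scores[i]:  # keep a move only if it improves this agent
                pop[i], scores[i] = cand, s
        history.append(max(scores))
    i_best = max(range(n_agents), key=scores.__getitem__)
    return to_mask(pop[i_best]), scores[i_best], history


# Hypothetical toy data: features 0 and 1 carry signal, features 2 to 5 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [
    [label * 3 + rng.gauss(0, 1), label * 3 + rng.gauss(0, 1)]
    + [rng.gauss(0, 1) for _ in range(4)]
    for label in y
]
mask, score, history = select_stage_search(X, y)
print(mask, round(score, 3))
```

Greedy acceptance keeps the best score from decreasing between iterations, which makes the behaviour easy to check. A full implementation would add the remaining stages and would normally score candidates on a validation split rather than on the training rows.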
Methods
Two publicly available datasets were used: Framingham (4200 instances, 15 features) and Z-Alizadeh Sani (304 instances, 55 features). The former predicts 10-year CAD risk, while the latter classifies current CAD status. Data preprocessing included missing-value imputation, normalization, categorical encoding, and class balancing using SMOTE. We employed a 70–30 holdout validation strategy with empirical hyperparameter optimization, which we considered more reliable for final model development than cross-validation. BESO was applied for feature selection and significantly outperformed traditional methods such as RFE and LASSO. Six ML models were trained and evaluated: KNN, logistic regression, SVM with linear, polynomial, and RBF kernels, and random forest.
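The holdout and class-balancing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the toy data, the function names `stratified_holdout` and `smote`, the choice of `k=5` neighbours, and the decision to oversample only the training split (so that synthetic rows never reach the test set) are all assumptions, since the abstract does not give these details.

```python
import math
import random


def stratified_holdout(X, y, test_frac=0.3, seed=0):
    """Split rows into train/test parts, keeping each class's share in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return take(train_idx), take(test_idx)


def smote(X, y, minority=1, k=5, seed=0):
    """Oversample the minority class of a binary problem until classes are equal.

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    rows = [x for x, v in zip(X, y) if v == minority]
    n_new = max(0, (len(y) - len(rows)) - len(rows))
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: 80 negatives, 20 positives, 3 numeric features.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(80)]
X += [[rng.gauss(2, 1) for _ in range(3)] for _ in range(20)]
y = [0] * 80 + [1] * 20

(X_train, y_train), (X_test, y_test) = stratified_holdout(X, y)
X_bal, y_bal = smote(X_train, y_train)
print(len(y_train), len(y_test), y_bal.count(0), y_bal.count(1))
# prints: 70 30 56 56
```

Balancing after the split keeps the 30% test portion at its original class ratio, so the reported test metrics reflect the real prevalence. In practice a library implementation such as the one in imbalanced-learn would replace the hand-written `smote`.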
Results
Random forest (RF) achieved the highest performance on both datasets. On the Framingham dataset, RF reached 90% accuracy, significantly outperforming traditional clinical risk scores (71–73% accuracy). Linear models performed better on the Z-Alizadeh Sani dataset (90% accuracy) than on the Framingham dataset (66%), indicating that dataset characteristics strongly influence model efficacy.
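The results above are stated as accuracy. Because class balancing was needed during preprocessing, it can help to see how accuracy behaves on an imbalanced test set alongside sensitivity and specificity. The short sketch below computes all three; the 15/85 class split and the always-negative predictor are hypothetical and are not figures from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives) and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical test set: 15 positives, 85 negatives, and a model that
# predicts "no CAD" for everyone.
y_true = [1] * 15 + [0] * 85
always_negative = [0] * 100
print(binary_metrics(y_true, always_negative))
# prints: {'accuracy': 0.85, 'sensitivity': 0.0, 'specificity': 1.0}
```

A predictor that never flags disease still scores 85% accuracy here, which is why per-class metrics are usually reported next to accuracy when comparing classifiers with clinical risk scores.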
Conclusion
BESO significantly enhances feature selection, with RF emerging as the best-performing classifier (92% accuracy) and substantially outperforming established clinical risk scores. This study highlights the potential of AI-driven CAD diagnosis to support early detection and improve patient outcomes. Future work should focus on prospective validation and clinical implementation.
About the journal:
The International Journal of Cardiology is devoted to cardiology in the broadest sense. Both basic research and clinical papers can be submitted. The journal serves the interest of both practicing clinicians and researchers.
In addition to original papers, we are launching a range of new manuscript types, including Consensus and Position Papers, Systematic Reviews, Meta-analyses, and Short Communications. Case reports are no longer accepted. Controversial techniques and issues in health policy and social medicine are discussed and serve as useful tools for encouraging debate.