Enhancing cardiovascular risk prediction in Asian populations: A machine learning approach integrated with digital health platforms
Sazzli Kasim, Putri Nur Fatin Amir Rudin, Xue Ning Kiew, Nurulain Ibrahim, Nafiza Mat Nasir, Lim Bing Feng, Hanis Hamidi, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff, Suraya Abdul-Razak, Kazuaki Negishi, Sorayya Malek
International Journal of Cardiology Cardiovascular Risk and Prevention, vol. 27, Article 200509. Published 2025-09-09. DOI: 10.1016/j.ijcrp.2025.200509
https://www.sciencedirect.com/science/article/pii/S2772487525001473
Abstract
Objectives
This study aimed to develop and validate a machine learning (ML)–based model for cardiovascular disease (CVD) risk prediction in a Malaysian cohort representative of the Southeast Asian population.
Methods
Data from the Responding to Increasing Cardiovascular Disease Prevalence (REDISCOVER) Study, including 10,044 participants, were analyzed, with 4,299 cases retained after exclusions. The dataset was split into training (70 %) and validation (30 %) subsets. Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) models were developed using feature selection techniques such as recursive feature elimination (RFE) and sequential backward elimination (SBE). Model performance was evaluated using the area under the curve (AUC), sensitivity, specificity, calibration, and Net Reclassification Index (NRI).
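As an illustration of the discrimination metrics named above, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from predicted risks. It is not the study's code: the function names, the 0.5 threshold, and the toy data are hypothetical, and the AUC here uses the pairwise (Mann-Whitney) formulation, which may differ from the implementation the authors used.

```python
def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random non-case.

    A tie between a case and a non-case counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical, not taken from the REDISCOVER cohort).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.3, 0.5, 0.2, 0.1, 0.6]

toy_auc = auc(y_true, scores)
toy_sens, toy_spec = sensitivity_specificity(y_true, scores, threshold=0.5)
print(f"AUC={toy_auc:.4f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based computation would be preferable for a cohort of several thousand participants.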
Findings
Among the models evaluated, the SVM model with SBE-selected features performed best, achieving an AUC of 0.800. This was higher than the Framingham Risk Score (FRS; AUC = 0.693), Revised Pooled Cohort Equations (RPCE; AUC = 0.744), and WHO CVD charts (AUC = 0.741). NRI analysis showed significant improvements compared to FRS and RPCE (17.29 % and 14.23 %, respectively; p < 0.00001). Calibration analyses indicated initial overprediction by ML models, which was mitigated by Platt scaling.
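To make the reclassification result concrete, the sketch below computes a categorical NRI: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. This is an assumption-laden illustration rather than the authors' procedure. The abstract does not state whether a categorical or continuous NRI was used, and the cutpoints (10 % and 20 %), function names, and toy risks are hypothetical.

```python
from bisect import bisect_right


def risk_category(risk, cutpoints):
    """Index of the risk category; a risk equal to a cutpoint falls in the higher one."""
    return bisect_right(cutpoints, risk)


def net_reclassification_index(y_true, old_risk, new_risk, cutpoints):
    """Categorical NRI of a new model versus an old one.

    Returns (nri, nri_events, nri_nonevents).
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, old, new in zip(y_true, old_risk, new_risk):
        move = risk_category(new, cutpoints) - risk_category(old, cutpoints)
        if y == 1:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e        # upward moves are correct for events
    nri_nonevents = (down_n - up_n) / n_n     # downward moves are correct for non-events
    return nri_events + nri_nonevents, nri_events, nri_nonevents


# Toy data (hypothetical, not taken from the study).
cutpoints = [0.10, 0.20]  # assumed low / intermediate / high boundaries
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
old_risk = [0.05, 0.15, 0.25, 0.15, 0.15, 0.25, 0.05, 0.15]
new_risk = [0.15, 0.25, 0.25, 0.05, 0.05, 0.15, 0.05, 0.25]

nri, nri_e, nri_ne = net_reclassification_index(y_true, old_risk, new_risk, cutpoints)
print(f"NRI={nri:.2%} (events {nri_e:.2%}, non-events {nri_ne:.2%})")
```

Reporting the event and non-event components separately, as this sketch does, shows whether an improvement comes from better identification of cases, better exclusion of non-cases, or both.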
Conclusion
ML-based models incorporating regionally relevant variables demonstrated improved discrimination and reclassification compared with conventional risk scores in this Malaysian cohort. Further external validation is needed to establish their utility across broader Southeast Asian populations.