Sazzli Kasim, Putri Nur Fatin Amir Rudin, Sorayya Malek, Nurulain Ibrahim, Xue Ning Kiew, Nafiza Mat Nasir, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff
{"title":"西太平洋心血管风险预测增强:针对马来西亚人口的机器学习方法。","authors":"Sazzli Kasim, Putri Nur Fatin Amir Rudin, Sorayya Malek, Nurulain Ibrahim, Xue Ning Kiew, Nafiza Mat Nasir, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff","doi":"10.1371/journal.pone.0323949","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.</p><p><strong>Objective: </strong>This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.</p><p><strong>Methods: </strong>Utilizing data from the REDISCOVER Registry (5,688 participants from 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN) and Naive Bayes (NB). Ensemble model were also created using three commonly used meta learners, including RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split into a 70:30 train-test ratio, with 5-fold cross-validation to ensure robust performance. Model evaluation was primarily based on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) to compare the ML models against traditional risk scores like the Framingham Risk Score (FRS) and Revised Pooled Cohort Equations (RPCE).</p><p><strong>Results: </strong>The LR model achieved the highest AUC of 0.77, outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model provided robust performance, though it did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, ensuring ongoing validation and clinical applicability.</p><p><strong>Conclusions: </strong>These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 6","pages":"e0323949"},"PeriodicalIF":2.6000,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12173414/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population.\",\"authors\":\"Sazzli Kasim, Putri Nur Fatin Amir Rudin, Sorayya Malek, Nurulain Ibrahim, Xue Ning Kiew, Nafiza Mat Nasir, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff\",\"doi\":\"10.1371/journal.pone.0323949\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.</p><p><strong>Objective: </strong>This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.</p><p><strong>Methods: </strong>Utilizing data from the REDISCOVER Registry (5,688 participants from 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN) and Naive Bayes (NB). Ensemble model were also created using three commonly used meta learners, including RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split into a 70:30 train-test ratio, with 5-fold cross-validation to ensure robust performance. Model evaluation was primarily based on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) to compare the ML models against traditional risk scores like the Framingham Risk Score (FRS) and Revised Pooled Cohort Equations (RPCE).</p><p><strong>Results: </strong>The LR model achieved the highest AUC of 0.77, outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model provided robust performance, though it did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, ensuring ongoing validation and clinical applicability.</p><p><strong>Conclusions: </strong>These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 6\",\"pages\":\"e0323949\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12173414/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0323949\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0323949","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population.
Background: Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.
Objective: This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.
Methods: Utilizing data from the REDISCOVER Registry (5,688 participants from 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN) and Naive Bayes (NB). Ensemble model were also created using three commonly used meta learners, including RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split into a 70:30 train-test ratio, with 5-fold cross-validation to ensure robust performance. Model evaluation was primarily based on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) to compare the ML models against traditional risk scores like the Framingham Risk Score (FRS) and Revised Pooled Cohort Equations (RPCE).
Results: The LR model achieved the highest AUC of 0.77, outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model provided robust performance, though it did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, ensuring ongoing validation and clinical applicability.
Conclusions: These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.
期刊介绍:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage